A Simple Introduction to Core Audio

I recently had to learn Core Audio for a project, and despite its notorious difficulty, it has been great fun. Starting out, I would have killed for a basic example audio player without any unnecessary bells and whistles, just the essentials. I ended up creating such a project, which does exactly two things:

1. Loads an audio file entirely into memory: the audio files for my project were very short, but many could be playing at the same time, which means that loading them from disk could be a bottleneck.
2. Plays the loaded data with the Audio Unit API: this is the lowest-level way to play audio, and thus offers the most control and the lowest latency.

The project is available here:
https://github.com/jamesalvarez/iosCoreAudioPlayer

The code is very straightforward once you become familiar with the API, but in this post I'll go over the above tasks, with some extra notes which may be useful. You will need to know basic audio processing terms like samples, channels and frames, as well as the C language.

AudioStreamBasicDescription

This struct contains information that defines a specific format for an audio data stream. It can get very complicated, but thankfully you can just define the one format you want, then load the data and play it back in that format. I chose interleaved floats, which means the data comes in a single buffer of floats with the left and right channels alternating, so the two samples of each stereo frame are always next to each other. Non-interleaved means you get separate data buffers for the left and right channels.
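
As a quick illustration (my own sketch rather than code from the project, with samples and frame as placeholder names), indexing into an interleaved stereo buffer works like this:

    // Interleaved stereo: a single buffer with samples alternating L, R, L, R, ...
    // frame 0 -> samples[0] (left), samples[1] (right)
    // frame 1 -> samples[2] (left), samples[3] (right), and so on
    static Float32 leftSample(const Float32 *samples, UInt32 frame) {
        return samples[frame * 2];
    }
    static Float32 rightSample(const Float32 *samples, UInt32 frame) {
        return samples[frame * 2 + 1];
    }

With that layout in mind, this is the format description used throughout: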

    #define CAP_SAMPLE_RATE 44100
    #define CAP_CHANNELS 2
    #define CAP_SAMPLE_SIZE sizeof(Float32)

    AudioStreamBasicDescription const CAPAudioDescription = {
        .mSampleRate = CAP_SAMPLE_RATE,
        .mFormatID = kAudioFormatLinearPCM,
        .mFormatFlags = kAudioFormatFlagIsFloat,
        .mBytesPerPacket = CAP_SAMPLE_SIZE * CAP_CHANNELS,
        .mFramesPerPacket = 1,
        .mBytesPerFrame = CAP_CHANNELS * CAP_SAMPLE_SIZE,
        .mChannelsPerFrame = CAP_CHANNELS,
        .mBitsPerChannel = 8 * CAP_SAMPLE_SIZE, //8 bits per byte
        .mReserved = 0
    };

With other data formats you can sometimes have more than one frame per packet, but that is not the case here, so the values are straightforward to fill out using the size of Float32.
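
To make the arithmetic explicit (these checks are my own addition, not in the project): with interleaved stereo Float32 samples and one frame per packet, every field follows from the sample size and channel count.

    #include <assert.h>

    // Sanity checks on the format (call from any init code);
    // the function name is my own, not from the project.
    static void CAPCheckFormat(void) {
        // 4 bytes per sample * 2 channels = 8 bytes per frame = 8 bytes per packet
        assert(CAPAudioDescription.mBytesPerFrame == CAP_CHANNELS * CAP_SAMPLE_SIZE);
        assert(CAPAudioDescription.mBytesPerPacket ==
               CAPAudioDescription.mBytesPerFrame * CAPAudioDescription.mFramesPerPacket);
        // 4 bytes * 8 = 32 bits per channel
        assert(CAPAudioDescription.mBitsPerChannel == 8 * CAP_SAMPLE_SIZE);
    }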

ExtAudioFile

It takes quite a few lines to load a file with ExtAudioFile, but the result is that you get your data in whatever format you like. The GitHub example includes error checking, but for clarity here I just call the functions without checking that they succeeded. When dealing with large audio files it is better to use a ring buffer, loading more data from the file into memory as it is played rather than all at once; for short files it's fine to load them completely, as I do here.
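
As an aside, error checking in Core Audio boils down to inspecting the OSStatus that every call returns. A minimal helper along these lines (a sketch of the common idiom, not the project's exact code) looks like:

    #include <stdio.h>
    #include <stdlib.h>
    #include <AudioToolbox/AudioToolbox.h>

    // Bail out with a message if a Core Audio call returns anything but noErr
    static void CheckError(OSStatus error, const char *operation) {
        if (error == noErr) return;
        fprintf(stderr, "Error %d during: %s\n", (int)error, operation);
        exit(1);
    }

    // Usage (hypothetical): CheckError(ExtAudioFileOpenURL(url, &audioFile), "open file");

The file loading itself looks like this: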

    ExtAudioFileRef audioFile;

    // Open file
    ExtAudioFileOpenURL(url, &audioFile);

    // Get files information
    AudioStreamBasicDescription fileAudioDescription;
    UInt32 size = sizeof(fileAudioDescription);
    ExtAudioFileGetProperty(audioFile,
        kExtAudioFileProperty_FileDataFormat,
        &size,
        &fileAudioDescription);

    // Apply audio format
    ExtAudioFileSetProperty(audioFile,
        kExtAudioFileProperty_ClientDataFormat,
        sizeof(CAPAudioDescription),
        &CAPAudioDescription);

The first call opens the audio file with a URL, passed as a CFURLRef, which bridges directly from an NSURL*. Next we get the AudioStreamBasicDescription of the file; we only use this to calculate the length of the file in frames when allocating buffers to load it into. Finally we set our predefined AudioStreamBasicDescription on the file as the client data format, so when we request data it will arrive in this format.
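
If you are not starting from an NSURL*, the CFURLRef can also be created directly in C; a sketch, with a hypothetical path:

    // Build a CFURLRef from a plain file path (the path here is hypothetical)
    CFURLRef url = CFURLCreateWithFileSystemPath(kCFAllocatorDefault,
                                                 CFSTR("/path/to/sound.wav"),
                                                 kCFURLPOSIXPathStyle,
                                                 false);
    // ... pass it to ExtAudioFileOpenURL(url, &audioFile) as above ...
    CFRelease(url);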

    // Determine length in frames (in original file's sample rate)
    SInt64 fileLengthInFrames; // kExtAudioFileProperty_FileLengthFrames is an SInt64
    size = sizeof(fileLengthInFrames);
    ExtAudioFileGetProperty(audioFile,
        kExtAudioFileProperty_FileLengthFrames,
        &size,
        &fileLengthInFrames);

    // Calculate the true length in frames, given the original and target sample rates
    fileLengthInFrames = ceil(fileLengthInFrames * (CAPAudioDescription.mSampleRate / fileAudioDescription.mSampleRate));

Here we get the number of frames in the file (at its original sample rate), then calculate the number of frames it will occupy at our desired sample rate.
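For example, a ten-second file recorded at 48 kHz contains 480,000 frames; converted to our 44.1 kHz target it occupies ceil(480,000 × 44100 / 48000) = 441,000 frames.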

    // Prepare AudioBufferList: Interleaved data uses just one buffer, non-interleaved has two
    int numberOfBuffers = 1;
    int channelsPerBuffer = CAPAudioDescription.mChannelsPerFrame;
    int bytesPerBuffer = CAPAudioDescription.mBytesPerFrame * (int)fileLengthInFrames;

    AudioBufferList *bufferList = malloc(sizeof(AudioBufferList) + (numberOfBuffers-1)*sizeof(AudioBuffer));

    bufferList->mNumberBuffers = numberOfBuffers;
    bufferList->mBuffers[0].mData = calloc(bytesPerBuffer, 1);
    bufferList->mBuffers[0].mDataByteSize = bytesPerBuffer;
    bufferList->mBuffers[0].mNumberChannels = channelsPerBuffer;

Here we create an AudioBufferList, which will store the loaded audio. Before doing so we need to know the number of buffers, the number of channels per buffer, and the number of bytes per buffer. Since we are using interleaved data we only need one buffer, containing two channels, and the byte count is simply the number of bytes per frame times the file length in frames. The GitHub example contains more general code which also handles non-interleaved data; in this snippet, again for clarity, I have omitted checking whether the malloc and calloc succeeded.
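
For comparison, a non-interleaved stereo format would need two single-channel buffers, one per channel. A sketch (not the project's code; niList, niBuffers and bytesPerChannelBuffer are my own names, and the per-channel byte count is just the sample size times the frame count):

    // Non-interleaved stereo: two buffers of mono Float32 data
    int niBuffers = 2;
    int bytesPerChannelBuffer = CAP_SAMPLE_SIZE * (int)fileLengthInFrames;

    AudioBufferList *niList = malloc(sizeof(AudioBufferList) + (niBuffers - 1) * sizeof(AudioBuffer));
    niList->mNumberBuffers = niBuffers;
    for (int i = 0; i < niBuffers; i++) {
        niList->mBuffers[i].mNumberChannels = 1;
        niList->mBuffers[i].mDataByteSize = bytesPerChannelBuffer;
        niList->mBuffers[i].mData = calloc(bytesPerChannelBuffer, 1);
    }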

    // Create a stack copy of the AudioBufferList; its mData pointer will be
    // offset into the real buffer (in bytes) as each chunk is read below
    char scratchBufferList_bytes[sizeof(AudioBufferList)];
    memcpy(scratchBufferList_bytes, bufferList, sizeof(scratchBufferList_bytes));
    AudioBufferList *scratchBufferList = (AudioBufferList*)scratchBufferList_bytes;

Next we create a second AudioBufferList which is a copy of the first. This is used to load the data in, piece by piece: after each chunk is read, the mData pointer on scratchBufferList is moved to point to the next section of the heap buffer, ready for the next chunk. I copied this technique from the excellent TAAE library.

    // Perform read in multiple small chunks (otherwise ExtAudioFileRead crashes when performing sample rate conversion)
    UInt32 readFrames = 0;
    while (readFrames < fileLengthInFrames) {
        UInt32 framesLeftToRead = (UInt32)fileLengthInFrames - readFrames;
        UInt32 framesToRead = (framesLeftToRead < 16384) ? framesLeftToRead : 16384;

        // Set the scratch buffer to point to the correct position on the real buffer
        scratchBufferList->mNumberBuffers = bufferList->mNumberBuffers;
        scratchBufferList->mBuffers[0].mNumberChannels = bufferList->mBuffers[0].mNumberChannels;
        scratchBufferList->mBuffers[0].mData = (char*)bufferList->mBuffers[0].mData +
            (readFrames * CAPAudioDescription.mBytesPerFrame);
        scratchBufferList->mBuffers[0].mDataByteSize = framesToRead * CAPAudioDescription.mBytesPerFrame;

        // Perform read
        ExtAudioFileRead(audioFile, &framesToRead, scratchBufferList);

        // Break if no frames were read
        if ( framesToRead == 0 ) break;
        readFrames += framesToRead;
    }

Now we loop, reading frames into the scratch buffer list, which is updated on each pass to point to the next section of data. Reads are capped at 16384 frames per call; reading in small chunks like this seems to be the done thing (as the comment above notes, ExtAudioFileRead can crash on very large reads when performing sample rate conversion), though I'm not 100% sure why.

    // Clean up
    ExtAudioFileDispose(audioFile);

    // BufferList and readFrames are the audio we loaded
    audioPlayer->bufferList = bufferList;
    audioPlayer->frames = readFrames;

The last step is to clean up, which just means calling ExtAudioFileDispose. I save bufferList and readFrames to a custom struct, which will be used later in the render callback when playing audio.
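
The struct itself isn't shown in this post; a minimal version, with fields guessed from how they are used here (the real definitions are in the GitHub project), might look like:

    // Minimal sketch of the player state used by the render callback
    typedef struct {
        AudioBufferList *bufferList;  // interleaved audio loaded from the file
        UInt32           frames;      // total number of frames loaded
        UInt32           currentFrame; // playback position, advanced by the callback
    } CAPAudioPlayer;

    // Owns the player state plus the output Audio Unit
    typedef struct {
        CAPAudioPlayer player;
        AudioUnit      outputUnit;
    } CAPAudioOutput;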

Audio Unit Output

It takes slightly fewer lines to set up the most basic output stream using Audio Units. Since we only have one Audio Unit we don't need a graph or anything like that: we simply find the output component, set its stream format, set the render callback, and switch it on.

    // Description for the output AudioComponent
    AudioComponentDescription outputcd = {
        .componentType = kAudioUnitType_Output,
        .componentSubType = kAudioUnitSubType_RemoteIO,
        .componentManufacturer = kAudioUnitManufacturer_Apple,
        .componentFlags = 0,
        .componentFlagsMask = 0
    };

    // Get the output AudioComponent
    AudioComponent comp = AudioComponentFindNext (NULL, &outputcd);

In this first step we create an AudioComponentDescription, a struct that describes a particular Audio Unit. Here we choose the type kAudioUnitType_Output and subtype kAudioUnitSubType_RemoteIO to get the Audio Unit that outputs audio to the device hardware. AudioComponentFindNext then finds the matching AudioComponent so we can begin to use it.

    // Create a new instance of the AudioComponent = the AudioUnit
    // outputUnit is type AudioUnit
    AudioComponentInstanceNew(comp, &outputUnit);

    // Set the stream format
    AudioUnitSetProperty(outputUnit,
        kAudioUnitProperty_StreamFormat,
        kAudioUnitScope_Input,
        0,
        &CAPAudioDescription,
        sizeof(CAPAudioDescription));

In this step we create a new instance of the AudioComponent, which gives us the AudioUnit itself, and then set its stream format to the same format we set on the file we loaded. This makes filling the buffers easy, as no conversion is needed.

    // Set the render callback
    AURenderCallbackStruct callBackStruct = {
        .inputProc = CAPRenderProc,
        .inputProcRefCon = player
    };

    AudioUnitSetProperty(outputUnit,
        kAudioUnitProperty_SetRenderCallback,
        kAudioUnitScope_Global,
        0,
        &callBackStruct,
        sizeof(callBackStruct));

Here we create a struct containing a pointer to our render callback, CAPRenderProc, and a void* pointer to anything we like, which will be passed in each time the callback is called. I pass a struct which, amongst other things, points to the buffer of data that was loaded earlier.

    // Initialize the Audio Unit
    AudioUnitInitialize(outputUnit);

    // Start the Audio Unit (sound begins)
    AudioOutputUnitStart(outputUnit);

Finally we initialize the audio unit and start it. This will begin calling the render callback for new audio data.

    static OSStatus CAPRenderProc(void *inRefCon,
        AudioUnitRenderActionFlags *ioActionFlags,
        const AudioTimeStamp *inTimeStamp,
        UInt32 inBusNumber,
        UInt32 inNumberFrames,
        AudioBufferList * ioData) {

        CAPAudioOutput *audioOutput = (CAPAudioOutput*)inRefCon;
        CAPAudioPlayer *audioPlayer = &audioOutput->player;

        UInt32 currentFrame = audioPlayer->currentFrame;
        UInt32 maxFrames = audioPlayer->frames;

        Float32 *outputData = (Float32*)ioData->mBuffers[0].mData;
        Float32 *inputData = (Float32*)audioPlayer->bufferList->mBuffers[0].mData;

        for (UInt32 frame = 0; frame < inNumberFrames; ++frame) {
            UInt32 outSample = frame * 2;
            UInt32 inSample = currentFrame * 2;
            (outputData)[outSample] = (inputData)[inSample];
            (outputData)[outSample+1] = (inputData)[inSample + 1];
            currentFrame++;
            currentFrame = currentFrame % maxFrames;
        }
        audioPlayer->currentFrame = currentFrame;

        return noErr;
    }

This is the render callback (written for clarity, since performance isn't really going to be an issue yet!). It simply copies interleaved samples from the previously loaded buffer to the output buffer. The CAPAudioPlayer struct contains the AudioBufferList, the number of frames, and the current frame, which I first extract. Then I set pointers to the two data buffers. Next I loop over the number of frames the callback has requested, copying from the stored AudioBuffer to the output buffer. Finally I update currentFrame, so that playback picks up from the correct spot the next time the callback fires.
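
When you are finished with playback, the teardown is the mirror image of the setup. A sketch of the usual calls (the project's own cleanup code may differ in detail):

    // Stop and dispose of the output unit, then free the loaded audio
    AudioOutputUnitStop(outputUnit);
    AudioUnitUninitialize(outputUnit);
    AudioComponentInstanceDispose(outputUnit);

    free(bufferList->mBuffers[0].mData);
    free(bufferList);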

The project contains more code, in particular error detection, disposing of things properly, and handling non-interleaved data. This is about the most basic start to using Core Audio; I hope it was useful!