An iPhone Core Audio brain dump

Twitter user blackbirdmobile just wondered aloud when the Core Audio stuff I’ve been writing about is going to come out. I have no idea, as the client has been commissioning a lot of work from a lot of iPhone/Mac writers I know, but has a lengthy review/rewrite process.

Right now, I’ve moved on to writing some beginner stuff for my next book, and will be switching from that to iPhone 3.0 material for the first book later today. And my next article is going to be on OpenAL. My next chance for some CA comes whenever I get time to work on some App Store stuff I’ve got planned.

So, while the material is still a little fresh, I’m going to post a stream-of-consciousness brain-dump of stuff that I learned along the way or found important to know in the course of working on this stuff.

  • It’s hard. Jens Alfke put it thusly:

    “Easy” and “CoreAudio” can’t be used in the same sentence. 😛 CoreAudio is very powerful, very complex, and under-documented. Be prepared for a steep learning curve, APIs with millions of tiny little pieces, and puzzling things out from sample code rather than reading high-level documentation.

  • That said, tweets like this one piss me off. Media is intrinsically hard, and the typical way to make it easy is to throw out functionality, until you’re left with a play method and not much else.

  • And if that’s all you want, please go use the HTML5 <video> and <audio> tags (hey, I do).

  • Media is hard because you’re dealing with issues of hardware I/O, real-time, threading, performance, and a pretty dense body of theory, all at the same time. Webapps are trite by comparison.

  • On the iPhone, Core Audio has three levels of opt-in for playback and recording, given your needs, listed here in increasing order of complexity/difficulty:

    1. AVAudioPlayer – File-based playback of DRM-free audio in Apple-supported codecs. Cocoa classes, called with Obj-C. iPhone 3.0 adds AVAudioRecorder (wasn’t sure if this was NDA, but it’s on the WWDC marketing page).
    2. Audio Queues – C-based API for buffered recording and playback of audio. Since you supply the samples, would work for a net radio player, and for your own formats and/or DRM/encryption schemes (decrypt in memory before handing off to the queue). Inherent latency due to the use of buffers.
    3. Audio Units – Low-level C-based API. Very low latency, as little as 29 milliseconds. Mixing, effects, near-direct access to input and output hardware.
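
    To put rough numbers on buffer-induced latency: a buffer can't be played until it has been filled, so its duration is a floor on the delay it adds. The sketch below is plain C arithmetic, not a Core Audio call, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Duration, in milliseconds, of a buffer holding `frames` PCM frames. */
static double buffer_duration_ms(unsigned frames, double sample_rate_hz)
{
    assert(sample_rate_hz > 0.0);
    return 1000.0 * (double)frames / sample_rate_hz;
}

/* Bytes needed to hold `seconds` of interleaved PCM audio. */
static size_t buffer_bytes_for_duration(double sample_rate_hz,
                                        unsigned channels,
                                        unsigned bytes_per_sample,
                                        double seconds)
{
    assert(sample_rate_hz > 0.0 && seconds >= 0.0);
    double frames = sample_rate_hz * seconds;   /* whole frames only */
    return (size_t)frames * channels * bytes_per_sample;
}
```

    Under these assumptions, a 1024-frame buffer at 44.1 kHz lasts a little over 23 ms, and half a second of 16-bit stereo at that rate needs 88,200 bytes.
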
  • Other important Core Audio APIs not directly tied to playback and recording: Audio Session Services, for communicating your app’s audio needs to the system, defining interaction with things like the background iPod player and the ring/silent switch, and getting audio H/W metadata; Audio File Services, for reading and writing files; Audio File Stream Services, for dealing with audio data in a network stream; Audio Converter Services, for converting between PCM and compressed formats; and Extended Audio File Services, for combining file and converter services (e.g., given PCM, write out to a compressed AAC file).

  • You don’t get AVAudioPlayer or AVAudioRecorder on the Mac because you don’t need them: you already have QuickTime, and the QTKit API.
  • The Audio Queue Services Programming Guide is sufficient to get you started with Audio Queues, though it is unfortunate that its code excerpts are not pulled together into a complete, runnable Xcode project.

  • Lucky for you, I wrote one for the Streaming Audio chapter of the Prags’ iPhone book. Feel free to download the book’s example code. But do so quickly — the Streaming Audio chapter will probably go away in the 3.0 rewrite, as AVAudioRecorder obviates the need for most people to go down to the Audio Queue level. We may find some way to repurpose this content, but I’m not sure what form that will take. Also, I think there’s still a bug in the download where it can record with impunity, but can only play back once.

  • The Audio Unit Programming Guide is required reading for using Audio Units, though you have to filter out the stuff related to writing your own AUs with the C++ API and testing their Mac GUIs.

  • Get comfortable with pointers, the address-of operator (&), and maybe even malloc.

  • You are going to fill out a lot of AudioStreamBasicDescription structures. It drives some people a little batty.

  • Always clear out your ASBDs, like this:

    memset (&myASBD, 0, sizeof (myASBD));

    This zeros out any fields that you haven’t set, which is important if you send an incomplete ASBD to a queue, audio file, or other object to have it filled in.
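
    As a sketch of what a fully filled-in ASBD for 16-bit integer PCM might look like: the struct below is a stand-in that mirrors the fields of AudioStreamBasicDescription so the example compiles without Apple's headers, and the SKETCH_ constants are assumed to match the values of kAudioFormatLinearPCM, kAudioFormatFlagIsSignedInteger, and kAudioFormatFlagIsPacked. In real code you would use the SDK's type and constants instead.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in mirroring the fields of AudioStreamBasicDescription (assumption:
   same field names and order as the SDK type). */
typedef struct {
    double   mSampleRate;
    uint32_t mFormatID;
    uint32_t mFormatFlags;
    uint32_t mBytesPerPacket;
    uint32_t mFramesPerPacket;
    uint32_t mBytesPerFrame;
    uint32_t mChannelsPerFrame;
    uint32_t mBitsPerChannel;
    uint32_t mReserved;
} SketchASBD;

enum {
    SKETCH_FORMAT_LINEAR_PCM      = 0x6C70636D, /* the four-char code 'lpcm' */
    SKETCH_FLAG_IS_SIGNED_INTEGER = 1 << 2,
    SKETCH_FLAG_IS_PACKED         = 1 << 3
};

/* Describe interleaved, packed, signed 16-bit PCM. */
static void fill_pcm16_asbd(SketchASBD *asbd, double sample_rate, uint32_t channels)
{
    assert(asbd != NULL && channels > 0);
    memset(asbd, 0, sizeof(*asbd));               /* zero everything first */
    asbd->mSampleRate       = sample_rate;
    asbd->mFormatID         = SKETCH_FORMAT_LINEAR_PCM;
    asbd->mFormatFlags      = SKETCH_FLAG_IS_SIGNED_INTEGER | SKETCH_FLAG_IS_PACKED;
    asbd->mBitsPerChannel   = 16;
    asbd->mChannelsPerFrame = channels;
    asbd->mFramesPerPacket  = 1;                  /* uncompressed: one frame per packet */
    asbd->mBytesPerFrame    = channels * (asbd->mBitsPerChannel / 8);
    asbd->mBytesPerPacket   = asbd->mBytesPerFrame * asbd->mFramesPerPacket;
}
```

    The byte counts are derived rather than typed in by hand, which is where most ASBD mistakes seem to creep in: for stereo, that's 4 bytes per frame and 4 bytes per packet.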

  • Use the “canonical” format — 16-bit integer PCM — between your audio units. It works, and is far easier than trying to dick around bit-shifting 8.24 fixed point (the other canonical format).
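
    If you do end up facing 8.24 samples, the conversion itself is small. In 8.24 fixed point, 1.0 is 1 << 24, while 16-bit full scale is 1 << 15, so the two differ by a factor of 1 << 9. A minimal sketch in plain C, with hypothetical function names; it multiplies and divides rather than shifting so that negative samples stay well-defined:

```c
#include <assert.h>
#include <stdint.h>

/* 8.24 fixed point: 1.0 == 1 << 24. 16-bit PCM: full scale == 1 << 15.
   The ratio between them is 1 << 9 == 512. */
#define FIXED_8_24_PER_PCM16 512

static int32_t pcm16_to_fixed_8_24(int16_t sample)
{
    return (int32_t)sample * FIXED_8_24_PER_PCM16;
}

static int16_t fixed_8_24_to_pcm16(int32_t sample)
{
    int32_t scaled = sample / FIXED_8_24_PER_PCM16;  /* truncates toward zero */
    if (scaled > INT16_MAX) scaled = INT16_MAX;      /* clamp values at or above 1.0 */
    if (scaled < INT16_MIN) scaled = INT16_MIN;
    return (int16_t)scaled;
}
```

    The clamp matters because 8.24 has headroom above 1.0 that 16-bit samples don't.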

  • Audio Units achieve most of their functionality through setting properties. To set up a software renderer to provide a unit with samples, you don’t call some sort of a setRenderer() method, you set the kAudioUnitProperty_SetRenderCallback property on the unit, providing an AURenderCallbackStruct as the property value.

  • Setting a property on an audio unit requires declaring the “scope” that the property applies to. Input scope is audio coming into the AU, output is going out of the unit, and global is for properties that affect the whole unit. So, if you set the stream format property on an AU’s input scope, you’re describing what you will supply to the AU.

  • Audio Units also have “elements”, which may be more usefully thought of as “buses” (at least if you’ve ever used pro audio equipment, or mixing software that borrows its terminology). Think of a mixer unit: it has multiple (perhaps infinitely many) input buses, and one output bus. A splitter unit does the opposite: it takes one input bus and splits it into multiple output buses.

  • Don’t confuse buses with channels (i.e., mono, stereo, etc.). Your ASBD describes how many channels you’re working with, and you set the input or output ASBD for a given scope-and-bus pair with the stream description property.

  • Make the RemoteIO unit your friend. This is the AU that talks to both input and output hardware. Its use of buses is atypical and potentially confusing. Enjoy the ASCII art:

                             | i                   o |
    -- BUS 1 -- from mic --> | n    REMOTE I/O     u | -- BUS 1 -- to app -->
                             | p      AUDIO        t |
    -- BUS 0 -- from app --> | u       UNIT        p | -- BUS 0 -- to speaker -->
                             | t                   u |
                             |                     t |

    Ergo, the stream properties for this unit are

                   Bus 0                                Bus 1
    Input Scope:   Set ASBD to indicate what you’re     Get ASBD to inspect audio format
                   providing for play-out               being received from H/W
    Output Scope:  Get ASBD to inspect audio format     Set ASBD to indicate what format
                   being sent to H/W                    you want your units to receive
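
    One way to remember which quadrant is which: you set the format on the side of each bus that faces your app, and you can only inspect the side that faces the hardware. Encoded as a tiny lookup, with enum and function names that are hypothetical and not Core Audio identifiers:

```c
#include <assert.h>

typedef enum { SCOPE_INPUT, SCOPE_OUTPUT } Scope;   /* hypothetical names */
typedef enum { ACCESS_SET, ACCESS_GET } Access;

/* For RemoteIO's stream format: do you set it or just inspect it? */
static Access remoteio_stream_format_access(Scope scope, unsigned bus)
{
    assert(bus <= 1);   /* RemoteIO has bus 0 (output) and bus 1 (input) */
    /* App-facing sides: what you feed into bus 0, what you pull out of bus 1. */
    int app_side = (scope == SCOPE_INPUT  && bus == 0) ||
                   (scope == SCOPE_OUTPUT && bus == 1);
    return app_side ? ACCESS_SET : ACCESS_GET;
}
```
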
  • That said, the properties that set up the callbacks for providing samples to or getting them from a unit take global scope, as their purpose is implicit from the property names: kAudioOutputUnitProperty_SetInputCallback and kAudioUnitProperty_SetRenderCallback.

  • Michael Tyson wrote a vital blog on recording with RemoteIO that is required reading if you want to set callbacks directly on RemoteIO.

  • Apple’s aurioTouch example also shows off audio input, but is much harder to read because of its ambition (it shows an oscilloscope-type view of the sampled audio, and optionally performs FFT to find common frequencies), and because it is written with Objective-C++, mixing C, C++, and Objective-C idioms.

  • Don’t screw around in a render callback. I had correct code that didn’t work because it also had NSLogs, which were sufficiently expensive that I missed the real-time thread’s deadlines. When I commented out the NSLog, the audio started playing. If you don’t know what’s going on, set a breakpoint and use the debugger.
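
    If you want to know what a callback is doing without logging from it, one option is to have it bump a couple of counters and read them from another thread. A minimal sketch using C11 atomics; the callback signature is a simplified stand-in (not the real AURenderCallback), and all names are hypothetical:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    atomic_uint callbacks;        /* how many times the callback has run */
    atomic_uint frames_rendered;  /* total frames it has produced */
} RenderStats;

static void render_stats_init(RenderStats *stats)
{
    assert(stats != NULL);
    atomic_init(&stats->callbacks, 0u);
    atomic_init(&stats->frames_rendered, 0u);
}

/* Real-time side: fill the buffer, update counters, nothing else.
   No logging, no locks, no allocation. */
static void render_silence(RenderStats *stats, int16_t *out, uint32_t frames)
{
    memset(out, 0, frames * sizeof(int16_t));
    atomic_fetch_add_explicit(&stats->callbacks, 1u, memory_order_relaxed);
    atomic_fetch_add_explicit(&stats->frames_rendered, frames, memory_order_relaxed);
}

/* Non-real-time side: safe to call from a thread where logging is affordable. */
static unsigned render_stats_callbacks(RenderStats *stats)
{
    return atomic_load_explicit(&stats->callbacks, memory_order_relaxed);
}

static unsigned render_stats_frames(RenderStats *stats)
{
    return atomic_load_explicit(&stats->frames_rendered, memory_order_relaxed);
}
```

    A timer on the main thread can then read the two counters and log them at leisure, which tells you whether the callback is firing and how many frames it's being asked for.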

  • Apple has a convention of providing a “user data” or “client” object to callbacks. You set this object when you set up the callback, and its parameter type for the callback function is void*, which you’ll have to cast back to whatever type your user data object is. If you’re using Cocoa, you can just use a Cocoa object: in simple code, I’ll have a view controller set the user data object as self, then cast back to MyViewController* on the first line of the callback. That’s OK for audio queues, but the overhead of Obj-C message dispatch is fairly high, so with Audio Units, I’ve started using plain C structs.
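
    A sketch of the plain-C-struct flavor of this pattern. The callback below is a simplified, hypothetical shape (the real AURenderCallback takes more parameters), but it keeps the key move: the first parameter arrives as void* and is cast back on the first line. It generates a square wave so the state carried between calls is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Everything the callback needs between calls lives in one plain struct. */
typedef struct {
    uint32_t period_frames;  /* length of one square-wave cycle, in frames */
    uint32_t phase;          /* position within the current cycle */
    int16_t  amplitude;
} SquareWaveState;

/* Setup-time validation happens here, not inside the callback. */
static void square_wave_init(SquareWaveState *s, uint32_t period_frames, int16_t amplitude)
{
    assert(s != NULL && period_frames >= 2 && amplitude >= 0);
    s->period_frames = period_frames;
    s->phase = 0;
    s->amplitude = amplitude;
}

/* Hypothetical callback: user data comes in as void*, exactly as registered. */
static int render_square_wave(void *inRefCon, uint32_t inNumberFrames, int16_t *outSamples)
{
    SquareWaveState *state = (SquareWaveState *)inRefCon;   /* cast back first */
    for (uint32_t i = 0; i < inNumberFrames; i++) {
        int high = state->phase < state->period_frames / 2;
        outSamples[i] = high ? state->amplitude : (int16_t)-state->amplitude;
        state->phase = (state->phase + 1) % state->period_frames;
    }
    return 0;   /* 0 plays the role of noErr */
}
```

    Because the phase lives in the struct rather than in a local variable, the waveform continues seamlessly from one callback to the next.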

  • Always set up your audio session stuff. For recording, you must use kAudioSessionCategory_PlayAndRecord and call AudioSessionSetActive(true) to get the mic turned on for you. You should probably also look at the properties to see if audio input is even available: it’s always available on the iPhone, never on the first-gen touch, and may or may not be on the second-gen touch.

  • If you are doing anything more sophisticated than connecting a single callback to RemoteIO, you may want to use an AUGraph to manage your unit connections, rather than setting up everything with properties.

  • When creating AUs directly, you set up an AudioComponentDescription and use the audio component manager to get the AUs. With an AUGraph, you hand the description to AUGraphAddNode to get back the pointer to an AUNode. You can get the Audio Unit wrapped by this node with AUGraphNodeInfo if you need to set some properties on it.

  • Get used to providing pointers as parameters and having them filled in by function calls:

    AudioUnit remoteIOUnit;
    setupErr = AUGraphNodeInfo(auGraph, remoteIONode, NULL, &remoteIOUnit);

    Notice how the return value is an error code, not the unit you’re looking for, which instead comes back in the fourth parameter. We send the address of the remoteIOUnit local variable, and the function populates it.

  • Also notice the convention for parameter names in Apple’s functions. inSomething is input to the function, outSomething is output, and ioSomething does both. The latter two take pointers, naturally.

  • In an AUGraph, you connect nodes with a simple one-line call:

    setupErr = AUGraphConnectNodeInput(auGraph, mixerNode, 0, remoteIONode, 0);

    This connects the output of the mixer node’s only bus (0) to the input of RemoteIO’s bus 0, which goes through RemoteIO and out to hardware.

  • AUGraphs make it really easy to work with the mic input: create a RemoteIO node and connect its bus 1 to some other node.

  • RemoteIO does not have a gain or volume property. The mixer unit has volume properties on all input buses and its output bus (0). Therefore, setting the mixer’s output volume property could be a de facto volume control, if it’s the last thing before RemoteIO. And it’s somewhat more appealing than manually multiplying all your samples by a volume factor.

  • The mixer unit adds amplitudes. So if you have two sources that can hit maximum amplitude, and you mix them, you’re definitely going to clip.
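
    To see why, here is a sketch of summing two 16-bit sources by hand, in plain C with hypothetical function names (this is not how the mixer unit is implemented, just the arithmetic it implies). The first version clamps anything past full scale, which is clipping; the second halves the sum so two full-scale inputs still fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static int16_t clamp_to_pcm16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Sum two sources; anything past full scale is clamped, i.e. clipped. */
static void mix_clamped(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = clamp_to_pcm16((int32_t)a[i] + (int32_t)b[i]);
}

/* Halve the sum so two full-scale inputs cannot exceed the 16-bit range. */
static void mix_with_headroom(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    assert(a != NULL && b != NULL && out != NULL);
    for (size_t i = 0; i < n; i++)
        out[i] = (int16_t)(((int32_t)a[i] + (int32_t)b[i]) / 2);
}
```

    The headroom version never clips, at the cost of each source playing at half its original amplitude. That's the same trade you make by turning down the mixer's input volumes.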

  • If you want to do both input and output, note that you can’t have two RemoteIO nodes in a graph. Once you’ve created one, just make multiple connections with it. The same node will be at the front and end of the graph in your mental model or on your diagram, but it’s OK, because the captured audio comes in on bus 1, and at some point, you’ll connect that to a different bus (maybe as you pass through a mixer unit), eventually getting the audio to RemoteIO’s bus 0 input, which will go out to headphones or speakers on bus 0.

I didn’t come up with much (any?) of this myself. It’s all about good references. Here’s what you should add to your bookmarks (or Together, where I throw any Core Audio pages I find useful):

Comments (14)

  1. I’m kind of new at this but I think you have the ‘Get’ quadrants of the RemoteIO description transposed.

    Great post!

  2. Ah! Absolutely correct. Bus 0 + Output Scope should be what’s going to output H/W, and Bus 1 + Input Scope should be what’s coming in from input H/W. I’ve corrected the table appropriately. Thanks for the catch! This stuff is tricky, isn’t it?

  3. Indeed it is! Glad I could help. It wouldn’t be half so bad if there was better documentation on it. Thanks again for the post.

  4. rpstro02

    Thanks for this helpful post. I am curious about the RemoteIO unit. Does the Bus 1 output only output what comes from the mic input? Or does it mix the Bus 1 Mic Input and Bus 0 From App Input? If it’s the latter then I’m not sure how to create multiple connections as you suggest for doing both input and output. Thanks!

  5. rpstro02: To mix input with app audio, use the one RemoteIO node at both ends of the graph, since a graph can’t have two of them. “Upstream”, it is your mic input: connect its bus 1 output to a mixer unit’s input (it doesn’t matter which bus), and then connect your app audio via render callback to another input on the mixer unit. Then connect the bus 0 output of the mixer unit to the bus 0 input of that same RemoteIO node to send the mix out to hardware.

  6. I have a problem

    I have a beautiful view controller for video using ffmpeg but I can’t seem to get audioQueues to work with it. I get an error whenever I try to create the queue. Can you give me some idea what I might do?

    ret = av_find_stream_info(avfContext);
    if (ret < 0) {
        NSLog(@"Error: Could not find stream info: %d", ret);
    }
    else {
        NSLog(@"Stream info found");
    }

    video_index = -1;
    audio_index = -1;
    int i;
    // for(i = 0; i < avfContext->nb_streams; i++) {
    for (i = 0; i < avfContext->nb_streams && (video_index < 0 || audio_index < 0); i++) {
        enc = avfContext->streams[i]->codec;
        //avfContext->streams[i]->discard = AVDISCARD_ALL;
        switch(enc->codec_type) {
            case CODEC_TYPE_VIDEO:
                video_index = i;
                avfContext->streams[i]->discard = AVDISCARD_NONE;
                break;
            case CODEC_TYPE_AUDIO:
                audio_index = i;
                avfContext->streams[i]->discard = AVDISCARD_NONE;
                break;
            default:
                avfContext->streams[i]->discard = AVDISCARD_ALL;
                break;
        }
    }

    if (video_index >= 0) {
        avfContext->streams[video_index]->discard = AVDISCARD_DEFAULT;
    }
    if (audio_index >= 0) {
        avfContext->streams[audio_index]->discard = AVDISCARD_DEFAULT;
    }

    float aspectRatio = av_q2d(avfContext->streams[video_index]->codec->sample_aspect_ratio);
    if (!aspectRatio) {
        aspectRatio = av_q2d(avfContext->streams[video_index]->sample_aspect_ratio);
    }
    if (!aspectRatio) {
        aspectRatio = 4.0 / 3;
    }

    if ((float)self.bounds.size.height / self.bounds.size.width > aspectRatio) {
        GLfloat blank = (self.bounds.size.height - self.bounds.size.width * aspectRatio) / 2;
        points[0] = self.bounds.size.width;
        points[1] = self.bounds.size.height - blank;
        points[2] = 0;
        points[3] = self.bounds.size.height - blank;
        points[4] = self.bounds.size.width;
        points[5] = blank;
        points[6] = 0;
        points[7] = blank;
    }
    else {
        GLfloat blank = (self.bounds.size.width - (float)self.bounds.size.height / aspectRatio) / 2;
        points[0] = self.bounds.size.width - blank;
        points[1] = self.bounds.size.height;
        points[2] = blank;
        points[3] = self.bounds.size.height;
        points[4] = self.bounds.size.width - blank;
        points[5] = 0;
        points[6] = blank;
        points[7] = 0;
    }

    texturePoints[0] = 0;
    texturePoints[1] = 0;
    texturePoints[2] = 0;
    texturePoints[3] = 1;
    texturePoints[4] = 1;
    texturePoints[5] = 0;
    texturePoints[6] = 1;
    texturePoints[7] = 1;

    enc = avfContext->streams[video_index]->codec;
    AVCodec *codec = avcodec_find_decoder(enc->codec_id);
    if (!codec) {
        NSLog(@"Error: no encoder for this codec %d", enc->codec_id);
    }

    ret = avcodec_open(enc, codec);
    if (ret < […] >= 0) {
        AudioStreamBasicDescription audioFormat;
        audioFormat.mFormatID = -1;
        audioFormat.mSampleRate = avfContext->streams[audio_index]->codec->sample_rate;
        audioFormat.mFormatFlags = 0;
        switch (avfContext->streams[audio_index]->codec->codec_id) {
            case CODEC_ID_MP3:
                audioFormat.mFormatID = kAudioFormatMPEGLayer3;
                break;
            case CODEC_ID_AAC:
                audioFormat.mFormatID = kAudioFormatMPEG4AAC;
                audioFormat.mFormatFlags = kMPEG4Object_AAC_Main;
                break;
            case CODEC_ID_AC3:
                audioFormat.mFormatID = kAudioFormatAC3;
                break;
        }

        if (audioFormat.mFormatID != -1) {
            audioFormat.mBytesPerPacket = 0;
            audioFormat.mFramesPerPacket = avfContext->streams[audio_index]->codec->frame_size;
            audioFormat.mBytesPerFrame = 0;
            audioFormat.mChannelsPerFrame = avfContext->streams[audio_index]->codec->channels;
            audioFormat.mBitsPerChannel = 0;

            if (ret = AudioQueueNewOutput(&audioFormat, audioQueueOutputCallback, self, NULL, NULL, 0, &audioQueue)) {
                NSLog(@"Error creating audio output queue: %d", ret);
                avfContext->streams[audio_index]->discard = AVDISCARD_ALL;
                audio_index = -1;
            }
            else {
                for (i = 0; i < […]->streams[audio_index]->codec->sample_rate * AUDIO_BUFFER_SECONDS / avfContext->streams[audio_index]->codec->frame_size + 1), (int)(avfContext->streams[audio_index]->codec->bit_rate * AUDIO_BUFFER_SECONDS / 8));
                    if (ret = AudioQueueAllocateBufferWithPacketDescriptions(audioQueue, avfContext->streams[audio_index]->codec->bit_rate * AUDIO_BUFFER_SECONDS / 8, avfContext->streams[audio_index]->codec->sample_rate * AUDIO_BUFFER_SECONDS / avfContext->streams[audio_index]->codec->frame_size + 1, audioBuffers + i)) {
                        NSLog(@"Error: Could not allocate audio queue buffer: %d", ret);
                        avfContext->streams[audio_index]->discard = AVDISCARD_ALL;
                        audio_index = -1;
                        AudioQueueDispose(audioQueue, YES);
                    }
                }
            }
        }
    }

  7. […] was inspired to put this post together by Chris Adamson’s Core Audio Brain Dump which is a great collection of some of the bits of wisdom you need to get your head around to […]

  8. skajam66

    Thanks for this post. I have also reviewed your talk at 360iDev. I have a simple core audio piece of code working now – it’s a simple metronome using just a single Remote IO unit. I put the metronome sounds into the ioData buffer (within the render callback) at the right time and away you go. If you don’t mind, I have a couple of questions for you:

    – There is an AudioTimeStamp provided to the render callback that has the field mSampleTime which is the number of samples that have passed since some time in the past. Each invocation of the render callback increments mSampleTime by inNumberFrames. The question is: when does the render callback get called? Is it after all previously provided samples have been played out (i.e. mSampleTime is the count of already played samples and you have exactly one sample time (22uS) to get the next buffer to RIO) or is the render callback called some time prior to all samples being played and (a) you have more than 22uS to get the next buffer to RIO and (b) mSampleTime is the last sample that will get played at some time in the future (unless you provide more samples before the current last one is played out)? Kind of a long-winded question but I’m trying to find out how real-time the callback is because….

    – What is the quickest/most real-time method of notifying the main gui thread that a tick on the metronome has occurred? On my gui I have sheet music that needs to “play” in time with the metronome. The notes need to highlight as the musical piece is played. If the render callback is invoked after all current samples have been played then my notification to the gui will have a maximum timing error of the latency of the audio buffer (because the tick could actually occur anywhere in the buffer) plus the latency of getting a notification to the gui plus the gui doing something visible. If the callback is called some time prior to all samples being played then I need to develop a technique where I can notify the gui ahead of time that a note needs to be highlighted etc.

    Would appreciate your thoughts. Apologies for the long post which would have been better in an email…

  9. […] you can do a lot of useful stuff, and then at the lowest level there are two types of Audio Unit: Remote I/O (or remoteio) and the Voice Processing Audio Unit […]

  10. […] An iPhone Core Audio brain dump It’s hard. Jens Alfke put it thusly: […]

  11. […] Core Audio Brain Dump Excellent brain-dump from a Core Audio guru on pitfalls and tips on using Core Audio. […]

  12. itakatz

    Hi and thanks for the invaluable tips.
    I am still having trouble understanding how to set the callback for recording, (setting my app for using the audio from the mic):
    1. If I set the kAudioOutputUnitProperty_SetInputCallback property to set my callback (as in Michael Tyson’s post), the ioData pointer is null when I hit a breakpoint inside the callback. Only if I use kAudioUnitProperty_SetRenderCallback to set the callback (as in aurioTouch) do I get a non-null ioData pointer.
    2. When I use bus 1 (the input bus), the callback is never called. Only if I use bus 0 when setting the callback is it called. Why is that?
    3. When inside the callback, I must call the AudioUnitRender method with inBusNumber==1, even though the input to the callback has inBusNumber==0 – if I use bus 0, I just get an empty buffer after the AudioUnitRender is finished. Why is that? I know bus 1 is the bus for the input from the mic, but why is the input argument to the callback 0?

    Thanks again,

  13. itakatz

    I might start to understand (?)… maybe in aurioTouch example they use the same callback for *both* play/record? that’s why ‘renderCallback’ with bus 0 is used when setting the callback, but bus 1 is used in the callback as an input to AudioUnitRender?

  14. […] since my most-popular blogs have always been these brain-dump things — Core Audio, OpenAL, and In-App Purchase — I figured I’d roll back to that old […]
