CocoaConf Portland ’12 and the AudioQueueProcessingTap

CocoaConf Portland was this last weekend, and the conference continues to grow in scope, prominence, and depth with each installment. Visiting the US west coast for the first time, it picked up Brent Simmons (famous for NetNewsWire, Glassboard, and MarsEdit) and Daniel Pasco of Black Pixel as keynoters, plus James Dempsey playing some of his famous WWDC developer-oriented songs, such as the timeless luau of Cocoa memory management, “The Liki Song”.

For my stuff, I kicked off Thursday with a second run of the all-day Core Audio tutorial, which I’ll be doing again at CocoaConf Raleigh. It’s nice to teach these advanced classes, because I keep learning things about Xcode and Obj-C from the attendees as we work through the projects together.

Photo break! I think this is from the Core Bluetooth session:
[Photo: Core Bluetooth session at CocoaConf Portland '12]

On Friday, I did a revised version of Mobile Movies with HTTP Live Streaming, which drops the VLC/mediastreamsegmenter demo (it doesn’t work in Mountain Lion and isn’t going to) and instead plays up more of the buy-vs-build practical considerations (researching the bandwidth costs of hosting even a modestly popular stream made my stomach drop). I’d hoped to actually stream part of this session via UStream, but we didn’t have time. For Raleigh, I hope to move this to a 90-minute slot so we have time to spin up UStream and stream part of the talk. It’s a pretty good illustration of the inherent latency of HLS; when I was doing my one-off streaming experiments over the summer, I found that the Flash-based browser stream on my laptop was about 3-5 seconds behind my broadcast, and the version running in the UStream app on the iPad was about 15-20 seconds behind. Given HLS’ use of 10-second segment files rather than a (possibly fragile) always-open socket connection, this is entirely what we’d expect.

And then there’s my new talk, Core Audio in iOS 6. This is both an overview of Core Audio and a deep dive into Audio Units and Audio Queues, built around a couple of demos of the most fun new audio unit in iOS 6: AUNewTimePitch. This unit gives you independent control of rate and pitch, instead of playing faster and having pitches go up, like playing a vinyl record too fast. With this unit, you can either pitch-shift a source up or down some number of cents (1/100 of a musical semitone), or rate-shift a source faster or slower (from 1/32 speed to 32x speed). The pitch shift works for realtime sources, so the demo of that pitch-shifts the microphone for comic effect. You can’t rate-shift a realtime source (i.e., you can’t get data out of the mic faster than it’s being delivered), so my demo for that offered rate shifting of file playback. As always, this stuff is documented only in the headers, AudioUnitParameters.h specifically. But this much is pretty simple.

What’s a little hairier is the nifty new Audio Queue Processing Tap. This feature bears a little explanation: when you’re playing audio out a queue, you enqueue buffers for playback, and the queue consumes the data it’s given, in order. It’s potentially fire-and-forget — in the all-day class, we build a web radio player that gets packets from the network, enqueues them, and disposes the used buffers when the queue is done with them (this simple approach is not how most people do it… see my recent coreaudio-api post for a discussion of single-use buffers versus the tradition of reuse). Audio Queues are nice because you can stuff compressed data (MP3, AAC, etc.) in the one end and let the queue deal with the decoding and playback.

But the latency creates some problems; if you use our book’s file-playing sample with its three buffers of 0.5 sec each, then any data you put in the queue takes anywhere from 1.0 to 1.5 seconds to play out. With our web radio player, the latency is even more indeterminate. This means you can’t perform some sort of analysis on your data when you send it to the queue, like checking levels for a visualizer, because you’ll be a second or more ahead of the playback. Plus, that implies doing your own up-front conversion and providing the queue with PCM. This works and is super backwards-compatible (I just added it to a client’s app this fall), but it’s not easy.
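The 1.0 to 1.5 second figure falls out of simple arithmetic. The sketch below assumes equal-sized buffers that are refilled the instant the queue hands them back, so the first frame of a refilled buffer waits for the other buffers to drain and the last frame waits one buffer longer. `LatencyRange` and `queue_latency` are hypothetical names, not Audio Queue API.

```c
#include <assert.h>

/* Bounds on how long enqueued audio waits before it is heard. */
typedef struct {
    double min_seconds;  /* delay for the first frame of a freshly refilled buffer */
    double max_seconds;  /* delay for the last frame of that same buffer */
} LatencyRange;

/* Assumes buffer_count equal-sized buffers, each refilled as soon as it is returned. */
LatencyRange queue_latency(int buffer_count, double seconds_per_buffer) {
    assert(buffer_count > 0 && seconds_per_buffer > 0.0);
    LatencyRange range;
    /* The other (buffer_count - 1) buffers must play out first... */
    range.min_seconds = (buffer_count - 1) * seconds_per_buffer;
    /* ...and the tail of our own buffer plays one buffer-length after its head. */
    range.max_seconds = buffer_count * seconds_per_buffer;
    return range;
}
```

With three buffers of 0.5 seconds each, this gives a range of 1.0 to 1.5 seconds, matching the numbers above.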

So what the AQTap does is give you a callback with decoded PCM right before it’s about to be played out. Better yet, you can actually mess with the data at this stage. The docs (well, the header file, of course) reveal three tap processing modes: pre- and post-effects (which is the first time Apple has revealed that the Audio Queue performs some effects of its own), and siphon. The first two allow you to manipulate the data; the last is read-only.

That said, manipulating the data may be easier said than done. When you create the AudioQueueProcessingTapRef, you’re passed back the data format the queue will be providing your callback. There doesn’t appear to be a way to set this format, so you need to deal with what you get. In practice, it’s the same kind of SInt16 (signed 16-bit int) that we see in most of the older iOS units… but not in the effect units that were introduced in iOS 5, which can only work with floating-point samples. In the AUNewTimePitch demo, I just took the effect unit’s format and set that throughout my AUGraph, but that’s not an option here, since we have to deliver data back to the queue in the form we received it.
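For a sense of what the int/float mismatch means at the sample level, here is a small plain-C sketch of the conversions involved. It is only an illustration of the number formats, not the Audio Converter or AUConverter APIs, and all three function names are hypothetical. It also covers the 8.24 fixed-point case that a commenter below reports seeing from the tap, assuming the usual layout of a signed 32-bit value with 24 fractional bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed 16-bit sample -> float in [-1.0, 1.0): divide by 2^15. */
void int16_to_float(const int16_t *in, float *out, size_t count) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = (float)in[i] / 32768.0f;
    }
}

/* Float -> signed 16-bit, clamping out-of-range values and rounding to nearest. */
void float_to_int16(const float *in, int16_t *out, size_t count) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        float scaled = in[i] * 32768.0f;
        if (scaled > 32767.0f) scaled = 32767.0f;
        if (scaled < -32768.0f) scaled = -32768.0f;
        out[i] = (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
    }
}

/* 8.24 fixed point (assumed: signed 32-bit, 24 fractional bits) -> float: divide by 2^24. */
void fixed824_to_float(const int32_t *in, float *out, size_t count) {
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = (float)in[i] / 16777216.0f;
    }
}
```

Whatever does the conversion, the round trip has to end in the same format the tap handed over, which is why the float-only effect unit cannot simply be dropped in.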

A couple years ago, I came up with a recipe for scenarios like this. It’s daft, but it works.

[Image: Chris' pull-based convert-effect-reconvert AUGraph]

The key is to understand how AudioUnitRender() works. If you pass in an AudioBufferList that has valid data, the audio unit can perform its work in place, on the samples you provide. So we could use the effect unit by itself, but we’d have to feed it floats, which means doing our own conversion, and the Audio Converter API is built of misery. On the other hand, if the mData members of the AudioBufferList we send to AudioUnitRender() are NULL, the audio unit pulls from its upstream connections. So my recipe is to use an AUGenericOutput (an output unit that is not connected to audio I/O and instead is operated manually), and to put the effect unit upstream of that, surrounded by AUConverter units that will convert the int samples to float going into the effect, and from float back to int for the connection into the generic output unit. The other part of this trick is that the data to be rendered needs to move to the front of the graph (so it can be an input to the first AUConverter), so prior to AudioUnitRender(), I copy the data’s pointer to a state variable, and NULL it out in the AudioBufferList. The first converter unit pulls on a render callback function I’ve written, whose only purpose is to provide the pointer that I just saved off and NULLed out.
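The pointer shuffle is easier to follow in miniature. What follows is a toy model in plain C, not Core Audio code: `Buffer` stands in for a one-buffer AudioBufferList, `unit_render` for AudioUnitRender(), and the whole converter/effect/converter chain is collapsed into a single made-up "effect" that halves each 16-bit sample. Every type and function name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a one-buffer AudioBufferList (hypothetical type). */
typedef struct {
    void  *data;       /* plays the role of mData */
    size_t byte_size;  /* plays the role of mDataByteSize */
} Buffer;

/* Stand-in for a render callback: fills in io when the unit has no data. */
typedef int (*PullCallback)(void *state, Buffer *io);

/* Stand-in for the graph: one unit with a single upstream input callback. */
typedef struct {
    PullCallback input;
    void        *input_state;
} Unit;

/* State shared between the tap and the input callback. */
typedef struct {
    void  *saved_data;
    size_t saved_size;
} TapState;

/* Toy effect: halve every 16-bit sample in place. */
static void halve_samples(Buffer *b) {
    int16_t *s = (int16_t *)b->data;
    size_t count = b->byte_size / sizeof(int16_t);
    for (size_t i = 0; i < count; i++) s[i] = (int16_t)(s[i] / 2);
}

/* Model of the render rule described above: with data present, work in
   place; with a NULL data pointer, pull from upstream first. */
int unit_render(Unit *u, Buffer *io) {
    assert(u != NULL && io != NULL);
    if (io->data == NULL) {
        if (u->input == NULL) return -1;           /* nothing to pull from */
        int err = u->input(u->input_state, io);
        if (err != 0) return err;
    }
    halve_samples(io);
    return 0;
}

/* The input callback's only job: hand back the pointer saved by the tap. */
int supply_saved_buffer(void *state, Buffer *io) {
    TapState *s = (TapState *)state;
    io->data = s->saved_data;
    io->byte_size = s->saved_size;
    return 0;
}

/* What the tap does: stash the pointer, NULL it out, then render. */
int process_tap_buffer(Unit *u, TapState *s, Buffer *io) {
    s->saved_data = io->data;
    s->saved_size = io->byte_size;
    io->data = NULL;                               /* forces the pull path */
    return unit_render(u, io);
}
```

After `process_tap_buffer` returns, the buffer points at the original memory again, now holding processed samples. That mirrors the requirement that the tap deliver data back to the queue in the buffers it was given.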

This sleight-of-hand is gently described as “f’ing crazy”, but it works nicely. For the demo, I took the tutorial’s web radio player project and added the pitch-shift effect. Here’s a video of it pitch-shifting CBC Radio 3 in the Simulator.

I did discover along the way that NULL’ing out the mData members of the AudioBufferList provided to AudioUnitRender() seems to not be strictly necessary, so maybe the rule is whether the unit has incoming connections? Or maybe I’ve misstated the rule; the docs talk about who provides the pointers, not where the data comes from. Whatever, it’s a neat trick and it works.

So, that’s my CocoaConf Portland. I’m doing pretty much the same stuff in Raleigh at the end of November, and then we’ll see what makes sense for the CocoaConfs that have been scheduled for next March and April… are there enough potential attendees to keep doing the Core Audio all-day tutorial three or four more times? Please post a comment or ping me on social media (@invalidname on Twitter) if you’d like to do a Core Audio class in early 2013.

Comments (3)

  1. Hi,

    Just a quick clarification regarding Audio Queue Taps. You said:

    “When you create the AudioQueueProcessingTapRef, you’re passed back the data format the queue will be providing your callback. There doesn’t appear to be a way to set this format, so you need to deal with what you get. In practice, it’s the same kind of SInt16 (signed 16-bit int) that we see in most of the older iOS units.”

    This isn’t true in all cases. Most of the time you get an S16 PCM interleaved stream; however, depending on the source audio, you may get other PCM formats, such as 8.24 fractional. If you always expect S16, your code will not process things correctly those times you get 8.24 instead.

    What I found useful to handle this is to create an AudioConverter ahead of time and then, in the tap callback, use the converter to put the audio in a consistent format. In my use case, I need to save a copy of the audio a little longer anyhow, so the AudioConverter is a handy way to do the copy and ensure it is in a consistent format.


  2. Thanks, that’s great to know. Also, when you say you’re using an Audio Converter, do you mean the AUConverter unit (like I have 2 of here), or an actual AudioConverterRef, callback and all?

  3. […] other thing I had to correct was my ambitious pitch-shifting web radio demo that I developed for my CocoaConf Portland talk. The effect stopped working in iOS 6.1, and fixing it led to a long engagement with the […]
