
Brain Dump: v3 Audio Units

Thanks to the power of unemployment freeing up my daily schedule, I was able to put a lot of work into my talk about Media Frameworks and Swift. The first version of this debuted at Forward Swift in March and was limited to 30 minutes. With an hour to fill at CocoaConf Chicago last weekend, I needed a second demo. And the obvious place for it was to stop talking about v3 Audio Units and actually write one.

Audio Units logo

Background info: audio units are self-contained modules that do something with audio. There are several distinct types: generators that produce sound (like by synthesis or playing from a file), effects that take incoming sound and change it in some way, mixers that combine multiple sources, etc. These units are available in any application that supports the audio unit standard, so they’re seen in things like Logic and GarageBand. Prior to El Capitan and iOS 9, audio units were a Mac-only technology: the closest approximation on iOS was to have some other audio unit set up a “render callback”, meaning you’d provide a pointer to your own function, to be called whenever the downstream unit wanted to pull some samples, and you’d put your audio processing code in there.
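
For anyone who never had that particular pleasure, the old arrangement looked roughly like this (a minimal sketch, not from any real project; the callback signature and kAudioUnitProperty_SetRenderCallback are real Audio Toolbox API, but MyRenderCallback, attachRenderCallback, and someUnit are names I’m making up for illustration):

#import <AudioToolbox/AudioToolbox.h>

// The render callback: a plain C function that the downstream unit calls
// whenever it needs inNumberFrames' worth of samples. Your audio
// processing code fills ioData->mBuffers[] right here.
static OSStatus MyRenderCallback(void *inRefCon,
                                 AudioUnitRenderActionFlags *ioActionFlags,
                                 const AudioTimeStamp *inTimeStamp,
                                 UInt32 inBusNumber,
                                 UInt32 inNumberFrames,
                                 AudioBufferList *ioData) {
    // ...generate or process samples into ioData here...
    return noErr;
}

// Attaching the callback to the input scope of some downstream unit
// (a RemoteIO unit, say):
static void attachRenderCallback(AudioUnit someUnit) {
    AURenderCallbackStruct callback = {
        .inputProc       = MyRenderCallback,
        .inputProcRefCon = NULL  // or a pointer to your own state
    };
    AudioUnitSetProperty(someUnit,
                         kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input,
                         0,
                         &callback,
                         sizeof(callback));
}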

We covered using audio units in chapters 7 and 8 of the Learning Core Audio book, but didn’t actually cover creating them. We didn’t do that for a number of reasons: the documentation and base C++ class from Apple was outdated and appeared to be broken, making your own AU was Mac-only, we’d already spent two chapters on audio units, and our editor was leaving and we decided to go pencils-down and ship the damn thing. So, wouldn’t you know it, the first review on iBooks basically ripped us for not covering how to create audio units and dismissed the rest of the book as one-star garbage (and in my own defense, that’s an opinion not shared by any of the other reviews on iBooks and Amazon).

But still, it has bugged me for years that I had never actually written an audio unit of my own. So if one good thing comes from my current flirtations with insolvency, it’s that goddammit, I’m finally writing a working audio unit.

So, iBooks reviewer whichdokta, this one’s for you. And in the immortal words of Elvis Costello, I Hope You’re Happy Now.

Teh Rulez

A reminder about my “brain-dump” category: this is for posts where I have just figured out something, and am dumping out things I’ve learned before I forget them. It’s loose, unstructured, incomplete, and sometimes wrong. Certainly my first Core Audio brain-dump doesn’t hold a candle to what was eventually in the Learning Core Audio book.

Now, to set the scene: this is from my session “Media Frameworks and Swift: This Is Fine”, which pushes Core Audio, Core Video, and AV Foundation to show some places where Swift isn’t currently an ideal language choice. The first example is an audio reverser, which calls Audio Toolbox from Swift. It works, but is too obviously C-like to be idiomatic Swift, and is too pointlessly fussy about types for experienced C developers.

The second example is the one with the audio unit. I show off an effect unit that provides a ring modulator, which is a dirt-simple bit of math that nevertheless is interesting, if only because you can connect it to microphone input and sound like a Dalek. Better yet, you can use it in any app that supports v3 audio units, such as Garage Band on iPad.

Ring Modulator unit selected as part of Garage Band microphone input effects chain

Setup

  • You start with a new app project. Thing is, you don’t actually run this app, because your audio unit will be added later as an app extension. Your app has to do something to be in the store, but it may well just be a trivial demo of your extension. In my sample code, the app itself is literally just a one-scene storyboard that says “run some other app that uses the audio unit”.

  • The real fun begins when you add a new target to the project. Choose the Audio Unit Extension template:

    Choose template for Audio Unit Extension

  • Next, fill in the options for the template. There are a couple fields here that are worth a few words:

    Options for audio unit app extension target

    • Only four types of audio units are supported: generators, effects, instruments (which generate sound in response to MIDI events), and music effects. There are other types defined in Core Audio (search the docs for kAudioUnitType_…), but it’s not unreasonable to think these are the ones third parties will want to write.

    • “Subtype” and “Manufacturer Code” have placeholder text “must be exactly 4 characters”, but there’s more to it than this. These are used as four-character-codes by the Audio Component Manager, so they need to be alphanumeric ASCII characters; regardless of what I think of my code, I can’t use “snd💩” for my subtype.

      Moreover, since these are stored in an Info.plist, you also can’t use any symbol that will confuse the XML parser. I tried to use “S&FM” (from my corporate name, “Subsequently and Furthermore, Inc.”) as the manufacturer, and it corrupted the Info.plist file and made it un-openable, since XML treats “&” as the beginning of an escape sequence. Basically, I had inadvertently Little Bobby DropTables’ed my extension’s Info.plist. So, really, just limit yourself to English upper- and lower-case, plus numbers.

    • AUComponent.h implies that manufacturer codes are to be registered with Apple to keep them globally unique, but I’m not sure whether Apple actually maintains such a registry today. If I understand correctly, Apple reserves all-lower-case codes for itself; kAudioUnitManufacturer_Apple is `appl`. (The sketch at the end of this list shows how a host ends up plugging these codes into an AudioComponentDescription.)

  • When you’re done, your app extension will have new source files for your audio unit, and a view controller to visually edit its parameters (if you choose to support this).

    Files for an audio unit app extension

    One of the points I make in my talk is right here: even if you selected Swift as your language in the options pane, your AUAudioUnit subclass is set up with Objective-C .h and .m files; only the view controller is a .swift file. So why does the Xcode template steer you away from Swift? I asked on the coreaudio-api list and didn’t get an answer.

    I have a hypothesis: by default, v3 Audio Units run in their own process, communicating with the host application via XPC. There’s an inherent latency to this, so on the Mac (but not iOS), you are allowed to set properties in both the extension’s and the host app’s Info.plists to allow the audio unit to be loaded directly into the host application. That gets me wondering whether the lack of a stable ABI for Swift is what prevents writing your unit in Swift: if the Swift 4 compiler produces code with different calling conventions, sizes, offsets, etc. than Swift 3 does, would that prohibit loading the unit into the host app? At any rate, it is what it is, so Obj-C up, kids.
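
    Incidentally, to tie these setup pieces together, here’s roughly what they look like from the host’s side. The type/subtype/manufacturer codes become an AudioComponentDescription, and the out-of-process behavior shows up as an instantiation option. Consider this a hedged sketch rather than code from my project; the “ring” subtype is made up for illustration (only the “SnFM” manufacturer code is real, as we’ll see later), and I’m using the AVAudioUnit convenience API rather than anything lower-level.

    #import <AudioToolbox/AudioToolbox.h>
    #import <AVFoundation/AVFoundation.h>
    
    // Hypothetical host-side lookup of the ring modulator effect, built
    // from the same four-character codes entered in the template options.
    static void instantiateRingModulator(void) {
        AudioComponentDescription desc = {
            .componentType         = kAudioUnitType_Effect,
            .componentSubType      = 'ring',   // made-up subtype
            .componentManufacturer = 'SnFM',   // manufacturer code
            .componentFlags        = 0,
            .componentFlagsMask    = 0
        };
        // On iOS, extensions always load out of process; LoadInProcess is
        // the Mac-only option those Info.plist properties enable.
        [AVAudioUnit instantiateWithComponentDescription:desc
                                                  options:kAudioComponentInstantiation_LoadOutOfProcess
                                        completionHandler:^(__kindof AVAudioUnit *avUnit, NSError *error) {
            // wire avUnit into an AVAudioEngine graph, ask for its view
            // controller, etc.
        }];
    }
    
    Note that instantiation is asynchronous, with a completion handler; presumably that’s part of the price of the XPC arrangement.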

Coding the Audio Unit

OK, now on to writing the audio unit. Keep in mind that my code is embarrassingly simplistic, by design. Apple already has a “best practices” sample code project — search your Xcode documentation for “AudioUnitV3Example” and you can open “AudioUnitV3Example: A Basic AudioUnit Extension and Host Implementation” directly in Xcode. It’s freaking huge, containing iOS and macOS sample code for both a DSP filter and a MIDI instrument, plus an “AUv3Host” application that loads and runs filter and instrument units (we’ll see that app again soon, so go ahead and run it to get it on your simulator or device for later).

Apple’s code does everything right: reused code is in frameworks, an entirely new C++ “kernel” type is created for the rendering (to be explained below), etc. While it’s a model for what production code should look like, it’s overwhelming to try to find your way around it.

For my example, I wanted to see what the bare minimum would be to create an audio unit from first principles, by adding a new app extension target and building up from there. Once that works, you can add complexity, and eventually reach the point where the practices in Apple’s sample code make sense.

OK, so let’s dig into the code with some more bullet points:

  • The AUAudioUnit subclass needs to provide a few things at a minimum:

    • A parameterTree property
    • Getters to return inputBusses and outputBusses (returning type AUAudioUnitBusArray)
    • Two lifecycle methods called allocateRenderResourcesAndReturnError and deallocateRenderResources
    • And finally, a getter called internalRenderBlock, which returns a block to do the actual processing of audio samples.
  • Let’s start with the end in mind: internalRenderBlock returns a block that gets passed the essential data it needs, including a timestamp, behavior flags, a count of frames to be processed, a block to call to pull in the samples to process, and a pointer to an AudioBufferList to write its output into. Subject to some performance rules I’ll discuss later, the implementation of this block will use the input samples and any of its own parameters’ values to produce output samples, and put them in the provided AudioBufferList. All the lifecycle, parameter, and bus stuff above exists to facilitate the contents of this block.

  • The template sets up your unit with a single parameter identifier. This is used in initWithComponentDescription to create a parameter for the parameterTree. The Audio Unit Extensions session from WWDC 2015 describes the “tree” concept more thoroughly than I want to deal with here. Instead, the quick-and-dirty is that each parameter your unit will support needs to have a unique id, which is more like an index. In fact, I found it more convenient to change the AudioUnitParameterID type to AUParameterAddress for my one parameter, the frequency of the modulator:

    
    const AUParameterAddress frequencyParam = 0;
    

    Then this is used in initWithComponentDescription to populate the parameterTree property:

    
    // Create parameter objects.
    AUParameter *param1 = [AUParameterTree
        createParameterWithIdentifier:@"frequency"
                                 name:@"Frequency"
                              address:frequencyParam
                                  min:15
                                  max:40
                                 unit:kAudioUnitParameterUnit_Hertz
                             unitName:nil
                                flags:0
                         valueStrings:nil
                   dependentParameters:nil];
    
    // Initialize the parameter values.
    param1.value = 22;
    
    // Create the parameter tree.
    _parameterTree = [AUParameterTree
        createTreeWithChildren:@[ param1 ]];
    
  • Next, you have to provide the parameterTree with blocks to set and get values for any of these parameters. My implementation just gets/sets the value of an Obj-C instance variable (not a property!) called frequency:

    
    // implementorValueObserver is called when a parameter changes value.
    _parameterTree.implementorValueObserver =
     ^(AUParameter *param, AUValue value) {
        switch (param.address) {
            case frequencyParam:
                frequency = value;
                break;
            default:
                break;
        }
    };
    
    // implementorValueProvider is called when the value needs to be refreshed.
    _parameterTree.implementorValueProvider =
     ^(AUParameter *param) {
        switch (param.address) {
            case frequencyParam:
                return frequency;
            default:
                return (AUValue) 0.0;
        }
    };
    

    This arrangement looks kind of elaborate, but it does simplify setting parameters from your UI code later.

  • Next, input and output buses. The template code comes pre-loaded with #warnings if you don’t implement these getters. Apple’s sample code introduces a BufferedAudioBus type for these. I haven’t figured out the advantage of that; maybe their DSP needs to hold on to samples for longer than the duration of one render call, so they stash them in their own buffer? At any rate, it worked for me to use the base AUAudioUnitBus type as-is.

    
    _inputBus = [[AUAudioUnitBus alloc]
                 initWithFormat:defaultFormat error:nil];
    _outputBus = [[AUAudioUnitBus alloc]
                  initWithFormat:defaultFormat error:nil];
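    
    // A sketch of one way to expose these buses: the inputBusses and
    // outputBusses getters have to return AUAudioUnitBusArray, so wrap
    // each bus in one. (The _inputBusArray / _outputBusArray ivar names
    // here are my own, not necessarily what the template generates.)
    _inputBusArray = [[AUAudioUnitBusArray alloc]
                      initWithAudioUnit:self
                                busType:AUAudioUnitBusTypeInput
                                 busses:@[_inputBus]];
    _outputBusArray = [[AUAudioUnitBusArray alloc]
                       initWithAudioUnit:self
                                 busType:AUAudioUnitBusTypeOutput
                                  busses:@[_outputBus]];
    
    // ...and then the required getters just return those arrays:
    // - (AUAudioUnitBusArray *)inputBusses  { return _inputBusArray; }
    // - (AUAudioUnitBusArray *)outputBusses { return _outputBusArray; }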
    
  • And now the big one: internalRenderBlock. This returns a block in which you do all your audio processing. This is where things get scary:

    WWDC slide in which Audio Unit rendering is described as SCARY

    What’s scary is how severely limited you are in what you’re allowed to do in this block. Since it is called from a real-time thread — one that will happily bail on your code if it takes too long, leaving output buffers unfilled, which leads to audio dropouts or glitches — you are not allowed to do anything that can block the thread. That rules out not only obvious long-running actions like I/O, but also blocking on a mutex or semaphore, or even calling malloc(). In turn, this prohibits any interaction with the Objective-C or Swift runtimes, since their behavior is not deterministic. The default implementation gives you this cheerful hint:

    
    // Capture in locals to avoid Obj-C member lookups.
    // If "self" is captured in render, we're doing
    // it wrong. See sample code.
    

Implementing the Render Block

OK, so what are we supposed to do, if almost everything outside the block is off-limits? What you need to do is to work with pointers to numeric types, enums, structs, or C++ objects composed exclusively of these types. That guarantees you’re not touching self. So inside the getter but before the block itself, I set up some capture locals:


AUValue *frequencyCapture = &frequency;
AudioStreamBasicDescription *asbdCapture = &asbd;
__block UInt64 *totalFramesCapture = &totalFrames;
AudioBufferList *renderABLCapture = &renderABL;

Each of these is an Obj-C instance variable, and by getting it as a pointer, I avoid capturing self (of course, I can’t use properties for these, because that would also necessitate capturing self). This is one thing that’s literally impossible in Swift, which only has properties and no access to the storage behind a stored property. We’re going to have to wait and see how the Swift Ownership Manifesto turns out in order to give us something that’s compatible with audio unit render blocks.
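
By the way, if you’re wondering where instance variables like asbd, renderABL, and totalFrames get set up in the first place, that’s the job of the allocate/deallocate lifecycle methods mentioned earlier. Here’s a minimal sketch of what they might look like; it isn’t lifted from my project, it omits error handling, and it assumes an interleaved format so the scratch AudioBufferList only needs its one built-in buffer:

- (BOOL)allocateRenderResourcesAndReturnError:(NSError **)outError {
    if (![super allocateRenderResourcesAndReturnError:outError]) {
        return NO;
    }
    
    // Stash the stream format in a plain C struct that the render block
    // can read through a pointer (no Obj-C property access on the
    // real-time thread).
    asbd = *_outputBus.format.streamDescription;
    totalFrames = 0;
    
    // Scratch buffer list to pull input samples into. Allocating it here,
    // rather than in the render block, keeps malloc() off the real-time
    // thread. (Assuming one interleaved buffer; adjust for your format.)
    renderABL.mNumberBuffers = 1;
    renderABL.mBuffers[0].mNumberChannels = asbd.mChannelsPerFrame;
    renderABL.mBuffers[0].mDataByteSize =
        self.maximumFramesToRender * asbd.mBytesPerFrame;
    renderABL.mBuffers[0].mData =
        malloc(renderABL.mBuffers[0].mDataByteSize);
    
    return YES;
}

- (void)deallocateRenderResources {
    free(renderABL.mBuffers[0].mData);
    renderABL.mBuffers[0].mData = NULL;
    [super deallocateRenderResources];
}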

Now we can start coding the block itself.


return ^AUAudioUnitStatus(AudioUnitRenderActionFlags *actionFlags,
                          const AudioTimeStamp *timestamp,
                          AVAudioFrameCount frameCount,
                          NSInteger outputBusNumber,
                          AudioBufferList *outputData,
                          const AURenderEvent *realtimeEventListHead,
                          AURenderPullInputBlock pullInputBlock) {
    // Do event handling and signal processing here.
    ...
}

Inside the block, the first thing we do is use that pullInputBlock parameter to read in the samples we’re going to perform our effect on (we would skip this step if we were a generator):


// pull in samples to filter
pullInputBlock(actionFlags, timestamp, frameCount, 0, renderABLCapture);

Aside: when I was confused and figuring out why I couldn’t get any output, I skipped this step and ignored the source audio, and instead just wrote a sine wave to the output buffers. With so many things to screw up in these APIs, anything you can do to narrow things down helps.
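
If you ever want to try that kind of sanity check yourself, it only takes a few lines in place of the pull-and-modulate logic. This is a reconstruction of the idea rather than my actual debugging code (the 440 Hz is arbitrary), reusing the capture locals from above and the outputData parameter from the block:

// Debug-only: ignore the input and write a 440 Hz sine wave into the
// output buffers, just to prove the render block is being called.
for (int frame = 0; frame < frameCount; frame++) {
    *totalFramesCapture += 1;
    Float32 time = *totalFramesCapture / asbdCapture->mSampleRate;
    Float32 value = sinf(M_PI * 2 * 440.0 * time);
    for (int buf = 0; buf < outputData->mNumberBuffers; buf++) {
        memcpy(outputData->mBuffers[buf].mData +
                   (frame * asbdCapture->mBytesPerFrame),
               &value,
               sizeof(Float32));
    }
}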

Now we can perform our effect on the buffers inside the block. Probably time I explain the ring modulator, then. The idea with this filter is dirt simple: multiply one function/signal/wave by another. Usually the modulation wave will have a much lower frequency than the source, like in the range of 15-40Hz. In the figure below, the red wave is a sine wave, the green wave is the modulation, and the blue wave is the result (and I apologize to anyone who’s red-green colorblind, but hopefully you can see the waveforms at least):

Ring-modulated sine wave

The frequency parameter is the frequency of the modulation wave, so by getting its sin(), we can get a value from -1 to 1 that we modulate the source signal by (i.e., we multiply the two samples together). A few recipes I’ve seen do an fabs() on the modulator so we only apply its amplitude, not its sign. I don’t know if that matters, but I’ve done it here. Anyways, walk the samples pulled in the previous step, apply the modulation, and write the results to the outputData‘s buffers:


// copy samples from ABL, apply filter, write to outputData
size_t sampleSize = sizeof(Float32);
for (int frame = 0; frame < frameCount; frame++) {
    *totalFramesCapture += 1;
    
    for (int renderBuf = 0;
         renderBuf < renderABLCapture->mNumberBuffers;
         renderBuf++) {
        Float32 *sample =
                 renderABLCapture->mBuffers[renderBuf].mData +
                    (frame * asbdCapture->mBytesPerFrame);
        // apply modulation
        Float32 time = *totalFramesCapture / asbdCapture->mSampleRate;
        *sample = *sample *
                  fabs(sinf(M_PI * 2 * time * *frequencyCapture));
        
        memcpy(outputData->mBuffers[renderBuf].mData +
                   (frame * asbdCapture->mBytesPerFrame),
               sample,
               sampleSize);
    }
}

Running the Audio Unit

This is enough for a minimal audio unit. Choose the extension (not the app) in the scheme selector and run. Since the unit is an app extension, Xcode asks us to choose a host app to run the extension with. This is where it’s handy to have already run the Apple sample code once, because AUv3Host is a perfect app for trying out the audio unit.

Run audio unit in container app

The AUv3Host app has a “Play” button to start playing a drum loop, and shows all installed effects units on the left side, with manufacturer in parentheses (with “appl” expanded to “Apple”). Our app extension shows up with our manufacturer code (“SnFM”).

Ring modulator app extension used with AUv3Host app

This starts applying the effect to the drum loop, at the default frequency. The last thing to do is to let the user change the frequency.

Adding a UI

Finally, we get to write something in Swift, in AudioUnitViewController.swift. One interesting thing to notice in the file set up by the app extension template is that this class implements the AUAudioUnitFactory protocol, and has a property for the audio unit, which is returned by a createAudioUnit() method. This is called by the extension to create the audio unit itself, and directly calls the init method in the Obj-C audio unit class.

Since this is a view controller, as you might expect, it’s where you build a UI for your audio unit’s parameters (the provided MainInterface.storyboard is already wired up to use this view controller class). A proper implementation will have to handle all the various size classes, so it could be challenging to come up with something that works through the entire iPhone-portrait-to-iPad-landscape range. This being a sample, I just added a slider and called it good (that and a background picture of historic sound engineer Delia Derbyshire, largely for the amusement of Janie Clayton, who named her dog after Derbyshire).

Storyboard and connections for audio unit parameter UI

Then connect your UI widgets as outlets and actions as usual. The nice thing is that this is where the song-and-dance with the parameterTree stuff pays off.


@IBAction func handleFrequencySliderValueChanged(_ sender: UISlider) {
    guard let modulatorUnit =
        audioUnit as? RingModulatorAudioUnit,
      let frequencyParameter =
        modulatorUnit.parameterTree?.parameter(withAddress: frequencyParam)
    else { return }
    
    frequencyParameter.setValue(sender.value, originator: nil)
}

In other words, get the audio unit, find the parameter by its address (a constant exported in the audio unit’s .h file), and just set its value from the slider (all audio unit parameters are floats, so there aren’t any type-safety hassles). In the other direction, if a parameter can change on its own and you need to update a UI outlet, the audio unit’s parameterTree is KVO’able, so you can just observe that.

To show the UI, in AUv3Host, just click the view button in the bottom half of the iPad screen. What’s cooler, though, is to run on the device, bring up Garage Band, add the unit as part of a microphone’s effects chain, and then bring up the editing view inside Garage Band:

Custom AU view inside Garage Band

Etc

So, that’s what I was able to figure out by writing an audio unit for iOS from scratch. The trick now would be to build something more substantial, which would probably get me to the point where I’d be better served by adapting Apple’s sample code than writing my own, or at least adopting its frameworks approach, buffered buses, and the C++ rendering kernel type.

The code for this example is on GitHub at github.com/invalidstream/ring-modulator-v3audiounit. It’s still rough and nasty in a few spots, and there are likely a few things in there that aren’t strictly necessary, but I haven’t screwed around with it enough to know what I can take out and what I can’t.

And back to the original point of this exercise: beyond just figuring out v3 Audio Units, the idea was to delve into areas where Swift simply isn’t allowed to go, because of the nature of the language itself. In my talk, it’s not really about the code here, but about what this tells us about Swift and where its evolution needs to go in the next few years so that we can write audio units in Swift. After all, Swift bills itself as a “systems programming language”, and I don’t buy that yet; the limitations seen here would make me think twice before attempting to write, say, a device driver in Swift.

If you want to see this talk — c’mon, you know it’ll be fun — I’ll be giving it again at CocoaConf Next Door, which is taking place adjacent to WWDC in June.

And that’s all I have to say about audio units for now. Time to get back to video, I think.

Comments (2)

  1. Chris!
    Thank you so much for this post!
    It might not be the complete answer to everything, but it’s a great first deep dive into creating these, one that was desperately needed and long overdue since Apple introduced them.
    Thank you so much!

  2. Chris, great stuff as always!

    re: why BufferedAudioBus
    Here’s my understanding.

    AUs are supposed to, if their DSP algorithm allows, be able to render in place to avoid the expense of an extra buffer copy.

    But, from the AUInternalRenderBlock.outputData doc, “The output bus’s … buffer pointers may be null on entry, in which case the block will render into memory it owns”. This would be bad news if you were expecting to render in place. The BufferedOutputBus has a function for detecting the null pointer and swapping it for your own memory.

    AUAudioUnit has the canProcessInPlace property to tell the host what you want to do, but it would be unsafe to rely on the host to respect the property.
