
What’s New, Blue Q?

One-time self-described “World’s Greatest Compressionist” Ben Waggoner posts a pointed question to the quicktime-api list:

http://www.apple.com/macosx/what-is-macosx/quicktime.html

What I’d like to know is if QuickTime X is going to be available for Windows and older versions of Mac OS X.

It’s an important issue, because despite iTunes’ insistence on installing QuickTime on Windows, the future of that product seems completely unknown. For years, every question I’ve seen about the future of QuickTime on Windows has been met with absolute silence from Apple. Yeah, I know, “Apple does not comment on unannounced products,” and all… Still, Apple has left this technology in limbo for a remarkably long time. I recall asking ADC reps about QuickTime for Windows back at Leopard Tech Day Atlanta in 2006, as I was considering calling it from Java with JNI, and (as previously noted), I got no reply at all. And every other public question I’ve seen about the future of QuickTime on Windows has gone similarly unanswered, for years.

Smell that? That’s the scent of Abandoned Code Rot. We got that from QuickTime for Java for a few years before they managed to finally deprecate it (though they apparently haven’t gotten the message out).

It wouldn’t be too surprising to see QT for Windows fall by the wayside… Apple probably cares more about the popularity of its favorite formats and codecs (AAC and H.264) than of the QuickTime APIs and QuickTime’s interactive features like Wired Sprites that have been clearly and unequivocally beaten by Flash.

But if that’s true of Windows, is it also true on the Mac? QuickTime developers are right to be a little worried. The old C-based QuickTime API remains a 32-bit only option, intended to be replaced by the Objective-C QTKit. But in the four years since its introduction in Tiger, QTKit has only taken on part of the capabilities of the old QuickTime API. With Leopard, you could finally do capture and some significant editing (e.g., inserting segments at the movie or track levels), but raw sample level data was unavailable for any track type other than video, and some of the more interesting track types (like effects and especially tweens, useful for fading an audio track’s volume between specific times) are effectively useless in QTKit.

With Snow Leopard, the big news isn’t a more capable QTKit API, it’s QuickTime X. And as Apple’s QuickTime X page points out, QTX is all about a highly-optimized playback path (using decode hardware if available) and polished presentation. Great news if you’re playing 1080p movies on your computer or living room PC; not so much if you want to edit them, because for editing you’re back in the old 32-bit QuickTime (and the code is probably still written in C against the old APIs, given QTKit’s numerous limitations). You don’t see a 64-bit Final Cut Pro, now do you? (BTW, here’s a nice blog on that topic.)

When you all install Snow Leopard tomorrow and run the QTX-based QuickTime Player, you’ll immediately understand why the $30 QuickTime Pro (which bought you editing and exporting from the Player app and the plug-in) is gone. Follow up in the comments tomorrow (after the NDA drops) and we’ll discuss further.

If I were starting a major new multimedia project that wasn’t solely playback-based — imagine, say, a podcast studio that would combine the editing, exporting, and publishing tasks that you might currently perform with Garage Band, iTunes, and FTP — I would be very confused as to which technology to adopt. QuickTime’s cross-platform story seems to be finished (QTJ deprecated, QTW rotting away), and everything we hear on the Mac side is about playback. Would it be safer to assume that QuickTime doesn’t have a future as a media creation framework, and drop down to the engine level (Core Audio and Core Video)? And if not QuickTime… then what?

Oh, and as for the first question from the quicktime-api thread:

… How about Apple throwing us a bone as to what QuickTime X will offer those of us that use QT and QTSS?

From what I can tell, Apple has all but ditched QTSS in favor of HTTP Live Streaming, supported by QuickTime X and iPhone 3.0.

A shelf tour

After last week’s crunch, closing the last of the errata and our own to-dos, we have officially sent iPhone SDK Development to production, meaning it will now get a second copy-edit, typesetting, printing, binding, and shipping to your anxious hands.

A note of thanks is in order to the beta-program purchasers and the thousands of errata they filed, along with over 500 forum discussions on the book. No getting anything past this group, that’s for sure.

Well, maybe one little thing. With the Snow Leopard release date now set, it looks like Apple engineers have a chance to reply to e-mail from several months ago. Last night, I heard back from the dns-sd.org folks (at an @apple.com address) about my request back in June to reserve a Bonjour service type for the Game Kit example in the book. The bad news is, I included an illegal character in my service name, so I had to change it from amiphd_p2p to amiphd-p2p, which is now part of the public list of DNS SRV (RFC 2782) Service Types. And the only reason that’s bad is that the book still has the name with the underscore, and I’m currently locked out of the book during production.
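
For what it’s worth, here’s a minimal sketch of what advertising under the corrected type looks like with plain NSNetService. The book’s example actually uses Game Kit, so treat this as an illustration only; the transport and port are placeholders of mine.

// Hypothetical illustration, not the book's Game Kit code: advertise a
// service under the corrected (hyphenated) Bonjour type.
NSNetService *service =
    [[NSNetService alloc] initWithDomain:@""                // default domains
                                    type:@"_amiphd-p2p._tcp."
                                    name:@""                // defaults to the device name
                                    port:52428];            // placeholder port
[service publish];
// keep a reference to the service around, and call -stop when you're done advertising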

It’s a minor point, and it will get fixed, it’s just silly-bad timing, getting a reply to a two-month-old e-mail just a day after we wrapped the book.

Another interesting @apple.com e-mail has to do with the Clang Static Analyzer that we cover in the performance chapter, but that remains NDA for now. Anyways, they’ll have their own updates in due course, so watch their page.

Related point: I went to the Barnes & Noble on 28th Street for the first time in ages today, and drifted by the computer book section. It’s probably the biggest in Grand Rapids, for what that’s worth. Computer book sections are shrinking everywhere, particularly the programming sections, for a number of reasons: anything nichey is a non-starter at retail and is basically only available via Amazon and the like, programmers are eagerly jumping into eBooks (or bundles where you get a PDF now and the paper book when it’s ready), some programmers prefer the immediacy of blogs and other informal sources to stuffy books, and of course nearly any computer eBook of any significance is on BitTorrent (including ours, despite the fact that the unauthorized PDFs all clearly identify the reader who chose to post his or her copy). All of which goes to explain why your local retailer has less reason to stock computer books when they can make more money off political screeds and trifling fiction. And, as I discussed a few weeks back, why you’re going to continue to see fewer and fewer programming books going forward.

Still, the iPhone SDK is such a hot topic that even all this friction can’t stop it from being a popular topic with readers and authors alike. There were at least four other iPhone programming books on the shelves, and I took a first peek at several of them today. Note of explanation here: when writing a book, I never look at anything that could be considered a “competing” book. It’s my own mental firewall that ensures that my work is my own, and that I don’t lift ideas from fellow authors. That said, I do read the official docs, both to learn things myself and to make sure that the book I’m writing goes beyond what you can get for free. There’s no value for the reader (or the writer) if the book is just a paraphrase of Apple’s programming guides.

I think the only one on the shelves today that is officially updated for iPhone SDK 3.0 is Dave Mark’s Beginning iPhone 3 Development book, which features significant coverage of Core Data, probably the most significant of the new features in iPhone SDK 3.0. Of the older titles covering iPhone 2.x, I saw Erica Sadun’s, Jonathan Zdziarski’s, Neal Goldstein’s and Christopher Allen and Shannon Appelcline’s books.

They’re probably all worth a deeper read, though a glance through them and a mental comparison to my own project of the last year shows some similarities and differences. I’m sure all of us are grateful for the ease of getting screenshots from the simulator, as all the titles are rich with illustrations. Nearly all of them cover OpenGL, which ours actually doesn’t, I think because Bill thought that readers would be better served by studying OpenGL on its own (and that there isn’t enough unique about its iPhone implementation… as opposed to say, SQLite, which I put in the book not so much for the SQL or even the C API as for the strategies of managing the database files within an iPhone context: creating them with your Mac’s sqlite3 interactive shell, putting them in a bundle for deployment and backup, etc.). On the other hand, I think ours is the only book to talk about debugging and the various performance tools (Shark, Instruments, and the third-party Clang Static Analyzer). Unsurprisingly, given my inclinations, it looks like we hit media a lot harder than our peers. Counting the new-for-3.0 “Music Library Integration” chapter, we ended up with four media chapters, totaling nearly 75 pages. And that’s after cutting the too-hard-for-now Audio Streaming chapter.

It looks like all the other authors assumed a pre-requisite level equivalent to ours: know a curly-brace language, preferably C, and we’ll cover Objective-C as we go. We’ve had a few scripting-language converts (Flash/ActionScript people, it seems) on our forums who have a hill to climb with the latent subtle C-isms, mostly the memory stuff, and I wonder if our colleagues have had similar experiences with their audiences. C knowledge is a strange thing: all us old folks think it’s a lingua franca, yet I think we all know that younger developers no longer learn it as a matter of course, and may not be particularly eager to do so.

Anyways, I imagine everyone else is rushing out their 3.0 updates too, so it’ll be interesting to see what new features get covered, and what our readers still want from us in future versions or more advanced titles.

iPhone Camp Atlanta talk pre-release[?] video

I found a link on blip.tv (Low-Latency Core Audio with Queues, Units, Graphs, and AL) to the talk I gave at iPhone Camp Atlanta a few weeks ago. Here’s the <embed>:

I’m not sure if this is meant as final, or if the conference’s home page will have updated video links later. I sure hope it’s not final, because the video is several minutes out of sync with the audio. To wit, I’m looking at 10:27 right now, where the video shows a dialog indicating the hardware latency of the Audio Unit demo, but the soundtrack is still one topic back, talking about Audio Queues. It also cuts out at least five minutes before the end of the talk.

While it was short and off the cuff, I thought it went well and covered a lot of important stuff for people who’ve touched Core Audio and want to know how far down the rabbit-hole goes. Hoping they’ll post a fixed video someday.

An iPhone OpenAL brain dump

I’ve done something like this before, when I completed parts 1 and 2 of the audio series. I just sent off the first draft of part 3, and I’ve got OpenAL on the brain.

  • Docs on the OpenAL site. Go get the programmer’s guide and spec (both links are PDF).

  • Basics: create a device with alcOpenDevice(NULL); (the iPhone has only one AL device, so you don’t bother providing a device name), then create a context with alcCreateContext(alDevice, 0);, and make it current with alcMakeContextCurrent(alContext);. A condensed sketch of the whole setup appears at the end of this list.

  • Creating a context implicitly creates a “listener”. You create “sources” and “buffers” yourself. Sources are the things your listener hears, buffers provide data to 0 or more sources.

  • Nearly all AL calls set an error flag, which you collect (and clear) with alGetError(). Do so. I just used a convenience method to collect the error, compare it to AL_NO_ERROR and throw an NSException if not equal.

  • That sample AL code you found to load a file and play it with AL? Does it use loadWAVFile or alutLoadWAVFile()? Too bad; the function is deprecated, and ALUT doesn’t even exist on the iPhone. If you’re loading data from a file, use Audio File Services to load the data into memory (an NSMutableData / CFMutableDataRef might be a good way to do it). You’ll also want to get the kAudioFilePropertyDataFormat property from the audio file, to help you provide the audio format to OpenAL.

  • Generate buffers and sources with alGenBuffers() and alGenSources(), which are generally happier if you send them an array to populate with ids of created buffers/sources.

  • Most of the interesting stuff you do with sources, buffers, and the listener is done by setting properties. The programmer’s guide has cursory lists of valid properties for each. The getter/setter methods have a consistent naming scheme:

    1. al
    2. Get for getters, nothing for setters. Yes, comically, this is the opposite of Cocoa’s getter/setter naming convention.
    3. Buffer, Source, or Listener: the kind of AL object you’re working with
    4. 3 for setters that set 3 values (typically an X/Y/Z position, velocity, etc.), nothing for single-value or vector calls
    5. i for int (technically ALint) properties, f for float (ALfloat) properties
    6. v (“vector”) if getting/setting multiple values by passing a pointer, nothing if getting/setting only one value. Never have both 3 and v.

    Examples: alSourcei() to set a single int property, alSource3i() to set three ints, alGetListenerfv() to get an array of floats (as an ALfloat*).

  • Most simple examples attach a single buffer to a source, by setting the AL_BUFFER property on a source, with the buffer id as the value. This is fine for the simple stuff. But you might outgrow it.

  • 3D sounds must be mono. Place them within the context by setting the AL_POSITION property. Units are arbitrary – they could be millimeters, miles, or something in between. What matters is the source property AL_REFERENCE_DISTANCE, which defines the distance that a sound travels before its volume diminishes by one half. Obviously, for games, you’ll also care about sources’ AL_VELOCITY, AL_DIRECTION, and possibly some of the more esoteric properties, like the sound “cone”.

  • Typical AL code puts samples into a buffer with alBufferData(). This copies the data over to AL, so you can free your data pointer once you’re done. That’s no big deal for simple examples that only ever load one buffer of data, but if you stream (like I did), it’s a lot of unnecessary and expensive memory copying. Eliminate it with Apple’s alBufferDataStatic extension, which skips the copy and makes AL read data directly from your pointer. Apple talks up this approach a lot, but it’s not obvious how to compile it into your code: they gave me the answer on the coreaudio-api list.

  • To make an AL source play arbitrary data forever (e.g., a radio in a virtual world that plays a net radio station), you use a streaming approach. You queue up multiple buffers on a source with alSourceQueueBuffers(), then, after the source is started, repeatedly check the source’s AL_BUFFERS_PROCESSED property to see if any buffers have been completely played through. If so, retrieve them with alSourceUnqueueBuffers(), which receives a pointer to the IDs of one or more used buffers. Refill them with new data (doing this repeatedly is where alBufferDataStatic is going to be your big win) and queue them on the source again with alSourceQueueBuffers().

  • On the other hand, all you get back when you dequeue is an ID of the used buffer: you might need to provide yourself with some maps, structures, ivars, or other data to tell you how to refill that (what source you were using it on, what static buffer you were using for that AL buffer, etc.)

  • This isn’t a pull model like Audio Queues or Audio Units. You have to poll for processed buffers. I used an NSTimer. You can use something more difficult if you like.

  • Play/pause/stop with alSourcePlay(), alSourcePause(), alSourceStop(). To make multiple sources play/pause/stop in guaranteed sync, use the v versions of these functions that take an array of source IDs.

  • You’re still an iPhone audio app, so you still have to use the Audio Session API to set a category and register an interruption handler. If you get interrupted, set the current context to NULL with alcMakeContextCurrent(NULL), then make a new call to alcMakeContextCurrent() if the interruption ends (e.g., the user declines an incoming call). This only works in iPhone OS 3.0; in 2.x, it’s a bag of hurt: you have to tear down and rebuild everything for interruptions.
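
Pulling those bullets together, here’s a minimal sketch of the whole create-and-play path; the function and variable names are mine, error checks with alGetError() are omitted, and it assumes you’ve already used Audio File Services to get mono 16-bit PCM into memory.

#include <OpenAL/al.h>
#include <OpenAL/alc.h>

// Condensed sketch of the setup described above (not production code).
static ALuint PlayMonoPCM (void *pcmData, ALsizei pcmDataSize, ALsizei sampleRate) {
	ALCdevice *device = alcOpenDevice(NULL);            // only one device on the iPhone
	ALCcontext *context = alcCreateContext(device, 0);
	alcMakeContextCurrent(context);                     // implicitly creates the listener

	ALuint buffer, source;
	alGenBuffers(1, &buffer);
	alGenSources(1, &source);

	// Copying flavor; alBufferDataStatic avoids the copy if you're streaming
	alBufferData(buffer, AL_FORMAT_MONO16, pcmData, pcmDataSize, sampleRate);

	alSourcei(source, AL_BUFFER, buffer);               // single-buffer attach
	alSource3f(source, AL_POSITION, 2.0f, 0.0f, 0.0f);  // off to the listener's right
	alSourcef(source, AL_REFERENCE_DISTANCE, 5.0f);     // units are whatever you decide

	alSourcePlay(source);
	return source;
}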

That’s about all I’ve got for now. Hope you enjoy the article when it comes out. I’ve had fun pushing past the audio basics and into the hard parts.

Fun with varargs

For reasons you don’t need to know about (yet), I wanted to get my usual crutch of a logging UITextview implemented in plain C.

I hadn’t wanted to mess with varargs, so I usually write an Obj-C method like this:


-(void) screenLog:(NSString*) s {
	textView.text = [NSString stringWithFormat:@"%@%@\n",
		textView.text, s];
}

What this does is create an autoreleased NSString built from a format that’s just two strings concatenated together — the current contents of the text view and the argument string — plus a newline character. It then sets this new string as the new text of the UITextView.

It sucks a little bit to call, because you have to pass in an NSString, not the usual varargs you’d use with NSLog. So to do:

NSLog (@"Current age: %d", 41);

you’d have to build the string up-front, like this:

[self screenLog: [NSString stringWithFormat: @"Current age: %d", 41]];

So, kind of annoying, but still useful when you want to log to the screen instead of standard out, like I’ve had to do this week while doing some Bonjour stuff between multiple devices scattered about the office, at most one of which gets to log to Xcode’s console. Yesterday’s post, with onscreen output of the two devices getting each other’s test message, shows why this is a nice crutch to have for experiments, prototypes, and throwaways.

Anyways, I actually wanted to do this with plain ol’ C, and happened across Matt Gallagher’s great write-up of varargs in Cocoa. Combining that with the realization that NSString has some method signatures that take a va_list, I was able to rewrite my screen logger in plain ol’ C:

void LogToUITextView (UITextView *view, NSString* format, ...) {
	va_list args;
	va_start (args, format);
	NSString* appendedText = [[NSString alloc]
				initWithFormat: format arguments: args];
	va_end (args);
	view.text = [NSString stringWithFormat:
				 @"%@%@\n", view.text, appendedText];
	[appendedText release];
}

Calling it feels a lot more like calling NSLog:

- (void)viewDidLoad {
    [super viewDidLoad];

	// customize point
	LogToUITextView(textView, @"Current age: %d", 41);
	LogToUITextView(textView, @"Current weight: %3.1f", 243.6);
	LogToUITextView(textView, @"Available fonts:\n %@",
				[UIFont familyNames]);
}

And check it out: it actually works:

[Screenshot: varargs-logging-function]

I’ll probably adapt the varargs approach in my Obj-C logging function going forwards, but still, it’s nice to be able to make the procedural C call, especially since you could switch all NSLog calls to LogToUITextView with a single global replace.

Update: Here’s an even “more C” version that’s functionally equivalent:

void LogToUITextView (UITextView *view, NSString* format, ...) {
	va_list args;
	va_start (args, format);
	CFStringRef appendedText = CFStringCreateWithFormatAndArguments (
		kCFAllocatorDefault,
		NULL,
		(CFStringRef) format,
		args);
	va_end (args);
	CFStringRef newText = CFStringCreateWithFormat (
		kCFAllocatorDefault,
		NULL,
		(CFStringRef) @"%@%@\n",
		view.text,
		appendedText);
	view.text = (NSString*) newText;
	CFRelease (newText);
	CFRelease (appendedText);
}

Obviously wordier, and we lose a convenient autorelease, since CoreFoundation doesn’t have autoreleasing.

An iPhone Core Audio brain dump

Twitter user blackbirdmobile just wondered aloud when the Core Audio stuff I’ve been writing about is going to come out. I have no idea, as the client has been commissioning a lot of work from a lot of iPhone/Mac writers I know, but has a lengthy review/rewrite process.

Right now, I’ve moved on to writing some beginner stuff for my next book, and will be switching from that to iPhone 3.0 material for the first book later today. And my next article is going to be on OpenAL. My next chance for some CA comes whenever I get time to work on some App Store stuff I’ve got planned.

So, while the material is still a little fresh, I’m going to post a stream-of-consciousness brain-dump of stuff that I learned along the way or found important to know in the course of working on this stuff.

  • It’s hard. Jens Alfke put it thusly:

    “Easy” and “CoreAudio” can’t be used in the same sentence. 😛 CoreAudio is very powerful, very complex, and under-documented. Be prepared for a steep learning curve, APIs with millions of tiny little pieces, and puzzling things out from sample code rather than reading high-level documentation.

  • That said, tweets like this one piss me off. Media is intrinsically hard, and the typical way to make it easy is to throw out functionality, until you’re left with a play method and not much else.

  • And if that’s all you want, please go use the HTML5 <video> and <audio> tags (hey, I do).

  • Media is hard because you’re dealing with issues of hardware I/O, real-time, threading, performance, and a pretty dense body of theory, all at the same time. Webapps are trite by comparison.

  • On the iPhone, Core Audio has three levels of opt-in for playback and recording, given your needs, listed here in increasing order of complexity/difficulty:

    1. AVAudioPlayer – File-based playback of DRM-free audio in Apple-supported codecs. Cocoa classes, called with Obj-C. iPhone 3.0 adds AVAudioRecorder (wasn’t sure if this was NDA, but it’s on the WWDC marketing page).
    2. Audio Queues – C-based API for buffered recording and playback of audio. Since you supply the samples, would work for a net radio player, and for your own formats and/or DRM/encryption schemes (decrypt in memory before handing off to the queue). Inherent latency due to the use of buffers.
    3. Audio Units – Low-level C-based API. Very low latency, as little as 29 milliseconds. Mixing, effects, near-direct access to input and output hardware.
  • Other important Core Audio APIs not directly tied to playback and recording: Audio Session Services (for communicating your app’s audio needs to the system, defining interaction with things like the background iPod player and the ring/silent switch, and getting audio H/W metadata), Audio File Services for reading/writing files, Audio File Stream Services for dealing with audio data in a network stream, Audio Converter Services for converting between PCM and compressed formats, and Extended Audio File Services for combining file and converter services (e.g., given PCM, write out to a compressed AAC file).

  • You don’t get AVAudioPlayer or AVAudioRecorder on the Mac because you don’t need them: you already have QuickTime, and the QTKit API.
  • The Audio Queue Services Programming Guide is sufficient to get you started with Audio Queues, though it is unfortunate that its code excerpts are not pulled together into a complete, runnable Xcode project.

  • Lucky for you, I wrote one for the Streaming Audio chapter of the Prags’ iPhone book. Feel free to download the book’s example code. But do so quickly — the Streaming Audio chapter will probably go away in the 3.0 rewrite, as AVAudioRecorder obviates the need for most people to go down to the Audio Queue level. We may find some way to repurpose this content, but I’m not sure what form that will take. Also, I think there’s still a bug in the download where it can record with impunity, but can only play back once.

  • The Audio Unit Programming Guide is required reading for using Audio Units, though you have to filter out the stuff related to writing your own AUs with the C++ API and testing their Mac GUIs.

  • Get comfortable with pointers, the address-of operator (&), and maybe even malloc.

  • You are going to fill out a lot of AudioStreamBasicDescription structures. It drives some people a little batty.

  • Always clear out your ASBDs, like this:

    
    memset (&myASBD, 0, sizeof (myASBD))
    

    This zeros out any fields that you haven’t set, which is important if you send an incomplete ASBD to a queue, audio file, or other object to have it filled in.

  • Use the “canonical” format — 16-bit integer PCM — between your audio units. It works, and is far easier than trying to dick around bit-shifting 8.24 fixed point (the other canonical format).

  • Audio Units achieve most of their functionality through setting properties. To set up a software renderer to provide a unit with samples, you don’t call some sort of setRenderer() method; you set the kAudioUnitProperty_SetRenderCallback property on the unit, providing an AURenderCallbackStruct as the property value (there’s a sketch of this at the end of the list).

  • Setting a property on an audio unit requires declaring the “scope” that the property applies to. Input scope is audio coming into the AU, output is going out of the unit, and global is for properties that affect the whole unit. So, if you set the stream format property on an AU’s input scope, you’re describing what you will supply to the AU.

  • Audio Units also have “elements”, which may be more usefully thought of as “buses” (at least if you’ve ever used pro audio equipment, or mixing software that borrows its terminology). Think of a mixer unit: it has multiple (perhaps infinitely many) input buses, and one output bus. A splitter unit does the opposite: it takes one input bus and splits it into multiple output buses.

  • Don’t confuse buses with channels (i.e., mono, stereo, etc.). Your ASBD describes how many channels you’re working with, and you set the input or output ASBD for a given scope-and-bus pair with the stream format property.

  • Make the RemoteIO unit your friend. This is the AU that talks to both input and output hardware. Its use of buses is atypical and potentially confusing. Enjoy the ASCII art:

    
                             -------------------------
                             | i                   o |
    -- BUS 1 -- from mic --> | n    REMOTE I/O     u | -- BUS 1 -- to app -->
                             | p      AUDIO        t |
    -- BUS 0 -- from app --> | u       UNIT        p | -- BUS 0 -- to speaker -->
                             | t                   u |
                             |                     t |
                             -------------------------
    

    Ergo, the stream properties for this unit are:

                     Bus 0                                      Bus 1
    Input scope:     Set ASBD to indicate what you’re           Get ASBD to inspect the audio format
                     providing for play-out                     being received from H/W
    Output scope:    Get ASBD to inspect the audio format       Set ASBD to indicate what format you
                     being sent to H/W                          want your units to receive
  • That said, setting up the callbacks for providing samples to or getting them from a unit takes global scope, as their purpose is implicit from the property names: kAudioOutputUnitProperty_SetInputCallback and kAudioUnitProperty_SetRenderCallback.

  • Michael Tyson wrote a vital blog on recording with RemoteIO that is required reading if you want to set callbacks directly on RemoteIO.

  • Apple’s aurioTouch example also shows off audio input, but is much harder to read because of its ambition (it shows an oscilloscope-type view of the sampled audio, and optionally performs an FFT to find common frequencies), and because it is written in Objective-C++, mixing C, C++, and Objective-C idioms.

  • Don’t screw around in a render callback. I had correct code that didn’t work because it also had NSLogs, which were sufficiently expensive that I missed the real-time thread’s deadlines. When I commented out the NSLog, the audio started playing. If you don’t know what’s going on, set a breakpoint and use the debugger.

  • Apple has a convention of providing a “user data” or “client” object to callbacks. You set this object when you set up the callback, and its parameter type for the callback function is void*, which you’ll have to cast back to whatever type your user data object is. If you’re using Cocoa, you can just use a Cocoa object: in simple code, I’ll have a view controller set the user data object as self, then cast back to MyViewController* on the first line of the callback. That’s OK for audio queues, but the overhead of Obj-C message dispatch is fairly high, so with Audio Units, I’ve started using plain C structs.

  • Always set up your audio session stuff. For recording, you must use kAudioSessionCategory_PlayAndRecord and call AudioSessionSetActive(true) to get the mic turned on for you. You should probably also look at the properties to see if audio input is even available: it’s always available on the iPhone, never on the first-gen touch, and may or may not be on the second-gen touch.

  • If you are doing anything more sophisticated than connecting a single callback to RemoteIO, you may want to use an AUGraph to manage your unit connections, rather than setting up everything with properties.

  • When creating AUs directly, you set up an AudioComponentDescription and use the audio component manager to get the AUs. With an AUGraph, you hand the description to AUGraphAddNode to get back an AUNode. You can get the Audio Unit wrapped by this node with AUGraphNodeInfo if you need to set some properties on it.

  • Get used to providing pointers as parameters and having them filled in by function calls:

    
    AudioUnit remoteIOUnit;
    setupErr = AUGraphNodeInfo(auGraph, remoteIONode, NULL, &remoteIOUnit);
    

    Notice how the return value is an error code, not the unit you’re looking for, which instead comes back in the fourth parameter. We send the address of the remoteIOUnit local variable, and the function populates it.

  • Also notice the convention for parameter names in Apple’s functions. inSomething is input to the function, outSomething is output, and ioSomething does both. The latter two take pointers, naturally.

  • In an AUGraph, you connect nodes with a simple one-line call:

    
    setupErr = AUGraphConnectNodeInput(auGraph, mixerNode, 0, remoteIONode, 0);
    

    This connects the output of the mixer node’s only bus (0) to the input of RemoteIO’s bus 0, which goes through RemoteIO and out to hardware.

  • AUGraphs make it really easy to work with the mic input: create a RemoteIO node and connect its bus 1 to some other node.

  • RemoteIO does not have a gain or volume property. The mixer unit has volume properties on all input buses and its output bus (0). Therefore, setting the mixer’s output volume property could be a de facto volume control, if it’s the last thing before RemoteIO. And it’s somewhat more appealing than manually multiplying all your samples by a volume factor.

  • The mixer unit adds amplitudes. So if you have two sources that can hit maximum amplitude, and you mix them, you’re definitely going to clip.

  • If you want to do both input and output, note that you can’t have two RemoteIO nodes in a graph. Once you’ve created one, just make multiple connections with it. The same node will be at both the front and the end of the graph in your mental model or on your diagram, but that’s OK: the captured audio comes in on bus 1, and at some point you’ll connect that to a different bus (maybe as you pass through a mixer unit), eventually getting the audio to RemoteIO’s bus 0 input, which will go out to headphones or speakers on bus 0.
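
To make the property, scope, and bus discussion concrete, here’s a minimal sketch of pointing the canonical 16-bit stream format and a render callback at a RemoteIO unit. The names are mine, error checking is omitted, and it assumes remoteIOUnit came from AUGraphNodeInfo (as above) with nothing else feeding its bus 0.

#include <AudioToolbox/AudioToolbox.h>
#include <string.h>

// Render callback: Core Audio calls this on its real-time thread whenever it
// needs inNumberFrames of samples for RemoteIO's bus 0 (play-out) input.
static OSStatus MyRenderCallback (void *inRefCon,
				AudioUnitRenderActionFlags *ioActionFlags,
				const AudioTimeStamp *inTimeStamp,
				UInt32 inBusNumber,
				UInt32 inNumberFrames,
				AudioBufferList *ioData) {
	// fill ioData->mBuffers[0].mData with inNumberFrames 16-bit samples here
	return noErr;
}

static void ConfigureRemoteIOForPlayout (AudioUnit remoteIOUnit) {
	// The "canonical" 16-bit integer PCM described above (mono, 44.1 kHz)
	AudioStreamBasicDescription asbd;
	memset (&asbd, 0, sizeof (asbd));
	asbd.mSampleRate       = 44100.0;
	asbd.mFormatID         = kAudioFormatLinearPCM;
	asbd.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
	asbd.mChannelsPerFrame = 1;
	asbd.mBitsPerChannel   = 16;
	asbd.mFramesPerPacket  = 1;
	asbd.mBytesPerFrame    = 2;   // 1 channel x 2 bytes
	asbd.mBytesPerPacket   = 2;

	// Input scope, bus 0: "this is what I will supply for play-out"
	AudioUnitSetProperty (remoteIOUnit, kAudioUnitProperty_StreamFormat,
			kAudioUnitScope_Input, 0, &asbd, sizeof (asbd));

	// Hook up the callback that supplies those samples (global scope, as noted above)
	AURenderCallbackStruct callbackStruct;
	callbackStruct.inputProc       = MyRenderCallback;
	callbackStruct.inputProcRefCon = NULL;   // or a pointer to your own C struct
	AudioUnitSetProperty (remoteIOUnit, kAudioUnitProperty_SetRenderCallback,
			kAudioUnitScope_Global, 0,
			&callbackStruct, sizeof (callbackStruct));
}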

I didn’t come up with much (any?) of this myself. It’s all about good references. Here’s what you should add to your bookmarks (or Together, where I throw any Core Audio pages I find useful):

And you will know us by the trail of crash logs…

The last few weeks have been largely spent in Core Audio, which is surely coloring my perception of the iPhone SDK. It’s interesting talking to Daniel — author of the Prags’ Cocoa book as well as my editor on their iPhone title — as he’s working with Cocoa’s high-level abstractions, like the wonderful KVC/KVO, while I’m working at an extremely low level, down in Core Audio.

There’s no question it’s colored my perception of the iPhone SDK, to have spent pretty much a month doing mostly C between the streaming media chapter for the book and a Core Audio article for someone else. Over at O’Reilly, I blogged about the surprising primacy of C for serious iPhone development, and the challenges that presents for a generation that knows only C’s cleaned-up successors, like Java and C# (to say nothing of the various scripting languages). At least one of the responses exhorted readers to grow a pair and read K&R, but the more I’ve thought about it, the more I think that may be a bad suggestion. K&R was written for the Unix systems programmer of the late ’70s and early ’80s. It doesn’t cover C99, and many of the practices of that era are surely out of date (for example, why learn 8-bit ASCII null-terminated strings when the iPhone and Mac programmer should be using Unicode-friendly NSStrings or CFStringRefs?). This is an interesting problem, one I’ll have more to say about later…

The streaming media chapter clocks in around 35 pages. Daniel wondered if it might be too inclusive, but I think the length just comes from the nature of Core Audio: involved and verbose. The chapter really only addresses three tasks: recording with an Audio Queue, playing with an Audio Queue (which is less important now that we have AVAudioPlayer, but which is still needed for playing anything other than local files), and converting between formats. On the latter, there’s been precious little written in the public eye: Googling for ExtAudioFileCreateWithURL produces a whopping 16 unique hits. Still, there’s a risk that this chapter is too involved and of use to too few people… it’ll be in the next beta and tech review, but it might not suit the overall goals of the book. If we cut it, I’ll probably look to repurpose it somehow (maybe I can pitch the “Big Ass Mac and iPhone Media book” dream project, the one that covers Core Audio, Core Video, and QTKit).
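
Since that function gets so little public discussion, here’s roughly what the create-for-writing case looks like. This is my own bare-bones illustration, not the chapter’s code, and which encoders are actually available varies by device and OS version.

#include <AudioToolbox/AudioToolbox.h>

// Sketch: create an .m4a file that Extended Audio File Services will encode
// to AAC as we write PCM into it. pcmFormat describes the PCM we'll supply.
static ExtAudioFileRef CreateAACFileForWriting (CFURLRef fileURL,
				AudioStreamBasicDescription pcmFormat) {
	AudioStreamBasicDescription aacFormat = {0};
	aacFormat.mFormatID         = kAudioFormatMPEG4AAC;
	aacFormat.mSampleRate       = pcmFormat.mSampleRate;
	aacFormat.mChannelsPerFrame = pcmFormat.mChannelsPerFrame;
	// leave the packet/frame fields zeroed; the encoder fills in what it needs

	ExtAudioFileRef extFile = NULL;
	ExtAudioFileCreateWithURL (fileURL, kAudioFileM4AType, &aacFormat,
			NULL, kAudioFileFlags_EraseFile, &extFile);

	// Tell it what we'll actually be handing over (PCM), so it converts on the way out
	ExtAudioFileSetProperty (extFile, kExtAudioFileProperty_ClientDataFormat,
			sizeof (pcmFormat), &pcmFormat);
	return extFile;   // write with ExtAudioFileWrite(), then ExtAudioFileDispose()
}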

The article goes lower than Audio Queues and Extended Audio Files, down to the RemoteIO audio unit, in order to get really low-latency audio. Michael Tyson has a great blog on recording with RemoteIO, but for this example, I’m playing low-latency audio, by generating samples on the fly (I actually reused some sine wave code from the QuickTime for Java book, though that example wrote samples to a file whereas this one fills a 1 KB buffer for immediate playback).
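
The sample-generating code amounts to filling a buffer of 16-bit samples with a sine function and a little pointer math. Here’s a reconstruction of the idea, not the article’s actual code:

#include <CoreFoundation/CoreFoundation.h>   // SInt16, UInt32
#include <math.h>

// Fill frameCount mono 16-bit samples with a sine wave, carrying the phase
// across calls so successive buffers line up without clicks.
static void FillWithSine (SInt16 *buffer, UInt32 frameCount,
				double frequency, double sampleRate, double *ioPhase) {
	double phaseIncrement = 2.0 * M_PI * frequency / sampleRate;
	SInt16 *sample = buffer;
	for (UInt32 i = 0; i < frameCount; i++) {
		*sample++ = (SInt16) (sin (*ioPhase) * 32767.0);        // scale to 16-bit range
		*ioPhase += phaseIncrement;
		if (*ioPhase > 2.0 * M_PI) *ioPhase -= 2.0 * M_PI;      // keep the phase bounded
	}
}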

Amusingly, after switching from easy-to-compute square waves to nicer sounding sine waves, I couldn’t figure out why I wasn’t getting sound… until I took out a logging statement and it started working. Presumably, the expense of the logging caused me to miss the audio unit’s deadlines.

Working at this level has me rethinking whether a media API of this richness and power could ever have worked in Java. It’s not just Sun’s material disinterest and lack of credibility in media, it’s also the fact that latency is death at the low levels that I’m working in right now, and there’s no user who would understand why their audio had pauses and dropouts because the VM needed to take a break for garbage collection. If Java ever did get serious about low-latency media, would we have to assume use of real-time Java?

I’m amazed I haven’t had more memory related crashes than I have. I felt dirty using pointer math to fill a buffer with samples, but it works, and that’s the right approach for the job and the idioms of C and Core Audio. After a month of mostly C, I think I’m getting comfortable with it again. After I struggled with this stuff a year ago, it’s getting a lot easier. When I have time, maybe I’ll start over on the web radio client and actually get it working.

Next up: finishing the low-latency article, fixing an unthinkable number of errata on the book (I haven’t looked in a while, and I dread how much I’ll need to fix), then onto AU Graph Services and mixing.

I didn’t know 1718449215 was the 4CC for “fmt?” I do now.

You know things are going badly when you get errors so cryptic and so consistently that you write yourself a pretty-print method to make sense of them.

Here’s where I was as of last night:


- (void) failTo: (NSString*) functionName withOSStatus: (OSStatus) stat {
	NSError *error = [NSError errorWithDomain:NSOSStatusErrorDomain
			code:stat userInfo:nil];
	NSLog (@"Error in %@: %@", functionName, [error description]);
}

Which allows me to catch errors like this:


if (audioErr != noErr) {
	[self failTo: @"AudioQueueNewInput" withOSStatus: audioErr];
	return;
}

And which produces output like this. A lot.

2009-02-03 21:20:18.874 AQRecorderThrowaway[3522:20b] Error in AudioQueueNewInput: Error Domain=NSOSStatusErrorDomain Code=1718449215 "Operation could not be completed. (OSStatus error 1718449215.)"

And a little searching through the Audio Queue Services documentation tells us that 1718449215 is the four-char code for fmt?, also known as kAudioFormatUnsupportedDataFormatError.
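
If you’d rather skip the NSError detour, a few lines of C will pretty-print any OSStatus whose bytes happen to be printable; this is my own quick sketch, not code from the book or the docs.

#include <CoreFoundation/CoreFoundation.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Log an OSStatus as its four-char code when all four bytes are printable,
// or as a plain number when they're not.
static void LogOSStatus (const char *functionName, OSStatus status) {
	UInt32 bigEndian = CFSwapInt32HostToBig ((UInt32) status);   // bytes in display order
	char fourCC[5];
	memcpy (fourCC, &bigEndian, 4);
	fourCC[4] = '\0';
	if (isprint ((unsigned char) fourCC[0]) && isprint ((unsigned char) fourCC[1]) &&
	    isprint ((unsigned char) fourCC[2]) && isprint ((unsigned char) fourCC[3])) {
		fprintf (stderr, "Error in %s: '%s' (%ld)\n", functionName, fourCC, (long) status);
	} else {
		fprintf (stderr, "Error in %s: %ld\n", functionName, (long) status);
	}
}

// LogOSStatus ("AudioQueueNewInput", audioErr) prints:
// Error in AudioQueueNewInput: 'fmt?' (1718449215)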

Opt-in Complexity

Last month, Erica posted a blog heralding the introduction of AVAudioPlayer in iPhone OS (and SDK) 2.2. She writes:

When the SDK finally did roll around, its Audio Queue approach to handling audio playback and recording proved to be an extreme disappointment. A quick glance through the Audio Queue Services programming guide reveals both the power and complexity of the service. Involving pages of low-level programming just for even the simplest audio requests, Audio Queues are perfect for serious performance-driven developers but lack the easy-to-use hooks that Celestial had provided. With Celestial, you could load a URL and then just play it.

Erica makes an excellent point here that gets overlooked: Audio Queue Services is powerful, as well as complex. Granted, with audio, we have something of an 80/20 scenario: presumably, about 80% of the iPhone developers with any use for audio need only about 20% of the Audio Toolbox’s functionality (namely, playback, volume and pan controls, and maybe level metering). So they’re probably very happy to have AVAudioPlayer.

But what about the other 20%? There’s a group of audio developers for whom simple playback is not enough. These are the guys and gals who want to:

  • Stream audio from the network
  • Pick out Shoutcast metadata from said stream
  • Apply effects
  • Inspect the audio format
  • Inspect metadata
  • Edit
  • Convert between formats
  • Perform monitoring other than peak/average power level
  • Perform arbitrary DSP (e.g., FFT frequency matching for a Karaoke Revolution / Rock Band type game)

Now how are you going to design an API to make them happy, while not drowning the basic developer with a hundred method signatures they won’t be able to make heads or tails of?

Intriguingly, Apple’s APIs on the Mac and iPhone largely don’t even try. Instead, the complex stuff gets segregated down to the lower levels of the SDK stack (the Media and Core Services layers), while Cocoa and Cocoa Touch’s higher-level abstractions provide the most broadly used functionality.

In the case of audio, that means the developer with simple playback needs can stay in Obj-C and use AVAudioPlayer and not worry about things like latency or effects. When he or she is ready to opt in to more complexity, the first step is to use the C-based Audio Session API to describe how the app interacts with the rest of the system (can it mix its sounds with music playing from the iPod app, for example… does it want to be notified when the output path changes, like when the user removes the headphones, etc.). And if the developer needs more power, then they choose complexity and move on to Audio Toolbox (or perhaps even Core Audio… a DevForums thread and a blog by developer Michael Tyson report extremely low latency by using the RemoteIO audio unit directly).
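
That first opt-in step is small. A sketch of it, with the Ambient category standing in for whatever your app actually needs:

#include <AudioToolbox/AudioToolbox.h>

// Called by the system when another app (or a phone call) interrupts your audio.
static void MyInterruptionListener (void *inClientData, UInt32 inInterruptionState) {
	// pause on kAudioSessionBeginInterruption, resume on kAudioSessionEndInterruption
}

static void SetUpAudioSession (void) {
	AudioSessionInitialize (NULL, NULL, MyInterruptionListener, NULL);
	UInt32 category = kAudioSessionCategory_AmbientSound;   // mixes with iPod playback
	AudioSessionSetProperty (kAudioSessionProperty_AudioCategory,
			sizeof (category), &category);
	AudioSessionSetActive (true);
}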

This isn’t just true of the media APIs. You also see it in Foundation versus Core Foundation. The first chapter I did for the Pragmatic Programmers’ iPhone book was an omnibus I/O chapter (which later became separate chapters on file and network I/O), and while working on the networking portion, I wrote an example that used Cocoa’s NSHost class and NSStream‘s getStreamsToHost:port:inputStream:outputStream: method. It worked fine on the simulator, but started giving compiler warnings when I finally got my certificate. Search for the method in the documentation and switch between the Mac OS X and iPhone Doc Sets to see the problem: NSHost and getStreamsToHost:port:inputStream:outputStream: are not part of the public iPhone API (a hint of the reason why is on DevForums). Hilariously, it was only after I’d gone on to rewrite it with the lower-level, procedural-C CFNetwork that I decided to take a step back and say “you know what, the Obj-C URL Loading System is going to be enough for 80-90% of our readership’s networking needs.” Again, the functionality of opening a stream to an arbitrary port on an arbitrary host is there, but if you’re the 1 in 10 developers who really really needs to do that, then you’re going down to CFNetwork and using something like CFStreamCreatePairWithSocketToHost().
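
For the 1 in 10 who really do need an arbitrary socket stream, the drop-down looks something like this; the host and port are placeholders of mine.

// Procedural-C fallback: open read/write streams to an arbitrary host and port.
CFReadStreamRef readStream = NULL;
CFWriteStreamRef writeStream = NULL;
CFStreamCreatePairWithSocketToHost (kCFAllocatorDefault,
		CFSTR("example.com"), 8080,
		&readStream, &writeStream);
// Toll-free bridging: from here on, treat them as NSInputStream / NSOutputStream,
// schedule them on a run loop, set a delegate, and open them like Cocoa streams.
NSInputStream *input = (NSInputStream *) readStream;
NSOutputStream *output = (NSOutputStream *) writeStream;
[input open];
[output open];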

Need time zone awareness? NSTimeZone is your friend. Need to know every time zone that the device supports? Get to know CFTimeZoneCopyKnownNames(). Again, a niche-ier feature lives down at the Core Foundation level, and isn’t wrapped by an equivalent call in Foundation, though it’s easy enough to switch to procedural C and make the one-off lower-level call.
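
The one-off call really is a one-off; something like this (my example) is all it takes:

// Drop to Core Foundation, grab the list, and bridge it right back to Cocoa.
CFArrayRef zoneNames = CFTimeZoneCopyKnownNames ();
NSLog (@"%ld time zones known", (long) CFArrayGetCount (zoneNames));
NSLog (@"first one: %@", (NSString *) CFArrayGetValueAtIndex (zoneNames, 0));
CFRelease (zoneNames);   // "Copy" in the name means we own it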

It’s an interesting trait that the Mac and iPhone stacks work this way, opting in to complexity and keeping the higher-level APIs sparser and simpler, and you have to wonder whether it’s a conscious design decision or a happy accident. After all, a key reason to put so much functionality in the lower-level procedural-C layers — aside from performance benefits from not having to do Obj-C message dispatch — is that these C APIs can be called equally easily from Carbon or Cocoa apps. But of course, the whole idea of Carbon/Cocoa compatibility is irrelevant on the iPhone, where Carbon is nonexistent. In a purely iPhone world, the only reason to have the complex stuff be C-only is to move the tricky, nichey, sophisticated stuff out of the way, optimizing the Obj-C APIs for the most common uses.

Advantage: it does make Cocoa a pleasure to work with. Disadvantage: non-trivial apps are almost surely going to need to make these low-level calls sooner or later, and switching between Obj-C and procedural C on a line-by-line basis takes some getting used to. Still, making the complex stuff an opt-in ultimately makes the SDK both more approachable and more interesting.

[Cross-posted to O’Reilly’s Inside iPhone]

Writing about things I can’t write about

The iPhone SDK NDA is really cutting into my ability to write about what I’m actually working on outside of the usual java.net editing gig.

To wit: I just completed the first of four iPhone SDK articles, but of course I can’t say whom it’s for or what it’s about. I may not even be able to link to it when it’s done. Which is a shame because it’s way attractive. See if you can find a very quiet reference to it on the reworked Subsequently & Furthermore home page, now with much more iPhone content (since that’s the kind of freelancing work I’m hoping to attract).

For my own projects, I’m digging into the OpenAL support right now. You have to understand the OpenAL API and also make some use of the Audio Toolbox, for reasons that are presumably also off-limits given the NDA. Suffice it to say that much of the OpenAL sample code on the net won’t work, thanks to the deprecation (and what’s the next step after that… anyone?) of certain convenience (crutch?) functions that maybe shouldn’t have been part of OpenAL in the first place. So, you use Audio Toolbox, which is a pretty attractive API in its own right.

At some point I’m going to get back into the net radio code for iPhone. Some of the Core Foundation work that I did for the book (and then threw away, since a Cocoa alternative was available) helps me understand some of the inexplicable errors that hung me up a few months ago. To wit, if myBuffers is a CFArrayRef instead of an old-fashioned C array, then referencing myBuffers[i] is a very, very bad idea (the correct call is CFArrayGetValueAtIndex(myBuffers, i)).

But having said that, there are enough web radio apps already to make me not want to do yet another, to say nothing of the legal burden of licensing a stream-finder, or (ick) hosting my own. I might have to move on to the harder audio app idea that I can’t shake, but haven’t committed to a paper prototype yet.

Speaking of iPhone audio apps, I have nothing to add to the controversy over the rejection of Podcaster except to say that I would have expected Apple to backtrack and OK the app by close-of-business today, since the decision is so obviously wrong and harmful to the platform as a whole. Giving serious developers second thoughts about developing for the iPhone, if not sending them fleeing to the exits, is probably not in Apple’s self-interest.

I’m inclined to think it’s just a case of working out the kinks in the App Store: when it opened, there were howls of derision for Apple letting in junk like the hundreds of public-domain books wrapped in a trivial reader, the “flashlight” apps, or the $1,000 “I Am Rich” app. Now they’ve gone too far the other way, but rather than reject Podcaster for being junk, which it’s not, the stated reason is that it supposedly competes with Apple’s built-in iPhone functionality (not even true because Podcaster can fetch podcasts while you’re mobile, a nice feature when you take your iPhone on a trip and leave behind the Mac Pro it’s synched with).

Again, I’m not freaking out, because it’s so obviously, wildly wrong that I think Apple will quietly make things right.

Of course, I had confidence the inexplicable, unenforceable NDA would have been lifted by now, so what do I know?