Opt-in Complexity

Last month, Erica posted a blog entry heralding the introduction of AVAudioPlayer in iPhone OS (and SDK) 2.2. She writes:

When the SDK finally did roll around, its Audio Queue approach to handling audio playback and recording proved to be an extreme disappointment. A quick glance through the Audio Queue Services programming guide reveals both the power and complexity of the service. Involving pages of low-level programming just for even the simplest audio requests, Audio Queues are perfect for serious performance-driven developers but lack the easy-to-use hooks that Celestial had provided. With Celestial, you could load a URL and then just play it.

Erica makes an excellent point here that gets overlooked: Audio Queue Services is powerful, as well as complex. Granted, with audio, we have something of an 80/20 scenario: presumably, about 80% of the iPhone developers with any use for audio need only about 20% of the Audio Toolbox’s functionality (namely, playback, volume and pan controls, and maybe level metering). So they’re probably very happy to have AVAudioPlayer.
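And it really is nearly as simple as Celestial's load-a-URL-and-play. A minimal sketch of the AVAudioPlayer happy path (the file name is a placeholder, assuming a sound file bundled with the app; error handling elided for brevity):

```objc
#import <AVFoundation/AVFoundation.h>

// Hypothetical resource; assumes "song.m4a" ships in the app bundle.
NSURL *fileURL = [NSURL fileURLWithPath:
    [[NSBundle mainBundle] pathForResource:@"song" ofType:@"m4a"]];
NSError *error = nil;
AVAudioPlayer *player =
    [[AVAudioPlayer alloc] initWithContentsOfURL:fileURL error:&error];
player.volume = 0.8;     // the simple volume control most apps need
[player prepareToPlay];  // preloads buffers to cut startup lag
[player play];
```

Ten lines of Obj-C, no queues, no callbacks, no buffer management.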

But what about the other 20%? There’s a group of audio developers for whom simple playback is not enough. These are the guys and gals who want to:

  • Stream audio from the network
  • Pick out Shoutcast metadata from said stream
  • Apply effects
  • Inspect the audio format
  • Inspect metadata
  • Edit
  • Convert between formats
  • Perform monitoring other than peak/average power level
  • Perform arbitrary DSP (e.g., FFT frequency matching for a Karaoke Revolution / Rock Band type game)

Now how are you going to design an API to make them happy, while not drowning the basic developer with a hundred method signatures they won’t be able to make heads or tails of?

Intriguingly, Apple’s APIs on the Mac and iPhone largely don’t even try. Instead, the complex stuff gets segregated down to the lower layers of the SDK stack (Media and Core Services), while Cocoa and Cocoa Touch’s higher-level abstractions provide the most broadly used functionality.

In the case of audio, that means the developer with simple playback needs can stay in Obj-C, use AVAudioPlayer, and not worry about things like latency or effects. When he or she is ready to opt in to more complexity, the first step is to use the C-based Audio Session API to describe how the app interacts with the rest of the system (can it mix its sounds with music playing from the iPod app, for example… does it want to be notified when the output path changes, as when the user removes the headphones, etc.). And if the developer needs more power, then he or she chooses complexity and moves on to Audio Toolbox (or perhaps even Core Audio… a DevForums thread and a blog post by developer Michael Tyson report extremely low latency from using the RemoteIO audio unit directly).
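That first rung of the ladder looks like this: nothing exotic yet, but you're already in procedural C. A sketch of a session that silences the iPod app during playback and listens for route changes (the constants and function signatures are the real Audio Session API; the listener body is a placeholder of my own):

```objc
#include <AudioToolbox/AudioToolbox.h>

// Placeholder callback, invoked when the output route changes
// (e.g., the user unplugs the headphones).
static void MyRouteChangeListener(void *inClientData,
                                  AudioSessionPropertyID inID,
                                  UInt32 inDataSize,
                                  const void *inData) {
    // pause playback, update UI, etc.
}

void SetUpAudioSession(void) {
    // NULL run loop and interruption listener for brevity.
    AudioSessionInitialize(NULL, NULL, NULL, NULL);

    // "MediaPlayback" means our audio does NOT mix with the iPod app's.
    UInt32 category = kAudioSessionCategory_MediaPlayback;
    AudioSessionSetProperty(kAudioSessionProperty_AudioCategory,
                            sizeof(category), &category);

    AudioSessionAddPropertyListener(kAudioSessionProperty_AudioRouteChange,
                                    MyRouteChangeListener, NULL);
    AudioSessionSetActive(true);
}
```

Swap the category for `kAudioSessionCategory_AmbientSound` and your sounds mix with the iPod's instead: one constant encodes the whole mixing policy.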

This isn’t just true of the media APIs. You also see it in Foundation versus Core Foundation. The first chapter I did for the Pragmatic Programmers’ iPhone book was an omnibus I/O chapter (which later became separate chapters on file and network I/O), and while working on the networking portion, I wrote an example that used Cocoa’s NSHost class and NSStream’s getStreamsToHost:port:inputStream:outputStream: method. It worked fine on the simulator, but started giving compiler warnings when I finally got my certificate. Search for the method in the documentation and switch between the Mac OS X and iPhone Doc Sets to see the problem: NSHost and getStreamsToHost:port:inputStream:outputStream: are not part of the public iPhone API (a hint of the reason why is on DevForums).

Hilariously, it was only after I’d gone on to rewrite it with the lower-level, procedural-C CFNetwork that I decided to take a step back and say “you know what, the Obj-C URL Loading System is going to be enough for 80-90% of our readership’s networking needs.” Again, the functionality of opening a stream to an arbitrary port on an arbitrary host is there, but if you’re the 1 in 10 developers who really really needs to do that, then you’re going down to CFNetwork and using something like CFStreamCreatePairWithSocketToHost().
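For the record, the lower-level version of "open a stream to an arbitrary host and port" runs something like this sketch (the host and port are placeholders of mine; pre-ARC Core Foundation memory rules apply):

```objc
#include <CoreFoundation/CoreFoundation.h>

CFReadStreamRef readStream = NULL;
CFWriteStreamRef writeStream = NULL;

// Hypothetical endpoint; CFSTR() gives us a constant CFStringRef.
CFStreamCreatePairWithSocketToHost(kCFAllocatorDefault,
                                   CFSTR("example.com"), 80,
                                   &readStream, &writeStream);
CFReadStreamOpen(readStream);
CFWriteStreamOpen(writeStream);

// ... read and write bytes on the streams ...

CFReadStreamClose(readStream);
CFWriteStreamClose(writeStream);
CFRelease(readStream);   // the "Create" rule: we own both streams
CFRelease(writeStream);
```

Not rocket science, but notice how much bookkeeping (ownership, explicit open/close) you take on the moment you need that one niche capability.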

Need time zone awareness? NSTimeZone is your friend. Need to know every time zone that the device supports? Get to know CFTimeZoneCopyKnownNames(). Again, a niche-ier feature lives down at the Core Foundation level, and isn’t wrapped by an equivalent call in Foundation, though it’s easy enough to switch to procedural C and make the one-off lower-level call.
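Dropping down for that one call is painless, too, because Core Foundation types are toll-free bridged to their Foundation counterparts. A sketch (pre-ARC memory rules; the logging loop is just for illustration):

```objc
#include <CoreFoundation/CoreFoundation.h>
#import <Foundation/Foundation.h>

// The "Copy" in the name means we own the returned array
// and are responsible for releasing it.
CFArrayRef knownNames = CFTimeZoneCopyKnownNames();

// CFArrayRef is toll-free bridged to NSArray, so one cast
// puts us right back in Obj-C territory.
NSArray *names = (NSArray *)knownNames;
for (NSString *name in names) {
    NSTimeZone *zone = [NSTimeZone timeZoneWithName:name];
    NSLog(@"%@ (GMT offset %d)", name, [zone secondsFromGMT]);
}

CFRelease(knownNames);
```

One line of procedural C, and everything on either side of it stays comfortably in Cocoa.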

It’s an interesting trait that the Mac and iPhone stacks work this way, opting in to complexity and keeping the higher-level APIs sparser and simpler, and you have to wonder whether it’s a conscious design decision or a happy accident. After all, a key reason to put so much functionality in the lower-level procedural-C layers — aside from performance benefits from not having to do Obj-C message dispatch — is that these C APIs can be called equally easily from Carbon or Cocoa apps. But of course, the whole idea of Carbon/Cocoa compatibility is irrelevant on the iPhone, where Carbon is nonexistent. In a purely iPhone world, the only reason to have the complex stuff be C-only is to move the tricky, nichey, sophisticated stuff out of the way, optimizing the Obj-C APIs for the most common uses.

Advantage: it does make Cocoa a pleasure to work with. Disadvantage: non-trivial apps are almost surely going to need to make these low-level calls sooner or later, and switching between Obj-C and procedural C on a line-by-line basis takes some getting used to. Still, making the complex stuff an opt-in ultimately makes the SDK both more approachable and more interesting.

[Cross-posted to O’Reilly’s Inside iPhone]
