Archives for: January 2009

Muxing and chunking

Interesting bits from the lists, which I really ought to read more often.

QuickTime 7.6’s support document has a blunt and ambiguous little feature, “Audio tracks from MPEG video files now export consistently.” What does this mean? According to quicktime-users, you can export muxed MPEG-1 and not lose your audio, something Ben Waggoner notes we’ve been waiting 12 years for. Seriously, I could have used this like four companies ago at VNI… maybe then we’d have used QTJ instead of JMF.

The issue is this: “muxed” MPEG-1 means that audio and video samples are interleaved. That means when you play back, you generally have the samples you need for a given time all in one place (i.e., adjacent in a stream, in contiguous disk sectors, etc.). This is handy, of course, but QuickTime has its own idea about how samples are supposed to be organized (in a QuickTime movie, there are “chunks” of samples, and tables to optimize looking up chunks for a given time). Since MPEG-1 didn’t use that organization, QuickTime’s support for MPEG-1 was less robust than it was for other formats (notably, you don’t have these problems in MPEG-4, whose container format is highly .mov-like).
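To make the contrast concrete, here's a minimal sketch in plain C of why those chunk tables make lookup cheap. The names and the fixed-size-sample simplification are mine — real .mov files spread this bookkeeping across the stsc/stco/stsz atoms and allow variable sample sizes — but the idea is the same: finding the byte offset for a given sample is a short walk over chunk counts, not a scan of the media data.

```c
#include <stdint.h>

/* Hypothetical, simplified chunk table: each chunk starts at a known
   file offset and holds a known number of fixed-size samples. */
typedef struct {
    int64_t file_offset;  /* where this chunk's samples start in the file */
    int     sample_count; /* how many samples the chunk holds */
} Chunk;

/* Return the file offset of sample `index`, or -1 if the index is
   past the end of the media. */
int64_t offset_for_sample(const Chunk *chunks, int n_chunks,
                          int sample_size, int index) {
    for (int c = 0; c < n_chunks; c++) {
        if (index < chunks[c].sample_count)
            return chunks[c].file_offset + (int64_t)index * sample_size;
        index -= chunks[c].sample_count;
    }
    return -1;
}
```

Muxed MPEG-1 gives you no such table: the only way to know where a given time's samples live is to parse the stream itself, which is exactly why QuickTime's MPEG-1 support lagged behind.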

Great that they’ve taken care of this, but how relevant is MPEG-1 anymore, particularly as a production format where you’d care to take MPEG-1 as a source and then need to export out of it?

Similar and interesting bit from the coreaudio-api list: Jens Alfke noted slow seeking with MP3s. Again, the reason is that those files don’t have packet/chunk tables, so seeking to an arbitrary time in VBR data requires reading and decoding all the data up to that point. Ouch! Equally interesting, in the follow-up from Apple’s Jeff Moore, is that Core Audio’s AudioFile builds its own packet table as it seeks, so you don’t experience this problem when you seek back to an earlier point, or to any other point that’s already been played or jumped to.
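Jeff didn’t share the implementation, but the behavior he describes can be sketched like this (my reconstruction in plain C, not Apple’s code; the `sizes[]` array stands in for parsing real MP3 frame headers): the first seek has to walk packet boundaries from the start, but every boundary it passes gets recorded, so any later seek into already-scanned territory becomes a plain table lookup.

```c
#define MAX_PACKETS 4096

/* Incrementally-built packet index for a VBR stream with no
   native packet table. */
typedef struct {
    long offsets[MAX_PACKETS]; /* byte offset of each packet indexed so far */
    int  known;                /* number of packets indexed so far */
    long frames_parsed;        /* cumulative parsing work, for illustration */
} PacketTable;

long seek_to_packet(PacketTable *t, const int *sizes, int target) {
    if (t->known == 0) {         /* packet 0 always starts at offset 0 */
        t->offsets[0] = 0;
        t->known = 1;
    }
    while (t->known <= target) { /* linear scan, but only over new ground */
        t->offsets[t->known] = t->offsets[t->known - 1] + sizes[t->known - 1];
        t->known++;
        t->frames_parsed++;
    }
    return t->offsets[target];   /* anything already indexed is O(1) */
}
```

The first call pays the scan cost; a second seek to any earlier packet does no new parsing at all, which matches the instant back-seeks Jeff describes.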

Better yet, there’s an easier workaround: put your audio data into a container format that has those kinds of tables, like MPEG-4 or Core Audio Format (.caf). Seems like .mov should work too, but Jeff didn’t mention it explicitly.

Interestingly, you can see this in the sample app I wrote for the next beta of the Prags’ iPhone book. Once beta 10 comes out, get the sample code, load the SimpleAudioPlayer project, add an MP3 file to the project (must be named audio.mp3) and delete the audio.m4a reference. Build and go.

While playing, use the slider to jump most of the way into your song. Especially on the device, you’ll notice a lag of 1-2 seconds while it seeks to that location. But then seek back earlier into the song, and then back again to any point before your first jump, and it’s instantaneous.

I didn’t make a big deal of the CAF format in the chapter, primarily because when I used Sound Studio to create sample files for the System Sounds example, the AIFF files sounded right and the CAFs had weird dropouts and stuttering. For the sake of expedience, I used the AIFFs for that example. But given some of CAF’s advantages, most notably its packet tables and its codec agnosticism, it might become more prominent in later rewrites.

Link: Calls for open source government

Sun co-founder Scott McNealy, quoted in BBC article Calls for open source government:

The government ought to mandate open source products based on open source reference implementations to improve security, get higher quality software, lower costs, higher reliability – all the benefits that come with open software.

I can’t think of a more damning statement for a business model than to say that governments should force people to use it.

It was obscene when vendors tried to establish their business models as law in several states (including Michigan, embarrassingly), it’s obscene when Old Media insists that laws be crafted to suit its broken business models… so where’s the outrage in response to this? If anything, the OSS community should be worried that OSS’ benefits aren’t self-evident, and that it therefore needs to be propped up by the force of law.

Dumb, dumb, dumb, dumb, dumb.

Link: Steve Jobs: A Tough Act to Follow

I’m not terribly interested in most of the stories speculating about Steve Jobs’ health or the company’s outlook — do you really think Steve designed the Cocoa APIs or wrote the iPhone’s power management software? — but I do think one valuable analogy you can make is to how Disney at first stumbled but later pressed on without its hands-on namesake founder.

Jim Hill Media contributor and long-time Disney artist Floyd Norman is in a unique position to make this comparison: he worked for Disney on The Jungle Book, and later for Jobs at Pixar. So I’m highly willing to listen to his take on how the two stories played out. Plus, his sketches of Jobs are a hoot: Steve Jobs: A Tough Act to Follow.

Opt-in Complexity

Last month, Erica posted a blog heralding the introduction of AVAudioPlayer in iPhone OS (and SDK) 2.2. She writes:

When the SDK finally did roll around, its Audio Queue approach to handling audio playback and recording proved to be an extreme disappointment. A quick glance through the Audio Queue Services programming guide reveals both the power and complexity of the service. Involving pages of low-level programming just for even the simplest audio requests, Audio Queues are perfect for serious performance-driven developers but lack the easy-to-use hooks that Celestial had provided. With Celestial, you could load a URL and then just play it.

Erica makes an excellent point here that gets overlooked: Audio Queue Services is powerful, as well as complex. Granted, with audio, we have something of an 80/20 scenario: presumably, about 80% of the iPhone developers with any use for audio need only about 20% of the Audio Toolbox’s functionality (namely, playback, volume and pan controls, and maybe level metering). So they’re probably very happy to have AVAudioPlayer.

But what about the other 20%? There’s a group of audio developers for whom simple playback is not enough. These are the guys and gals who want to:

  • Stream audio from the network
  • Pick out Shoutcast metadata from said stream
  • Apply effects
  • Inspect the audio format
  • Inspect metadata
  • Edit
  • Convert between formats
  • Perform monitoring other than peak/average power level
  • Perform arbitrary DSP (e.g., FFT frequency matching for a Karaoke Revolution / Rock Band type game)

Now how are you going to design an API to make them happy, while not drowning the basic developer with a hundred method signatures they won’t be able to make heads or tails of?

Intriguingly, Apple’s APIs on the Mac and iPhone largely don’t even try. Instead, the complex stuff gets segregated down to the lower levels of the SDK stack — Core Media and Core Services — while Cocoa and Cocoa Touch’s higher-level abstractions provide the most broadly used functionality.

In the case of audio, that means the developer with simple playback needs can stay in Obj-C and use AVAudioPlayer and not worry about things like latency or effects. When he or she is ready to opt in to more complexity, the first step is to use the C-based Audio Session API to describe how the app interacts with the rest of the system (can it mix its sounds with music playing from the iPod app, for example… does it want to be notified when the output path changes, like when the user removes the headphones, etc.). And if the developer needs more power, then they choose complexity and move on to Audio Toolbox (or perhaps even Core Audio… a DevForums thread and a blog by developer Michael Tyson report extremely low latency from using the RemoteIO audio unit directly).

This isn’t just true of the media APIs. You also see it in Foundation versus Core Foundation. The first chapter I did for the Pragmatic Programmers’ iPhone book was an omnibus I/O chapter (which later became separate chapters on file and network I/O), and while working on the networking portion, I wrote an example that used Cocoa’s NSHost class and NSStream‘s getStreamsToHost:port:inputStream:outputStream: method. It worked fine on the simulator, but started giving compiler warnings when I finally got my certificate. Search for the method in the documentation and switch between the Mac OS X and iPhone Doc Sets to see the problem: NSHost and getStreamsToHost:port:inputStream:outputStream: are not part of the public iPhone API (a hint of the reason why is on DevForums). Hilariously, it was only after I’d gone on to rewrite it with the lower-level, procedural-C CFNetwork that I decided to take a step back and say “you know what, the Obj-C URL Loading System is going to be enough for 80-90% of our readership’s networking needs.” Again, the functionality of opening a stream to an arbitrary port on an arbitrary host is there, but if you’re the 1 in 10 developers who really really needs to do that, then you’re going down to CFNetwork and using something like CFStreamCreatePairWithSocketToHost().

Need time zone awareness? NSTimeZone is your friend. Need to know every time zone that the device supports? Get to know CFTimeZoneCopyKnownNames(). Again, a niche-ier feature lives down at the Core Foundation level, and isn’t wrapped by an equivalent call in Foundation, though it’s easy enough to switch to procedural C and make the one-off lower-level call.

It’s an interesting trait that the Mac and iPhone stacks work this way, opting in to complexity and keeping the higher-level APIs sparser and simpler, and you have to wonder whether it’s a conscious design decision or a happy accident. After all, a key reason to put so much functionality in the lower-level procedural-C layers — aside from performance benefits from not having to do Obj-C message dispatch — is that these C APIs can be called equally easily from Carbon or Cocoa apps. But of course, the whole idea of Carbon/Cocoa compatibility is irrelevant on the iPhone, where Carbon is nonexistent. In a purely iPhone world, the only reason to have the complex stuff be C-only is to move the tricky, nichey, sophisticated stuff out of the way, optimizing the Obj-C APIs for the most common uses.

Advantage: it does make Cocoa a pleasure to work with. Disadvantage: non-trivial apps are almost surely going to need to make these low-level calls sooner or later, and switching between Obj-C and procedural C on a line-by-line basis takes some getting used to. Still, making the complex stuff an opt-in ultimately makes the SDK both more approachable and more interesting.

[Cross-posted to O’Reilly’s Inside iPhone]