Archives for: September 2010

Less than 19,000 words about Audio Units

This morning, I sent off first drafts of chapters 7 and 8 of the Core Audio book to our esteemed editor, Chuck Toporek. It’s the first new material he’s received in almost two months, but it’s not like we’ve been slacking off. You see, this was supposed to be just one chapter…

If you look in the table of contents, you’ll see that chapter 7 is about Audio Units. Chapter 8 is about OpenAL. Well, it was. Until chapter 7 grew and grew and grew, until it was longer than chapters 4, 5, and 6 combined. At that point, it became obvious that it was way too big to be one chapter, so we split it in two, and pushed everything after it out by one chapter.

So those are the administrative details, but… why? Why the hell did I write a 19,000-word chapter? Suffice it to say, Audio Units is big. Arguably, it’s the heart and soul of Core Audio. It’s the “engine” API (in my terminology, and to contrast it with utility APIs that do stuff like file I/O or format conversion) that the other engines (Audio Queues and OpenAL) are built on top of. It’s also the secret sauce that allows Core Audio to offer very low-latency audio processing, a rich library of effects, and a third-party market in units to do effects, synthetic instruments, and more.

It’s also the hardest part of an already crazy-hard framework. And to my mind, that justifies really digging into it: the whole point of buying a book is to get some help with the hard parts.

Now, a bit of background as to how we got here. When I came on to the book, Kevin and Mike had fragments of three chapters, along with a few example programs for the audio queue chapters. I reused as much of their existing material as I could in the first part of the book, moving it around and working to make their voice and mine mesh. I also worked examples into the first three chapters, because I thought it was important to get readers looking at code and playing with samples and properties early. While I was writing, Kevin got three new example projects created for the units chapter: a file player, a speech synthesizer, and a sine wave (which doesn’t sound as cool, but it illustrates the concept of having Core Audio do “render callbacks” to get samples from your code, so it’s actually more useful).
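To make the render-callback idea concrete, here is a minimal sketch of the pull model in plain Python. It is not Core Audio code and is not taken from the book's example: `SineSource`, `pull`, and the 44.1 kHz sample rate are all assumptions made up for illustration. The point is only that the engine owns the buffer and asks your code to fill it, so your code has to remember its phase between calls.

```python
import math

SAMPLE_RATE = 44100.0  # assumed sample rate, in frames per second


class SineSource:
    """Holds the state a render callback needs between calls (hypothetical)."""

    def __init__(self, frequency_hz):
        self.frequency_hz = frequency_hz
        self.phase = 0.0  # current position in the wave cycle, in radians

    def render(self, buffer, frame_count):
        """Fill the first frame_count slots of buffer with sine samples."""
        step = 2.0 * math.pi * self.frequency_hz / SAMPLE_RATE
        for i in range(frame_count):
            buffer[i] = math.sin(self.phase)
            self.phase += step
            # Keep the phase bounded so it never grows without limit.
            if self.phase >= 2.0 * math.pi:
                self.phase -= 2.0 * math.pi


def pull(callback, frame_count):
    """Stand-in for the audio engine: it owns the buffer and asks for samples."""
    buffer = [0.0] * frame_count
    callback(buffer, frame_count)
    return buffer


source = SineSource(frequency_hz=441.0)
first = pull(source.render, 512)   # the engine asks for 512 frames...
second = pull(source.render, 512)  # ...and later for 512 more
```

Because the phase lives in the source object rather than in the callback's local variables, the second buffer picks up exactly where the first left off, which is what keeps the tone free of clicks at buffer boundaries.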

With those, it was already going to be a long chapter, but I thought we were missing out by not addressing capture at the Audio Unit level, so I set about writing an example project for that. As it turns out, I was naive about this particular example. I’d done some elaborate capture stuff on iOS (see What You Missed at 360iDev), but play-through is a lot harder on Mac OS X because you literally have two different audio devices, with different I/O cycles and different threads servicing them. That means you can’t have the output unit just pull samples on demand from the input unit whenever it needs to, like you can on iOS. Instead, there are a bunch of extra steps involving discovering the available audio devices, connecting one to an AUHAL (an Audio Unit that speaks to the Hardware Abstraction Layer), and sharing the data from that input asynchronously with the rest of the audio-processing graph via a ring buffer.
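The ring buffer is the piece that lets the two sides run at their own pace: the input side writes whenever the capture device delivers samples, and the output side reads whenever the output device wants some. Here is a minimal sketch of that idea in plain Python. It is a simplified stand-in, not Core Audio's CARingBuffer, and the `RingBuffer` class and its methods are hypothetical names. It also ignores thread safety entirely, which real code servicing two I/O threads could not do.

```python
class RingBuffer:
    """Fixed-capacity FIFO of samples; a simplified stand-in, not CARingBuffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = [0.0] * capacity
        self.read_index = 0  # next slot the consumer will read
        self.count = 0       # samples currently stored

    def write(self, samples):
        """Store as many samples as fit; return how many were stored."""
        writable = min(len(samples), self.capacity - self.count)
        write_index = (self.read_index + self.count) % self.capacity
        for i in range(writable):
            self.data[(write_index + i) % self.capacity] = samples[i]
        self.count += writable
        return writable

    def read(self, frame_count):
        """Remove and return up to frame_count samples, oldest first."""
        readable = min(frame_count, self.count)
        out = [self.data[(self.read_index + i) % self.capacity]
               for i in range(readable)]
        self.read_index = (self.read_index + readable) % self.capacity
        self.count -= readable
        return out


ring = RingBuffer(capacity=8)
ring.write([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])  # input side delivers 6 samples
first = ring.read(4)                         # output side pulls 4
ring.write([7.0, 8.0, 9.0, 10.0])            # wraps around the end of storage
rest = ring.read(8)                          # asks for 8, only 6 are available
```

Neither side ever waits on the other. If the reader asks for more than is stored, it simply gets fewer samples back, and a real player would have to decide what to do about the shortfall (play silence, for instance).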

These chapters kind of can’t help but be long, involved exercises in “write this because we have to deal with this, write that because of that other thing.” I actually think it’s something of an improvement over Apple’s documentation, as the Apple way is to provide programming guides that aren’t complete examples (just the crucial sections), and ambitious sample code (particularly the WWDC apps) that run thousands of lines and bury the dozen or so that really matter.

Anyways, with these four examples, the first draft weighed in at 19,000 words, as compared to the 4,000 – 6,000 that we’ve been doing in our other chapters. I feared that readers would try to take it all in at once and be completely overwhelmed, and Chuck agreed that splitting in two was justified. We have some reworking to do elsewhere as a side-effect of this: the table of contents and introductory road map have to change, we’ll probably change some chapter titles to match how we’re presenting the audio units stuff, etc.

But at the end of the day, you’re getting four kick-ass audio unit walkthroughs. Plus a fifth when we get to the iOS chapter. And we still have creating your own units coming in the last chapter.

Nice to have this part done… I think it is going to be the hardest stuff in the book, meaning it starts to be a downhill ride for me from here.

Now if you’ll excuse me, I need to take a week or two away from Core Audio and switch to AV Foundation to get my talk ready for Voices That Matter: iPhone Developer Conference in Philadelphia in two weeks.

More things in Heaven and Earth

There are more things in Heaven and Earth, Horatio, than are dreamt of in your philosophy.
Hamlet, Act 1, Scene V

There’s suddenly a lot of conventional wisdom that says the rise and eventual dominance of Android is manifest, and inevitable. Some of these claims make dubious analogies to Windows’ defeat of the Mac in the 90’s, ramming square pegs through round holes to make the analogy stick (to wit: who are the hardware manufacturers this time, the handset makers or the carriers?). It may indeed come to pass, but the reasoning behind these claims is pretty shallow thus far.

Case in point: an Appcelerator survey covered in The Apple Blog story Devs Say Android is Future-Proof. iOS? Not So Much. The reasoning for Android’s perceived advantage? This article doesn’t mention Android’s license terms and widespread hardware adoption (maybe that’s taken for granted at this point?), and instead mentions only the appeal of writing apps for GoogleTV, a product that is not even out yet (meaning Adamson’s First Law applies), to say nothing of how many purported “interactive television revolutions” we’ve suffered through over the decades (Qube, videotex, WebTV, Tru2Way, etc.). Maybe it’ll be the next big thing, but history argues otherwise.

In the 90’s, the rise of Java seemed an obvious bet. Applets would make web pages far more compelling than static pages and lengthy form submits, and application developers would surely be better off with garbage collection and strong typing than with C and C++. Java was so sure to be big that Microsoft threw the full force of its dirty tricks machine at it, while Apple exposed most of the Mac’s unique libraries to Java bindings (including, at various times, QuickTime, Cocoa, Core Audio, speech, and more). But it didn’t work out that way: Java on the browser was displaced by JavaScript/Ajax, and the early attempts to write major desktop applications in Java were unmitigated disasters, with the Netscape Navigator port abandoned and Corel’s Java version of WordPerfect Office buried almost immediately after it was released. 1996’s sure bet was a has-been (or a never-was) by 2001.

If you think about it, the same thing happened a few years ago with AIR. With the YouTube-powered rise of Flash, AIR seemed a perfect vehicle to bring hordes of Flash developers to the desktop. Everyone knew it would be big. Except it wasn’t. AIR applications are rare today, perhaps rarer even than Java. Admittedly, I only remembered AIR’s existence because I needed to download the AIR-powered Balsamiq application for a client this week… exception that proves the rule, I guess?

My point in all this is that the conventional wisdom about platform success has a tendency to be selective in considering what factors will make or break a platform. Licensing, corporate support, community, and of course the underlying technology all play a part. Android is greatly enhanced by the fact that Google puts talented people behind it and then gives it away, but if carriers then use it to promote their own applications and crapware over third-party apps (or cripple them, as they did with JavaME), then Android’s advantage is nil. On the other hand, Apple’s iOS may have remarkable technology, but if their model requires using their corporate strength to force carriers to be dumb pipes, then they may only be able to get iPhone on weaker carriers, which will turn off consumers and retard growth of the platform.

Ultimately, it’s hard to say how this will all play out, but assuming an Android victory based on the presumed success of currently non-existent tablets and set top boxes is surely an act of faith… which probably accounts for all the evangelism.

So why am I on iOS now? Is it because I have some reason to think that it will “win”? Not at all. Mostly it’s because I like the technology. In the mid 2000’s, when user-facing Java was in terminal decline, I tried to learn Flash and Flex to give myself more options, but I just couldn’t bring myself to like it. It just didn’t click for me. But as I got into Cocoa and then the iPhone SDK, I found I liked the design patterns, and the thoughtfulness of all of it. The elegance and power appealed to me. Being a media guy, I also appreciate the platform’s extraordinary support for audio and video: iOS 4 has three major media APIs (AV Foundation, Core Audio, and Media Player), along with other points of interest throughout the stack (video out in UIKit, the low-level abstractions of Core Media, spatialized sound in OpenAL, high-performance DSP functions in the Accelerate framework, etc.). Android’s media package is quite limited by comparison, offering some canned functionality for media playback and a few other curious features (face recognition and dial tone generation, for example), but no way to go deeper. When so many media apps for Android are actually server-dependent, like speech-to-text apps that upload audio files for conversion, it says to me there’s not much of a there there, at least for the things I find interesting.

Even when I switched from journalism and failed screenwriting to programming and book-writing in the late 90’s, at the peak of the Microsoft era, I never considered for a second the option of learning Windows programming and adopting that platform. I just didn’t like their stuff, and still don’t. The point being that I, and you, don’t have to chase the market leader all the time. Go with what you like, where you’ll be the most productive and do the most interesting work.

There’s a bit in William Goldman’s Adventures in the Screen Trade (just looked in my copy, but couldn’t find the exact quote), where the famous screenwriter excuses himself from a story meeting, quitting the project by saying “Look, I am too old, and too rich, to have to put up with this shit.” I like the spirit of that. Personally, I may not be rich, but I’m certainly past the point where I’m willing to put up with someone else’s trite wisdom, or the voice of the developer mob, telling me where I should focus my skills and talents.

The iPad File-asco

This all started with this week’s update to the iWork apps for iPad, which added support for plaintext (.txt) documents. It allowed me to believe, for a time, that I could use some of my idle time around the house, when I’m away from the computer, to work on the Core Audio book. I was also inspired by Noel Llopis’ standing desk experiment, an iblogdevaday post in which he praises the health benefits of working while standing up for long stretches. I’m not keen to overhaul my home office, but the idea of using a remote keyboard while standing at the kitchen counter seems well worth trying.

The problem, it seems, is with getting files in and out of the iPad for editing. The default system of dragging files into the iTunes “Apps” tab is remarkably clumsy and seems atypically half-assed for an Apple user experience. If only we could see each app’s files in the Finder… ah, but it apparently is an article of faith at Apple that the iOS filesystem is never to be exposed to users, no matter how bad the alternative is.

Some sort of over-the-air retrieval of files is Plan B, but this isn’t built into iOS, so apps need to provide their own support. Seeing that Pages could retrieve files with WebDAV got me thinking that could be an option: I’d set up WebDAV on my desktop, grab a chapter file, edit it for a few minutes, and save it back. It took me a long time to get Apache even a little happy with exposing a second checkout of the book (I had file-ownership problems with my original directory and didn’t want to deal with giving the _www user write access to it), but I finally got the plaintext ch07.txt, the enormous chapter on audio units, into Pages on the iPad…

Unfortunately, it’s a pretty hollow victory: Pages won’t write plaintext, and forces me to save as .pages, .pdf, or .doc, which means converting back to plaintext when I return to my desk. So this is starting to seem like more hassle than it’s worth for quick, on-the-go editing.

Maybe there’s a solution in the form of some other app. Unfortunately, the reviews for iPad text editing apps are largely poor, and few seem to have WebDAV or other over-the-air file-sharing support (although a few support FTP, being aimed at webmasters who might need to fix a site on the go). Another option is to look at the file-sharing apps instead: Air Sharing HD looks like it would solve the file sharing problem easily, but is mostly a file reader and doesn’t indicate that it has support for editing text files.

So, that’s where things are right now. I’d like to get plaintext files off my Mac, edit them on the iPad, and put them back. This doesn’t seem like it should be so hard. Am I overlooking a good solution?

Jimmy Gosling Said (I’m Irrelevant When You Compile)

Much renewed hope and delight last week, after Apple pulled back from its most audacious and appalling land-grabs in its iOS developer agreement, notably the revised section 3.3.1 that prohibited any languages but C, Objective-C, and C++ for iOS development (Daring Fireball quotes the important changes in full). Whether this is the result of magnanimity or regulatory pressures in the U.S. and Europe is unknowable and therefore unhelpful. What’s interesting is thinking about what we can do with our slightly-loosened shackles.

For example, it would be interesting to see if someone, perhaps a young man or lady with a mind for mischief or irony, could bring Google’s Go programming language to iOS development. Since the Go SDK runs on Mac and can compile for ARM, it might well be possible to have a build script call the Go compiler as needed. And of all the hot young languages, Go might be the most immediately applicable, as it is compiled, rather than interpreted.

And that brings up the other major change, the use of interpreters. Nobody seems to be noting that the change in section 3.3.2 is not just a loosening of this Spring’s anti-Flash campaign, but is in fact far more lenient than this policy has ever been. Since the public SDK came out in 2008, all forms of interpreted code have been forbidden. This is what dashed early plans to bring Java to the iPhone as an application runner, despite its absence as an applet runner in Safari. As Matt Drance has pointed out, the new policy reflects the reality on the ground that interpreters (especially Lua) have been tolerated for some time in games. The new phrasing forbids downloading of executable content, but allows for cases where the interpreter and all scripts are included in the app bundle. This has never been allowed before, and is a big deal.

Now let me stretch the definition of “interpreter” a bit, to the point where it includes virtual machines. After all, the line between the two is hard to define: a “virtual machine” is a design philosophy, not a technical trait. A VM uses an interpreter (often a byte code interpreter rather than source, but not necessarily), and presumably has more state and exposes more library APIs. But languages and their interpreters are getting bigger – Ruby I/O is in the language rather than in a library (as it is in C or Java), but that doesn’t make Ruby a VM, does it?

You might have surmised where I’m going with this: I don’t think the revised section 3.3.2 bans a hypothetical port of the Flash or Java VMs to iOS anymore, if they’re in a bundle with the .swf or .jar files that they will execute.

I could be wrong, particularly given Steve Jobs’ stated contempt for these sorts of intermediary platforms. But if a .swf-and-Flash-VM bundle were rejected today, it would be by fiat, and not by the letter of section 3.3.2.

Whether any of this matters depends on whether anyone has stand-alone Flash applications (such as AIR apps) or Java applications that have value outside of a browser context, and are worth bringing to a mobile platform.


I can’t say why AIR never seemed to live up to its billing, but the failings of Desktop Java can in part be blamed on massive neglect by Sun, exacerbated by internecine developer skirmishes. Swing, the over-arching Java UI toolkit, was plagued by problems of complexity and performance when it was introduced in the late 90’s, problems that were never addressed. It’s nigh impossible to identify any meaningful changes to the API following its inclusion in Java 1.2 in 1998. Meanwhile, the IBM-funded Eclipse Foundation tied its SWT more tightly to native widgets, but it was no more successful than Swing, at least in terms of producing meaningful apps. Each standard powers one IDE, one music-stealing client, and precious little else.

So, aside from the debatability of section 3.3.2, and wounded egos in the Flash and Java camps, the biggest impediment to using a “code plus VM” porting approach may be the fact that there just isn’t much worth porting in the first place.

Speaking of Desktop Java, the Java Posse’s Joe Nuxoll comes incredibly close to saying something that everybody in that camp needs to hear. In the latest episode, at 18:10, he says “…gaining some control over the future of mobile Java which, it’s over, it’s Android, it’s done.” He later repeats this assertion that Android is already the only form of mobile Java that matters, and gets agreement from the rest of the group (though Tor Norbye, an Oracle employee, can’t comment on this discussion of the Oracle/Google lawsuit, and may disagree). And this does seem obvious: Android, coupled with the rise of the smartphone, has rendered Java ME irrelevant (to say nothing of JavaFX Mobile, which seems stillborn at this point).

But then at 20:40, Joe makes the big claim that gets missed: “Think of it [Android] as Desktop Java for the new desktop.” Implicit in this is the idea that tablets are going to eat the lunch of traditional desktops and laptops, and those tablets that aren’t iPads will likely be Android-based. That makes Android the desktop Java API of the future, because not only have Swing, SWT, and JavaFX failed, but the entire desktop model is likely threatened by mobile devices. There are already more important, quality, well-known Android apps after two years than the desktop Java APIs produced in over a decade. Joe implies, but does not say, what should be obvious: all the Java mobile and desktop APIs are dead, and any non-server Java work of any relevance in the future will be done in Android.

No wonder Oracle is suing for a piece of it.

Sayonara, spammers

Every week, I get a few e-mails from my WordPress installation telling me that new users have created accounts on this blog. The new accounts almost never post any comments, and often have user names that are obviously spammy.

These users haven’t posted any spam, but I’m not going to take the chance — I’m going to start deleting accounts that don’t have real names, valid metadata, or a history of posting anything. Apologies if I blow away your account, but most of the legitimate posters are using OpenID anyways.

Reminder: Voices That Matter iPhone Developer’s Conference

Just a reminder, for those of you who don’t scroll all the way down the right column, that the Voices That Matter iPhone Developer’s Conference is coming up in a little over a month, October 16 & 17, in Philadelphia.

Why this matters right now:

  • Early Bird pricing ends tomorrow (September 10). Combine it with the speakers’ discount code PHRSPKR and you’re in for $395. Given the quality of the speakers, that’s a heck of a deal.
  • iOS 4.1 just came out yesterday, meaning we can now talk about new-in-4.1 APIs publicly. Aside from Game Center, one of the biggest changes in the SDK is the addition of AVAssetReader and AVAssetWriter to AV Foundation. These classes permit sample-level access to movie assets, enabling some new kinds of applications that weren’t possible before (can you say “ScreenFlow for iOS”?), as well as simplifying things like my music library PCM converter. I’m doing the talk on AV Foundation, and you can count on these new classes being covered.

So there you have it. See you in Philly. I’m going to try to make sure my travel plans get me there in time to do dinner at Ted’s on Friday night. Join me for bison burgers and Coke Zero… nom.

The Rough Cut is Up

So for everyone who’s been pining for the Core Audio book, I’m pleased to announce that the Rough Cut is now available on Safari Books Online. If you’re a member, you now have access to the first draft of the first six (of an anticipated 12) chapters.

The first three chapters are set-up, but they’re not idle chatfests. The first introduces the framework and its major conventions. Chapter 2 discusses digital audio processing and how it works. Then chapter 3 puts the two together, showing how Core Audio models and works with digital audio. All three of these introductory chapters have example projects… it’s a very code-heavy book. Heck, you’re already generating raw PCM samples in chapter 2.

Part II has two chapters on Audio Queues: the first for recording, the second for playback. Chapter 6 gets into the Audio Converter and ExtAudioFile frameworks, for converting between encoded formats and PCM. There’s surprisingly little information out there on using the Audio Converter directly (and not that much on ExtAudioFile), so we’re breaking a little new ground here.

That’s where the current Rough Cut ends. Right now, I’m working on one hell of a chapter about Audio Units, the heart and soul of Core Audio. We have four examples in this chapter: a file player, a speech synthesizer, a sine-wave generator (which introduces render callbacks), and a play-through example (which covers input callbacks, audio devices, the CARingBuffer, and mixing). This chapter might end up being as long as all of Part II, which in turn might argue for a chapter split if we can find a good place to do it. Nevertheless, do not fear for depth in this book: this chapter in particular is going crazy deep. I hope to get it off to Chuck this week.

After that, it’s on to OpenAL, which I worked on for an unpublished ADC article, which in turn led to an earlier brain dump. My plan here is to cover both plain ol’ single-shot ALBuffers and streaming to an ALSource.

There’s still a ways to go, but I think getting past Audio Units will give me a break from “hard parts”, at least until the last chapter when we get into custom units.

And yes, the estimated date has slipped again. Clearly, if I’m on chapter 7 of 12, printed copies will not be ready by Christmas. Sorry about that. Please bear with us, as we try to get this one right. It’s too important a topic to do a slapdash rip-off-the-ADC-docs kind of job.

A Big Bet on HTTP Live Streaming

So, Apple announced yesterday that they’ll stream today’s special event live, and everyone, myself included, immediately assumed the load would crash the stream, if not the whole internet. But then I got to thinking: they wouldn’t even try it if they weren’t pretty damn sure it would work. So what makes them think this will work?

HTTP Live Streaming, that’s why. I banged out a series of tweets (1, 2, 3, 4, 5, 6, 7, 8, 9) spelling out why the nature of HTTP Live Streaming (which I worked with briefly on a fix-up job last year) makes it highly plausible for such a use.

To summarize the spec: a client retrieves a playlist (an .m3u8, which is basically a UTF-8’ed version of the old WinAmp playlist format) that lists segments of the stream as flat files (often .m4a’s for audio, and .ts for video, which is an MPEG-2 transport stream, though Apple’s payload is presumably H.264/AAC). The client downloads these flat files and sends them to its local media player, and refreshes the playlist periodically to see if there are new files to fetch. The sizing and timing are configurable, but I think the defaults are like a 60-second refresh cycle on the playlist, and segments of about 10 seconds each.
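For a sense of what the client is actually reading, here is a minimal sketch of parsing such a playlist in Python. The tag names (`#EXTM3U`, `#EXT-X-TARGETDURATION`, `#EXT-X-MEDIA-SEQUENCE`, `#EXTINF`) come from the HTTP Live Streaming spec, but the sample playlist, its segment file names, and the `parse_playlist` helper are made up for illustration, and the parser ignores every tag it doesn't recognize.

```python
# Hypothetical playlist contents; the segment names are invented.
SAMPLE_PLAYLIST = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:2680
#EXTINF:10,
segment2680.ts
#EXTINF:10,
segment2681.ts
#EXTINF:10,
segment2682.ts
"""


def parse_playlist(text):
    """Return (target_duration, first_sequence, [(duration, uri), ...])."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("not an extended M3U playlist")
    target_duration = None  # stays None if the tag is absent
    first_sequence = None   # stays None if the tag is absent
    segments = []
    pending_duration = None
    for line in lines[1:]:
        if line.startswith("#EXT-X-TARGETDURATION:"):
            target_duration = int(line.split(":", 1)[1])
        elif line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            first_sequence = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,<optional title>".
            pending_duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif not line.startswith("#"):
            # Any non-tag line is a segment URI; pair it with its EXTINF.
            segments.append((pending_duration, line))
            pending_duration = None
    return target_duration, first_sequence, segments


target, sequence, segments = parse_playlist(SAMPLE_PLAYLIST)
```

The media sequence number is what makes a live stream workable: as old segments drop off the top of the playlist and new ones appear at the bottom, the client can tell which ones it has already seen.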

This can scale for a live broadcast by using edge servers, for which Apple has long depended on Akamai (and others?). Apple vends you a playlist URL at a local edge server, and its contents are all on the edge server, so the millions of viewers don’t pound Apple with requests; the load is pushed out to the edge of the internet, and largely stays off the backbone. Also, all the local clients will be asking for the same handful of segment files at the same time, so these could sit in in-memory caches on the edge servers (since they’re only 10 seconds of video each). All these are good things.

I do wonder if local 3G cells will be a point of failure, if the bandwidth on a cell gets saturated by iPhone clients receiving the files. But for wired internet and wifi LANs, I suspect this is highly viable.

One interesting point brought up by TUAW is the dearth of clients that can handle HTTP Live Streaming. So far, it’s iOS devices, and Macs with QuickTime X (i.e., running Snow Leopard). The Windows version of QuickTime doesn’t support HTTP Live Streaming (being based on the “old” 32-bit QuickTime on Mac, it may effectively be in maintenance mode). Open standard or not, there are no handy HTTP Live Streaming clients for other OSes, though MacRumors’ VLC-based workaround (which requires you to manually download the .m3u8 playlist and do the refresh yourself) suggests it would be pretty easy to get it running elsewhere, since you already have the ability to play a playlist of segments and just need to automate the playlist refresh.
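Automating that refresh mostly comes down to bookkeeping: re-read the playlist, work out which segments are new since last time, and queue only those. Here is a minimal sketch of that logic in Python. The two playlist snapshots and their segment names are invented, `list_segments` and `refresh` are hypothetical helpers, and the network fetch is left out entirely (a real client would download the .m3u8 over HTTP on each pass). The sketch also assumes each snapshot carries an `#EXT-X-MEDIA-SEQUENCE` tag.

```python
def list_segments(playlist_text):
    """Return [(sequence_number, uri), ...] for a live playlist's segments."""
    sequence = 0  # placeholder; this sketch assumes the tag is present
    segments = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-MEDIA-SEQUENCE:"):
            sequence = int(line.split(":", 1)[1])
        elif line and not line.startswith("#"):
            # Each URI line takes the next sequence number in order.
            segments.append((sequence, line))
            sequence += 1
    return segments


def refresh(playlist_text, last_played):
    """Return segments newer than last_played (a sequence number or None)."""
    return [(seq, uri) for seq, uri in list_segments(playlist_text)
            if last_played is None or seq > last_played]


# Two invented snapshots of the same live playlist, one refresh apart.
FIRST_FETCH = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:100
#EXTINF:10,
seg100.ts
#EXTINF:10,
seg101.ts
#EXTINF:10,
seg102.ts
"""

SECOND_FETCH = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:101
#EXTINF:10,
seg101.ts
#EXTINF:10,
seg102.ts
#EXTINF:10,
seg103.ts
"""

queue = []          # URIs handed to the media player, in order
last_played = None  # highest sequence number queued so far
for snapshot in (FIRST_FETCH, SECOND_FETCH):
    for seq, uri in refresh(snapshot, last_played):
        queue.append(uri)
        last_played = seq
```

The second snapshot overlaps the first by two segments, and only `seg103.ts` gets queued from it, so nothing plays twice.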

Dan Leehr tweeted back that Apple has talked a good game on HTTP Live Streaming, but hasn’t really shown much. Maybe this event is meant to change that. Moreover, you can’t complain about the adoption — last December, the App Store terms added a new fiat that any streaming video app must use HTTP Live Streaming (although a February post seems to ratchet this back to apps that stream for more than 10 minutes over the cellular network), so any app you see with a video streaming feature almost certainly uses HLS. At WWDC, Apple boasted about the MLB app using HLS, and it’s a safe bet that most/all other iOS video streaming apps (Netflix, Crunchyroll, etc.) use it too.

And one more thing to think about… MLB and Netflix aren’t going to stream without DRM, right? That’s the other piece that nobody ever talks about with HTTP Live Streaming: the protocol allows for encrypting of the media files. See section 5 of the spec. As much as Apple and its fanboys talk up HTML5 as a rival to and replacement for Flash, this is the thing that should really worry Adobe: commoditizing DRM’ed video streaming.
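In the playlist itself, encryption shows up as an `#EXT-X-KEY` tag that applies to the segments following it. Here is a minimal sketch of reading that tag in Python. The tag and its `METHOD` and `URI` attributes are from the spec as I understand it, but the sample playlist, the example.com key URL, and the `parse_key_tag` and `keys_for_segments` helpers are all made up. The sketch only reports which key applies to which segment and does no decryption.

```python
import re

# Matches NAME=value pairs, where value is either quoted or runs to a comma.
ATTRIBUTE = re.compile(r'([A-Z0-9-]+)=("[^"]*"|[^,]*)')


def parse_key_tag(line):
    """Return the attributes of an #EXT-X-KEY tag as a dict."""
    prefix = "#EXT-X-KEY:"
    if not line.startswith(prefix):
        raise ValueError("not an EXT-X-KEY tag")
    attributes = {}
    for name, value in ATTRIBUTE.findall(line[len(prefix):]):
        attributes[name] = value.strip('"')
    return attributes


def keys_for_segments(playlist_text):
    """Pair each segment URI with the key attributes in effect for it."""
    current_key = None  # None means no key tag has been seen yet
    result = []
    for line in playlist_text.splitlines():
        line = line.strip()
        if line.startswith("#EXT-X-KEY:"):
            current_key = parse_key_tag(line)
        elif line and not line.startswith("#"):
            result.append((line, current_key))
    return result


# Invented playlist: one clear segment, then two encrypted ones.
SAMPLE = """#EXTM3U
#EXT-X-TARGETDURATION:10
#EXTINF:10,
clear0.ts
#EXT-X-KEY:METHOD=AES-128,URI="https://example.com/keys/1?a=1,b=2"
#EXTINF:10,
enc1.ts
#EXTINF:10,
enc2.ts
"""

pairs = keys_for_segments(SAMPLE)
```

The quoted-value branch of the regular expression is there because a key URI can itself contain commas, which a naive split on commas would chop in half.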