
Archives for: qtkit

AV Foundation and the void

Yesterday I streamed some WWDC sessions while driving to meet with a client. At a stop, I posted a pissy little tweet.

It got enough quizzical replies (and a couple of favorites) that I figured I should elaborate as best I can, while staying away from all things NDA.

Part of what I’m reacting to comes from a habit of mine of deliberately seeking the unseen, which I picked up either from Musashi’s Book of Five Rings or Bastiat’s essay Ce qu’on voit et ce qu’on ne voit pas (“What is Seen and What is Unseen”), because of course with me it’s going to be either samurai or economics, right? Anyways, the idea is to seek truth not in what you encounter, but in what is conspicuous by its absence. It’s something I try to do when editing: don’t focus only on what’s in the document; also figure out whether anything should be there and isn’t.

And when I look at AV Foundation on iOS and especially on OS X, I feel like there are a lot of things missing.


Connecting the Dots

Philip Hodgetts e-mailed me yesterday, having found my recent CocoaHeads Ann Arbor talk on AV Foundation and searched from there to find my blog. The first thing this brings up is that I’ve been slack about linking my various online identities and outlets… anyone who happens across my stuff should be able to get to the rest of it easily. As a first step, behold the “More of This Stuff” box at the right, which links to my slideshare.net presentations and my Twitter feed. The former is updated less frequently than the latter, but also contains fewer obscenities and references to anime.

Philip co-hosts a podcast about digital media production, and their latest episode is chock-full of important stuff about QuickTime and QTKit that more people should know (frame rate doesn’t have to be constant!), along with wondering aloud about where the hell Final Cut stands given the QuickTime/QTKit schism on the Mac and the degree to which it is built atop the 32-bit legacy QuickTime API. FWIW, between reported layoffs on the Final Cut team and their key programmers working on iMovie for iPhone, I do not have a particularly good feeling about the future of FCP/FCE.

Philip, being a Mac guy and not an iOS guy, blogged that he was surprised my presentation wasn’t an NDA violation. Actually, AV Foundation has been around since iPhone OS 2.2, but only became a document-based audio/video editing framework in iOS 4. The only thing that’s NDA is what’s in iOS 4.1 (good stuff, BTW… hope we see it Wednesday, even though I might have to race out some code and a blog entry to revise this beastly entry).

He’s right in the podcast, though, that iPhone OS / iOS has sometimes kept some of its video functionality away from third-party developers. For example, Safari could embed a video, but through iPhone OS 3.1, the only video playback option was the MPMoviePlayerController, which takes over the entire screen when you play the movie. 3.2 provided the ability to get a separate view… but recall that 3.2 was iPad-only, and the iPad form factor clearly demands the ability to embed video in a view. In iOS 4, it may make more sense to ditch MPMoviePlayerController and leave MediaPlayer.framework for iPod library access, and instead do playback by getting an AVURLAsset and feeding it to an AVPlayer.
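
To make that concrete, here’s a minimal sketch of the AVPlayer route on iOS 4; movieURL and videoView are my own placeholder names, not anything from the framework:

// hedged sketch: play a movie in an arbitrary view via AV Foundation
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:movieURL options:nil];
AVPlayerItem *item = [AVPlayerItem playerItemWithAsset:asset];
AVPlayer *player = [AVPlayer playerWithPlayerItem:item];

AVPlayerLayer *playerLayer = [AVPlayerLayer playerLayerWithPlayer:player];
playerLayer.frame = videoView.bounds;	// videoView is your own UIView
[videoView.layer addSublayer:playerLayer];

[player play];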

One slide Philip calls attention to in his blog is where I compare the class and method counts of AV Foundation, android.media, QTKit, and QuickTime for Java. A few notes on how I spoke to this slide when I gave my presentation:

  • First, notice that AV Foundation is already larger than QTKit. But also notice that while it has twice as many classes, it only has about 30% more methods. This is because AV Foundation had the option of starting fresh, rather than wrapping the old QuickTime API, and thus could opt for a more hierarchical class structure. AVAssets represent anything playable, while AVCompositions are movies that are being created and edited in-process. Many of the subclasses also split out separate classes for their mutable versions. By comparison, QTKit’s QTMovie class has over 100 methods; it just has to be all things to all people.

  • Not only is android.media smaller than AV Foundation, it also represents the alpha and omega of media on that platform, so while it’s mostly provided as a media player and capture API, it also includes everything else media-related on the platform, like ringtone synthesis and face recognition. While iOS doesn’t offer those particular features, keep in mind that on iOS there are entirely separate frameworks for media library access (MediaPlayer.framework), low-level audio (Core Audio), photo library access (AssetsLibrary.framework), in-memory audio clips (System Sounds), and so on. By this measure, media support on iOS is many times more comprehensive than what’s currently available in Android.

  • Don’t read too much into my inclusion of QuickTime for Java. It was deprecated at WWDC 2008, after all. I put it in this chart because its use of classes and methods offered an apples-to-apples comparison with the other frameworks. Really, it’s there as a proxy for the old C-based QuickTime API. If you counted the number of functions in QuickTime, I’m sure you’d easily top 10,000. After all, QTJ represented Apple’s last attempt to wrap all of QuickTime with an OO layer. In QTKit, there’s no such ambition to be comprehensive. Instead, QTKit feels like a calculated attempt to include the stuff that the most developers will need. This allows Apple to quietly abandon unneeded legacies like Wired Sprites and QuickTime VR. But quite a few babies are being thrown out with the bathwater — neither QTKit nor AV Foundation currently has equivalents for the “get next interesting time” functions (which could find edit points or individual samples), or the ability to read/write individual samples with GetMediaSample() / AddMediaSample().

One other point of interest is one of the last slides, which quotes a macro seen throughout AVFoundation and Core Media in iOS 4:


__OSX_AVAILABLE_STARTING(__MAC_10_7,__IPHONE_4_0);

Does this mean that AV Foundation will appear on Mac OS X 10.7 (or hell, does it mean that 10.7 work is underway)? IMHO, not enough to speculate, other than to say that someone was careful to leave the door open.

Update: Speaking of speaking on AV Foundation, I should mention again that I’m going to be doing a much more intense and detailed Introduction to AV Foundation at the Voices That Matter: iPhone Developer Conference in Philadelphia, October 16-17. $100 off with discount code PHRSPKR.

From iPhone Media Library to PCM Samples in Dozens of Confounding, Potentially Lossy Steps

iPhone SDK 3.0 provided limited access to the iPod Music Library on the device, allowing third party apps to search for songs (and podcasts and audiobooks, but not video), inspect the metadata, and play items, either independently or in concert with the built-in media player application. But it didn’t provide any form of write-access — you couldn’t add items or playlists, or alter metadata, from a third-party app. And it didn’t allow for third-party apps to do anything with the songs except play them… you couldn’t access the files, convert them to another format, run any kind of analysis on the samples, and so on.

So a lot of us were surprised by the WWDC keynote when iMovie for iPhone 4 was shown importing a song from the iPod library for use in a user-made video. We were even more surprised by the subsequent claim that everything in iMovie for iPhone 4 was possible with public APIs. Frankly, I was ready to call bullshit on it because of the iPod Library issue, but was intrigued by the possibility that maybe you could get at the iPod songs in iOS 4. A tweet from @ibeatmaker confirmed that it was possible, and after some clarification, I found what I needed.

About this time, a thread started on coreaudio-api about whether Core Audio could access iPod songs, so that’s what I set out to prove one way or another. My goal was to determine whether you could get raw PCM samples from songs in the device’s music library.

The quick answer is: yes. The interesting answer is: it’s a bitch, using three different frameworks, coding idioms that are all over the map, a lot of file-copying and possibly some expensive conversions.

It’s Just One Property; It Can’t Be That Hard

The big secret of how to get to the Music Library isn’t much of a secret. As you might expect, it’s in the MediaPlayer.framework that you use to interact with the library. Each song/podcast/audiobook is an MPMediaItem, and has a number of interesting properties, most of which are user-managed metadata. In iOS 4, there’s a sparkling new addition to the list of “General Media Item Property Keys”: MPMediaItemPropertyAssetURL. Here’s what the docs say:

A URL pointing to the media item, from which an AVAsset object (or other URL-based AV Foundation object) can be created, with any options as desired. Value is an NSURL object.

The URL has the custom scheme of ipod-library. For example, a URL might look like this:

ipod-library://item/item.m4a?id=12345

OK, so we’re off and running. All we need to do is to pick an MPMediaItem, get this property as an NSURL, and we win.

Or not. There’s an important caveat:

Usage of the URL outside of the AV Foundation framework is not supported.

OK, so that’s probably going to suck. But let’s get started anyways. I wrote a throwaway app to experiment with all this stuff, adding to it piece by piece as stuff started working. I’m posting it here for anyone who wants to reuse my code… all my classes are marked as public domain, so copy-and-paste as you see fit.

MediaLibraryExportThrowaway1.zip

Note that this code must be run on an iOS 4 device and cannot be run in the Simulator, which doesn’t support the Media Library APIs.

The app just starts with a “Choose Song” button. When you tap it, it brings up an MPMediaPickerController as a modal view to make you choose a song. When you do so, the -mediaPicker:didPickMediaItems: delegate method gets called. At this point, you could get the first MPMediaItem and get its MPMediaItemPropertyAssetURL media item property. I’d hoped that I could just call this directly from Core Audio, so I wrote a function to test if a URL can be opened by CA:



BOOL coreAudioCanOpenURL (NSURL* url) {
	OSStatus openErr = noErr;
	AudioFileID audioFile = NULL;
	openErr = AudioFileOpenURL((CFURLRef) url,
		 kAudioFileReadPermission ,
		 0,
		 &audioFile);
	if (audioFile) {
		AudioFileClose (audioFile);
	}
	return openErr ? NO : YES;
}
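
Calling it from the picker delegate is a one-liner; song here is the chosen MPMediaItem, just as in the code later in this post:

NSURL *assetURL = [song valueForProperty: MPMediaItemPropertyAssetURL];
NSLog (@"Core Audio %@ open the asset URL directly",
	coreAudioCanOpenURL (assetURL) ? @"can" : @"cannot");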

Getting a NO back from this function more or less confirms the caveat from the docs: the URL is only for use with the AV Foundation framework.

AV for Vendetta

OK, so plan B: we open it with AV Foundation and see what that gives us.

AV Foundation — setting aside the simple player and recorder classes from 3.0 — is a strange and ferocious beast of a framework. It borrows from QuickTime and QTKit (the capture classes have an almost one-to-one correspondence with their QTKit equivalents), but builds on some new metaphors and concepts that will take the community a while to digest. For editing, it has a concept of a composition, which is made up of tracks, which you can create from assets. This is somewhat analogous to QuickTime’s model that “movies have tracks, which have media”, except that AVFoundation’s compositions are themselves assets. Actually, reading too much QuickTime into AV Foundation is a good way to get in trouble and get disappointed; QuickTime’s most useful functions, like AddMediaSample() and GetMediaNextInterestingTime() are antithetical to AV Foundation’s restrictive design (more on that in a later blog) and therefore don’t exist.
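
To make the composition idea concrete, here’s a hedged sketch of building a one-track audio composition from an existing asset (sourceAsset is a placeholder for whatever AVAsset you’ve already opened; error handling omitted):

// a composition is itself an asset, built up from tracks of other assets
AVMutableComposition *composition = [AVMutableComposition composition];
AVMutableCompositionTrack *audioTrack =
	[composition addMutableTrackWithMediaType: AVMediaTypeAudio
							 preferredTrackID: kCMPersistentTrackID_Invalid];
NSError *error = nil;
AVAssetTrack *sourceTrack =
	[[sourceAsset tracksWithMediaType: AVMediaTypeAudio] objectAtIndex: 0];
[audioTrack insertTimeRange: CMTimeRangeMake (kCMTimeZero, sourceAsset.duration)
					ofTrack: sourceTrack
					 atTime: kCMTimeZero
					  error: &error];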

Back to the task at hand. The only thing we can do with the media library URL is to open it in AVFoundation and hope we can do something interesting with it. The way to do this is with an AVURLAsset.


NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];

If this were QuickTime, we’d have an object that we could inspect the samples of. But in AV Foundation, the only sample-level access afforded is a capture-time opportunity to get called back with video frames. There’s apparently no way to get to video frames in a file-based asset (except for a thumbnail-generating method that operates on one-second granularity), and no means of directly accessing audio samples at all.
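
For what it’s worth, the thumbnail route appears to be AVAssetImageGenerator; here’s a hedged sketch, with asset standing in for whatever file-based asset you’ve opened (pre-ARC memory management, as in the rest of this post):

// pull a single frame image out of a file-based asset
AVAssetImageGenerator *generator =
	[[AVAssetImageGenerator alloc] initWithAsset: asset];
NSError *error = nil;
CMTime actualTime;
CGImageRef frameImage = [generator copyCGImageAtTime: CMTimeMake (5, 1)	// ~5 seconds in
										   actualTime: &actualTime
												error: &error];
if (frameImage) {
	UIImage *thumbnail = [UIImage imageWithCGImage: frameImage];
	CGImageRelease (frameImage);
	// ... do something with thumbnail ...
}
[generator release];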

What we can do is to export this URL to a file in our app’s documents directory, hopefully in a format that Core Audio can open. AV Foundation’s AVAssetExportSession has a class method exportPresetsCompatibleWithAsset: that reveals what kinds of formats we can export to. Since we’re going to burn the time and CPU of doing an export, it would be nice to be able to convert the compressed song into PCM in some kind of useful container like a .caf, or at least an .aif. But here’s what we actually get as options:

compatible presets for songAsset: (
 AVAssetExportPresetLowQuality,
 AVAssetExportPresetHighestQuality,
 AVAssetExportPreset640x480,
 AVAssetExportPresetMediumQuality,
 AVAssetExportPresetAppleM4A
 )

So, no… there’s no “output to CAF”. In fact, we can’t even use AVAssetExportPresetPassthrough to preserve the encoding from the music library: we either have to convert to AAC (in an .m4a container), or to a QuickTime movie (represented by all the presets ending in “Quality”, as well as the “640×480”).
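
For the record, that preset list comes from a single class-method call:

NSArray *presets = [AVAssetExportSession exportPresetsCompatibleWithAsset: songAsset];
NSLog (@"compatible presets for songAsset: %@", presets);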

This Deal is Getting Worse All the Time!

So, we have to export to AAC. That’s not entirely bad, since Core Audio should be able to read AAC in an .m4a container just fine. But it sucks in that it will be a lossy conversion from the source, which could be MP3, Apple Lossless, or some other encoding.

In my GUI, an “export” button appears when you pick a song, and the export is kicked off in the event-handler handleExportTapped. Here’s the UI in mid-export:

MediaLibraryExportThrowaway1 UI in mid-export

To do the export, we create an AVAssetExportSession and provide it with an outputFileType and outputURL.


AVAssetExportSession *exporter = [[AVAssetExportSession alloc]
		initWithAsset: songAsset
		presetName: AVAssetExportPresetAppleM4A];
NSLog (@"created exporter. supportedFileTypes: %@", exporter.supportedFileTypes);
exporter.outputFileType = @"com.apple.m4a-audio";
NSString *exportFile = [myDocumentsDirectory()
		stringByAppendingPathComponent: @"exported.m4a"];
myDeleteFile(exportFile);
[exportURL release];
exportURL = [[NSURL fileURLWithPath:exportFile] retain];
exporter.outputURL = exportURL;	

A few notes here. The docs say that if you set the outputURL without setting outputFileType, the exporter will make a guess based on the file extension. In my experience, the exporter prefers to just throw an exception and die, so set the damn type already. You can get a list of possible values from the exporter’s supportedFileTypes property. The only supported value for the AAC export is com.apple.m4a-audio. Also note the call to a myDeleteFile() function; the export will fail if the target file already exists.

Aside: I did experiment with exporting as a QuickTime movie rather than an .m4a; the code is in the download, commented out. Practical upshot is that it sucks: if your song isn’t AAC, then it gets converted to mono AAC at 44.1 KHz. It’s also worth noting that AV Foundation doesn’t give you any means of setting export parameters (bit depths, sample rates, etc.) other than using the presets. If you’re used to the power of frameworks like Core Audio or the old QuickTime, this is a bitter, bitter pill to swallow.

Block Head

The code gets really interesting when you kick off the export. You would probably expect the export, a long-lasting operation, to be nice and asynchronous. And it is. You might also expect to register a delegate to get asynchronous callbacks as the export progresses. Not so fast, Bucky. As a new framework, AV Foundation adopts Apple’s latest technologies, and that includes blocks. When you export, you provide a completion handler, a block whose no-arg function is called when necessary by the exporter.

Here’s what mine looks like.


// do the export
[exporter exportAsynchronouslyWithCompletionHandler:^{
	int exportStatus = exporter.status;
	switch (exportStatus) {
		case AVAssetExportSessionStatusFailed: {
			// log error to text view
			NSError *exportError = exporter.error;
			NSLog (@"AVAssetExportSessionStatusFailed: %@",
				exportError);
			errorView.text = exportError ?
				[exportError description] : @"Unknown failure";
			errorView.hidden = NO;
			break;
		}
		case AVAssetExportSessionStatusCompleted: {
			NSLog (@"AVAssetExportSessionStatusCompleted");
			fileNameLabel.text =
				[exporter.outputURL lastPathComponent];
			// set up AVPlayer
			[self setUpAVPlayerForURL: exporter.outputURL];
			[self enablePCMConversionIfCoreAudioCanOpenURL:
				exporter.outputURL];
			break;
		}
		case AVAssetExportSessionStatusUnknown: {
			NSLog (@"AVAssetExportSessionStatusUnknown"); break;}
		case AVAssetExportSessionStatusExporting: {
			NSLog (@"AVAssetExportSessionStatusExporting"); break;}
		case AVAssetExportSessionStatusCancelled: {
			NSLog (@"AVAssetExportSessionStatusCancelled"); break;}
		case AVAssetExportSessionStatusWaiting: {
			NSLog (@"AVAssetExportSessionStatusWaiting"); break;}
		default: { NSLog (@"didn't get export status"); break;}
	}
}];

This kicks off the export, passing in a block with code to handle all the possible callbacks. The completion handler doesn’t have to take any arguments (nor do we have to set up a “user info” object for the exporter to pass to it), since the block captures anything in the enclosing scope. That means the exporter and its state don’t need to be passed in as parameters: the exporter is a local variable that the block can access directly, and whose status it can inspect through its properties.

The two messages I handle in my block are AVAssetExportSessionStatusFailed, which dumps the error to a previously-invisible text view, and AVAssetExportSessionStatusCompleted, which sets up an AVPlayer to play the exported audio, which we’ll get to later.

After starting the export, my code runs an NSTimer to fill a UIProgressView. Since the exporter has a progress property that returns a float, it’s pretty straightforward… check the code if you haven’t already done this a bunch of times. Files that were already AAC exported almost immediately, while MP3s and Apple Lossless (ALAC) files took a minute or more. Files in the old .m4p format, from back when the iTunes Store put DRM on all the songs, failed with an error.
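
For the record, the timer callback is about this simple; exporter and exportProgressView are ivars in my throwaway app, so treat the names as illustrative:

// rough sketch of the progress-polling timer callback
-(void) updateExportProgress: (NSTimer*) timer {
	// progress is a float from 0.0 to 1.0
	exportProgressView.progress = exporter.progress;
	if (exporter.status != AVAssetExportSessionStatusExporting) {
		[timer invalidate];
	}
}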

The Invasion of Time

Kind of as a lark, I added a little GUI to let you play the exported file. AVPlayer was the obvious choice for this, since it should be able to play whatever kind of file you export (.m4a, .mov, whatever).

This brings up the whole issue of how to deal with the representation of time in AV Foundation, which turns out to be great for everyone who ever used the old C QuickTime API (or possibly QuickTime for Java), and all kinds of hell for everyone else.

AV Foundation uses Core Media’s CMTime struct for representing time. In turn, CMTime uses QuickTime’s brilliant but tricky concept of time scales. The idea, in a nutshell, is that your units of measurement for any particular piece of media are variable: pick one that suits the media’s own timing needs. For example, CD audio is 44.1 KHz, so it makes sense to measure time in 1/44100 second intervals. In a CMTime, you’d set the timescale to 44100, and then a given value would represent some number of these units: a single sample would have a value of 1 and would represent 1/44100 of a second, exactly as desired.

I find it’s easier to think of Core Media (and QuickTime) timescales as representing “nths of a second”. One of the clever things you can do is to choose a timescale that suits a lot of different kinds of media. In QuickTime, the default timescale is 600, as this is a common multiple of many important frame-rates: 24 fps for film, 25 fps for PAL (European) TV, 30 fps for NTSC (North America and Japan) TV, etc. Any number of frames in these systems can be evenly and exactly represented with a combination of value and timescale.
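
A few literal values may help make the value/timescale pairing concrete:

// CMTime examples: a value counted in "nths of a second"
CMTime oneSample = CMTimeMake (1, 44100);	// 1/44100 sec: one sample of 44.1 KHz audio
CMTime filmFrame = CMTimeMake (25, 600);	// 25/600 sec = 1/24 sec: one frame of 24 fps film
CMTime oneSecond = CMTimeMake (600, 600);	// exactly one second, at timescale 600
NSLog (@"one film frame = %f seconds", CMTimeGetSeconds (filmFrame));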

Where it gets tricky is when you need to work with values measured in different timescales. This comes up in AV Foundation, as your player may use a different timescale than the items it’s playing. It’s pretty easy to write out the current time label:


CMTime currentTime = player.currentTime;
UInt64 currentTimeSec = currentTime.value / currentTime.timescale;
UInt32 minutes = currentTimeSec / 60;
UInt32 seconds = currentTimeSec % 60;
playbackTimeLabel.text = [NSString stringWithFormat:
		@"%02d:%02d", minutes, seconds];

But it’s hard to update the slider position, since the AVPlayer and the AVPlayerItem it’s playing can (and do) use different time scales. Enjoy the math.


if (player && !userIsScrubbing) {
	CMTime endTime = CMTimeConvertScale (player.currentItem.asset.duration,
		currentTime.timescale,
		kCMTimeRoundingMethod_RoundHalfAwayFromZero);
	if (endTime.value != 0) {
		double slideTime = (double) currentTime.value /
				(double) endTime.value;
		playbackSlider.value = slideTime;
	}
}

Basically, the key here is that I need to get the duration of the item being played, but to express that in the time scale of the player, so I can do math on them. That gets done with the CMTimeConvertScale() call. Looks simple here, but if you don’t know that you might need to do a timescale-conversion, your math will be screwy for all sorts of reasons that do not make sense.

Oh, you can drag the slider too, which means doing the same math in reverse.


-(IBAction) handleSliderValueChanged {
	CMTime seekTime = player.currentItem.asset.duration;
	seekTime.value = seekTime.value * playbackSlider.value;
	seekTime = CMTimeConvertScale (seekTime, player.currentTime.timescale,
			kCMTimeRoundingMethod_RoundHalfAwayFromZero);
	[player seekToTime:seekTime];
}

One other fun thing about all this that I just remembered from looking through my code. The time label and slider updates are called from an NSTimer. I set up the AVPlayer in the completion handler block that’s called by the exporter. This call seems not to be on the main thread, as my update timer didn’t work until I forced its creation over to the main thread with performSelectorOnMainThread:withObject:waitUntilDone:. Good times.
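
In case it saves you the same head-scratching, the workaround is just a main-thread dispatch from inside the completion handler; setUpPlaybackTimer is a hypothetical method name standing in for whatever creates your timer:

// inside the export completion handler block
[self performSelectorOnMainThread: @selector(setUpPlaybackTimer)
					   withObject: nil
					waitUntilDone: NO];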

Final Steps

Granted, all this AVPlayer stuff is a distraction. The original goal was to get from iPod Music Library to decompressed PCM samples. We used an AVAssetExportSession to produce an .m4a file in our app’s Documents directory, something that Core Audio should be able to open. The remaining conversion is a straightforward use of CA’s Extended Audio File Services: we open an ExtAudioFileRef on the input .m4a, set a “client format” property representing the PCM format we want it to convert to, read data into a buffer, and write that data back out to a plain AudioFileID. It’s C, so the code is long, but hopefully not too hard on the eyes:


-(IBAction) handleConvertToPCMTapped {
	NSLog (@"handleConvertToPCMTapped");
	
	// open an ExtAudioFile
	NSLog (@"opening %@", exportURL);
	ExtAudioFileRef inputFile;
	CheckResult (ExtAudioFileOpenURL((CFURLRef)exportURL, &inputFile),
				 "ExtAudioFileOpenURL failed");
	
	// prepare to convert to a plain ol' PCM format
	AudioStreamBasicDescription myPCMFormat;
	myPCMFormat.mSampleRate = 44100; // todo: or use source rate?
	myPCMFormat.mFormatID = kAudioFormatLinearPCM ;
	myPCMFormat.mFormatFlags =  kAudioFormatFlagsCanonical;	
	myPCMFormat.mChannelsPerFrame = 2;
	myPCMFormat.mFramesPerPacket = 1;
	myPCMFormat.mBitsPerChannel = 16;
	myPCMFormat.mBytesPerPacket = 4;
	myPCMFormat.mBytesPerFrame = 4;
	
	CheckResult (ExtAudioFileSetProperty(inputFile,
			kExtAudioFileProperty_ClientDataFormat,
			sizeof (myPCMFormat), &myPCMFormat),
		  "ExtAudioFileSetProperty failed");

	// allocate a big buffer. size can be arbitrary for ExtAudioFile.
	// you have 64 KB to spare, right?
	UInt32 outputBufferSize = 0x10000;
	void* ioBuf = malloc (outputBufferSize);
	UInt32 sizePerPacket = myPCMFormat.mBytesPerPacket;	
	UInt32 packetsPerBuffer = outputBufferSize / sizePerPacket;
	
	// set up output file
	NSString *outputPath = [myDocumentsDirectory() 
			stringByAppendingPathComponent:@"export-pcm.caf"];
	NSURL *outputURL = [NSURL fileURLWithPath:outputPath];
	NSLog (@"creating output file %@", outputURL);
	AudioFileID outputFile;
	CheckResult(AudioFileCreateWithURL((CFURLRef)outputURL,
		   kAudioFileCAFType,
		   &myPCMFormat, 
		   kAudioFileFlags_EraseFile, 
		   &outputFile),
		  "AudioFileCreateWithURL failed");
	
	// start convertin'
	UInt32 outputFilePacketPosition = 0; //in bytes
	
	while (true) {
		// wrap the destination buffer in an AudioBufferList
		AudioBufferList convertedData;
		convertedData.mNumberBuffers = 1;
		convertedData.mBuffers[0].mNumberChannels = myPCMFormat.mChannelsPerFrame;
		convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
		convertedData.mBuffers[0].mData = ioBuf;

		UInt32 frameCount = packetsPerBuffer;

		// read from the extaudiofile
		CheckResult (ExtAudioFileRead(inputFile,
			  &frameCount,
			  &convertedData),
			 "Couldn't read from input file");
		
		if (frameCount == 0) {
			printf ("done reading from file");
			break;
		}
		
		// write the converted data to the output file
		CheckResult (AudioFileWritePackets(outputFile,
			   false,
			   frameCount * myPCMFormat.mBytesPerPacket,	// inNumBytes: total bytes in this write
			   NULL,
			   outputFilePacketPosition / myPCMFormat.mBytesPerPacket, 
			   &frameCount,
			   convertedData.mBuffers[0].mData),
			 "Couldn't write packets to file");
		
		NSLog (@"Converted %ld bytes", outputFilePacketPosition);

		// advance the output file write location
		outputFilePacketPosition +=
			(frameCount * myPCMFormat.mBytesPerPacket);
	}
	
	// clean up
	ExtAudioFileDispose(inputFile);
	AudioFileClose(outputFile);

	// GUI update omitted
}

Note that this uses a CheckResult() convenience function that Kevin Avila wrote for our upcoming Core Audio book… it just looks to see if the return value is noErr and tries to convert it to a readable four-char-code if it seems amenable. It’s in the example file too.
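
If you don’t want to wait for the book, a plausible stand-in for CheckResult() is only a few lines of C. This is my own sketch, not Kevin’s actual implementation; it assumes <AudioToolbox/AudioToolbox.h> (or at least CoreFoundation) plus <ctype.h>, <stdio.h>, and <stdlib.h> are available:

// minimal CheckResult()-style convenience: die with a readable error if a Core Audio call fails
static void CheckResult (OSStatus error, const char *operation) {
	if (error == noErr) return;
	char errorString[20];
	// does the error look like a four-char-code?
	*(UInt32 *)(errorString + 1) = CFSwapInt32HostToBig (error);
	if (isprint (errorString[1]) && isprint (errorString[2]) &&
		isprint (errorString[3]) && isprint (errorString[4])) {
		errorString[0] = errorString[5] = '\'';
		errorString[6] = '\0';
	} else {
		// no, just print it as a plain integer
		sprintf (errorString, "%d", (int) error);
	}
	fprintf (stderr, "Error: %s (%s)\n", operation, errorString);
	exit (1);
}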

Is It Soup Yet?

Does all this work? Rather than inspecting the AudioStreamBasicDescription of the resulting file, let’s do something more concrete. With Xcode’s “Organizer”, you can access your app’s sandbox on the device. So we can just drag the Application Data to the Desktop.

In the resulting folder, open the Documents folder to find export-pcm.caf. Drag it to QuickTime Player to verify that you do, indeed, have PCM data:

So there you have it. In several hundred lines of code, we’re able to get a song from the iPod Music Library, export it into our app’s Documents directory, and convert it to PCM. With the raw samples, you could now draw an audio waveform view (something you’d think would be essential for video editors who want to match video to beats in the music, but Apple seems dead-set against letting us do so with AV Foundation or QTKit), you could perform analysis or effects on the audio, you could bring it into a Core Audio AUGraph and mix it with other sources… all sorts of possibilities open up.

Clearly, it could be a lot easier. It’s a ton of code, and two file exports (library to .m4a, and .m4a to .caf), when some apps might be perfectly happy to read from the source URL itself and never write to the filesystem… if only they could. Having spent the morning writing this blog, I may well spend the afternoon filing feature requests on bugreport.apple.com. I’ll update this blog with OpenRadar numbers for the following requests:

  • Allow Core Audio to open URLs provided by the Media Player framework’s MPMediaItemPropertyAssetURL
  • AV Foundation should allow passthrough export of Media Library items
  • AV Foundation export needs finer-grained control than just presets
  • Provide sample-level access for AVAsset

Still, while I’m bitching and whining, it is remarkable that iOS 4 opens up non-DRM’ed items in the iPod library for export. I never thought that would happen. Furthermore, the breadth and depth of the iOS media APIs remain astonishing. Sometimes terrifying, perhaps, but compared to the facile and trite media APIs that the other guys and girls get, we’re light-years ahead on iOS.

Have fun with this stuff!

Update: This got easier in iOS 4.1. Please forget everything you’ve read here and go read From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary instead.

Video Editing with Haddocks

News.com.com.com.com, on evidence of a new Apple video format in iMovie 8.0.5:

Dubbed iFrame, the new video format is based on industry standard technologies like H.264 video and AAC audio. As expected with H.264, iFrame produces much smaller file sizes than traditional video formats, while maintaining its high-quality video. Of course, the smaller file size increases import speed and helps with editing video files.

Saying smaller files are easier to edit is like saying cutting down the mightiest tree in the forest is easier with a haddock than with a chainsaw, as the former is lighter to hold.

The real flaw with this is that H.264, while a lovely end-user distribution format, uses heavy temporal compression, potentially employing both P-frames (“predicted” frames, meaning they require data from multiple earlier frames), and B-frames (“bidirectionally predicted” frames, meaning they require data from both earlier and subsequent frames). Scrubbing frame-by-frame through H.264 is therefore slowed by sometimes having to read in and decompress multiple frames of data in order to render the next one. And in my Final Cut experience, scrubbing backwards through H.264 is particularly slow; shuttle a few frames backwards and you literally have to let go of the wheel for a few seconds to let the computer catch up. For editing, you see a huge difference when you use a format with only I-frames (“intra” frames, meaning every frame has all the data it needs), such as M-JPEG or Pixlet.

You can use H.264 in an all-I-frame mode (which makes it more or less M-JPEG), but then you’re not getting the small file sizes meant for end-user distribution. I’ll bet that iFrame employs H.264 P- and B-frames, being aimed at the non-pro user whose editing consists of just a handful of cuts, and who won’t mind the disk grinding as they identify the frame to cut on.

But for more sophisticated editing, having your source in H.264 would be painful.

This also speaks to a larger point of Apple seemingly turning its back on advanced media creatives in favor of everyday users with simpler needs. I’ve been surprised at CocoaHeads meetings to hear that I’m not the only one who bemoans the massive loss of functionality from the old 32-bit C-based QuickTime API to the easier-to-use but severely limited QTKit. That said, everyone else expects that we’ll see non-trivial editing APIs in QTKit eventually. I hope they’re right, but everything I see from Apple, including iFrame’s apparent use of H.264 as a capture-time and therefore edit-time format, makes me think otherwise.

What’s New, Blue Q?

One-time self-described “World’s Greatest Compressionist” Ben Waggoner posts a pointed question to the quicktime-api list:

http://www.apple.com/macosx/what-is-macosx/quicktime.html

What I’d like to know is if QuickTime X is going to be available for Windows and older versions of Mac OS X.

It’s an important issue, because despite iTunes’ insistence on installing QuickTime on Windows, the future of that product seems completely unknown. For years, every question I’ve seen about the future of QuickTime on Windows has been met with absolute silence from Apple. Yeah, I know, “Apple does not comment on unannounced products,” and all… Still, Apple has left this technology in limbo for a remarkably long time. I recall asking ADC reps about QuickTime for Windows back at Leopard Tech Day Atlanta in 2006, as I was considering calling it from Java with JNI, and (as previously noted), I got no reply at all. And every other public question I’ve seen about the future of QuickTime on Windows has gone similarly unanswered, for years.

Smell that? That’s the scent of Abandoned Code Rot. We got that from QuickTime for Java for a few years before they managed to finally deprecate it (though they apparently haven’t gotten the message out).

It wouldn’t be too surprising to see QT for Windows fall by the wayside… Apple probably cares more about the popularity of its favorite formats and codecs (AAC and H.264) than of the QuickTime APIs and QuickTime’s interactive features like Wired Sprites that have been clearly and unequivocally beaten by Flash.

But if that’s true of Windows, is it also true on the Mac? QuickTime developers are right to be a little worried. The old C-based QuickTime API remains a 32-bit only option, intended to be replaced by the Objective-C QTKit. But in the four years since its introduction in Tiger, QTKit has only taken on part of the capabilities of the old QuickTime API. With Leopard, you could finally do capture and some significant editing (e.g., inserting segments at the movie or track levels), but raw sample level data was unavailable for any track type other than video, and some of the more interesting track types (like effects and especially tweens, useful for fading an audio track’s volume between specific times) are effectively useless in QTKit.

With Snow Leopard, the big news isn’t a more capable QTKit API, it’s QuickTime X. And as Apple’s QuickTime X page points out, QTX is all about a highly-optimized playback path (using decode hardware if available) and polished presentation. Great news if you’re playing 1080p movies on your computer or living room PC, not so much if you want to edit them: if you want to edit anything, you’re back in the old 32-bit QuickTime (and the code is probably still written in C against the old APIs, given QTKit’s numerous limitations). You don’t see a 64-bit Final Cut Pro, now do you? (BTW, here’s a nice blog on that topic.)

When you all install Snow Leopard tomorrow and run the QTX-based QuickTime Player, you’ll immediately understand why the $30 QuickTime Pro (which bought you editing and exporting from the Player app and the plug-in) is gone. Follow up in the comments tomorrow (after the NDA drops) and we’ll discuss further.

If I were starting a major new multimedia project that wasn’t solely playback-based — imagine, say, a podcast studio that would combine the editing, exporting, and publishing tasks that you might currently perform with Garage Band, iTunes, and FTP — I would be very confused as to which technology to adopt. QuickTime’s cross-platform story seems to be finished (QTJ deprecated, QTW rotting away), and everything we hear on the Mac side is about playback. Would it be safer to assume that QuickTime doesn’t have a future as a media creation framework, and drop down to the engine level (Core Audio and Core Video)? And if not QuickTime… then what?

Oh, and as for the first question from the quicktime-api thread:

… How about Apple throwing us a bone as to what QuickTime X will offer those of us that use QT and QTSS?

From what I can tell, Apple has all but ditched QTSS in favor of HTTP Live Streaming, supported by QuickTime X and iPhone 3.0.

An iPhone Core Audio brain dump

Twitter user blackbirdmobile just wondered aloud when the Core Audio stuff I’ve been writing about is going to come out. I have no idea, as the client has been commissioning a lot of work from a lot of iPhone/Mac writers I know, but has a lengthy review/rewrite process.

Right now, I’ve moved on to writing some beginner stuff for my next book, and will be switching from that to iPhone 3.0 material for the first book later today. And my next article is going to be on OpenAL. My next chance for some CA comes whenever I get time to work on some App Store stuff I’ve got planned.

So, while the material is still a little fresh, I’m going to post a stream-of-consciousness brain-dump of stuff that I learned along the way or found important to know in the course of working on this stuff.

  • It’s hard. Jens Alfke put it thusly:

    “Easy” and “CoreAudio” can’t be used in the same sentence. 😛 CoreAudio is very powerful, very complex, and under-documented. Be prepared for a steep learning curve, APIs with millions of tiny little pieces, and puzzling things out from sample code rather than reading high-level documentation.

  • That said, tweets like this one piss me off. Media is intrinsically hard, and the typical way to make it easy is to throw out functionality, until you’re left with a play method and not much else.

  • And if that’s all you want, please go use the HTML5 <video> and <audio> tags (hey, I do).

  • Media is hard because you’re dealing with issues of hardware I/O, real-time, threading, performance, and a pretty dense body of theory, all at the same time. Webapps are trite by comparison.

  • On the iPhone, Core Audio has three levels of opt-in for playback and recording, given your needs, listed here in increasing order of complexity/difficulty:

    1. AVAudioPlayer – File-based playback of DRM-free audio in Apple-supported codecs. Cocoa classes, called with Obj-C. iPhone 3.0 adds AVAudioRecorder (wasn’t sure if this was NDA, but it’s on the WWDC marketing page).
    2. Audio Queues – C-based API for buffered recording and playback of audio. Since you supply the samples, would work for a net radio player, and for your own formats and/or DRM/encryption schemes (decrypt in memory before handing off to the queue). Inherent latency due to the use of buffers.
    3. Audio Units – Low-level C-based API. Very low latency, as little as 29 milliseconds. Mixing, effects, near-direct access to input and output hardware.
  • Other important Core Audio APIs not directly tied to playback and recording: Audio Session Services (for communicating your app’s audio needs to the system, defining interaction with things like the background iPod player and the ring/silent switch, and getting audio hardware metadata), Audio File Services for reading/writing files, Audio File Stream Services for dealing with audio data in a network stream, Audio Converter Services for converting between PCM and compressed formats, and Extended Audio File Services for combining file access and conversion (e.g., given PCM, write out to a compressed AAC file).

  • You don’t get AVAudioPlayer or AVAudioRecorder on the Mac because you don’t need them: you already have QuickTime, and the QTKit API.
  • The Audio Queue Services Programming Guide is sufficient to get you started with Audio Queues, though it is unfortunate that its code excerpts are not pulled together into a complete, runnable Xcode project.

  • Lucky for you, I wrote one for the Streaming Audio chapter of the Prags’ iPhone book. Feel free to download the book’s example code. But do so quickly — the Streaming Audio chapter will probably go away in the 3.0 rewrite, as AVAudioRecorder obviates the need for most people to go down to the Audio Queue level. We may find some way to repurpose this content, but I’m not sure what form that will take. Also, I think there’s still a bug in the download where it can record with impunity, but can only play back once.

  • The Audio Unit Programming Guide is required reading for using Audio Units, though you have to filter out the stuff related to writing your own AUs with the C++ API and testing their Mac GUIs.

  • Get comfortable with pointers, the address-of operator (&), and maybe even malloc.

  • You are going to fill out a lot of AudioStreamBasicDescription structures. It drives some people a little batty.

  • Always clear out your ASBDs, like this:

    
    memset (&myASBD, 0, sizeof (myASBD));
    

    This zeros out any fields that you haven’t set, which is important if you send an incomplete ASBD to a queue, audio file, or other object to have it filled in.

  • Use the “canonical” format — 16-bit integer PCM — between your audio units. It works, and is far easier than trying to dick around bit-shifting 8.24 fixed point (the other canonical format). There’s a sketch at the end of this list that pulls this together with the scope/bus and render-callback points below.

  • Audio Units achieve most of their functionality through setting properties. To set up a software renderer to provide a unit with samples, you don’t call some sort of a setRenderer() method, you set the kAudioUnitProperty_SetRenderCallback property on the unit, providing an AURenderCallbackStruct struct as the property value.

  • Setting a property on an audio unit requires declaring the “scope” that the property applies to. Input scope is audio coming into the AU, output is going out of the unit, and global is for properties that affect the whole unit. So, if you set the stream format property on an AU’s input scope, you’re describing what you will supply to the AU.

  • Audio Units also have “elements”, which may be more usefully thought of as “buses” (at least if you’ve ever used pro audio equipment, or mixing software that borrows its terminology). Think of a mixer unit: it has multiple (perhaps infinitely many) input buses, and one output bus. A splitter unit does the opposite: it takes one input bus and splits it into multiple output buses.

  • Don’t confuse buses with channels (i.e., mono, stereo, etc.). Your ASBD describes how many channels you’re working with, and you set the input or output ASBD for a given scope-and-bus pair with the stream format property.

  • Make the RemoteIO unit your friend. This is the AU that talks to both input and output hardware. Its use of buses is atypical and potentially confusing. Enjoy the ASCII art:

    
                             -------------------------
                             | i                   o |
    -- BUS 1 -- from mic --> | n    REMOTE I/O     u | -- BUS 1 -- to app -->
                             | p      AUDIO        t |
    -- BUS 0 -- from app --> | u       UNIT        p | -- BUS 0 -- to speaker -->
                             | t                   u |
                             |                     t |
                             -------------------------
    

    Ergo, the stream properties for this unit are:

    Input scope, bus 0: set the ASBD to indicate what you’re providing for play-out
    Input scope, bus 1: get the ASBD to inspect the audio format being received from H/W
    Output scope, bus 0: get the ASBD to inspect the audio format being sent to H/W
    Output scope, bus 1: set the ASBD to indicate what format you want your units to receive
  • That said, setting up the callbacks for providing samples to a unit, or getting them from it, takes global scope, as their purpose is implicit from the property names: kAudioOutputUnitProperty_SetInputCallback and kAudioUnitProperty_SetRenderCallback.

  • Michael Tyson wrote a vital blog on recording with RemoteIO that is required reading if you want to set callbacks directly on RemoteIO.

  • Apple’s aurioTouch example also shows off audio input, but is much harder to read because of its ambition (it shows an oscilloscope-type view of the sampled audio, and optionally performs FFT to find common frequencies), and because it is written with Objective-C++, mixing C, C++, and Objective-C idioms.

  • Don’t screw around in a render callback. I had correct code that didn’t work because it also had NSLogs, which were sufficiently expensive that I missed the real-time thread’s deadlines. When I commented out the NSLog, the audio started playing. If you don’t know what’s going on, set a breakpoint and use the debugger.

  • Apple has a convention of providing a “user data” or “client” object to callbacks. You set this object when you setup the callback, and its parameter type for the callback function is void*, which you’ll have to cast back to whatever type your user data object is. If you’re using Cocoa, you can just use a Cocoa object: in simple code, I’ll have a view controller set the user data object as self, then cast back to MyViewController* on the first line of the callback. That’s OK for audio queues, but the overhead of Obj-C message dispatch is fairly high, so with Audio Units, I’ve started using plain C structs.

  • Always set up your audio session stuff. For recording, you must use kAudioSessionCategory_PlayAndRecord and call AudioSessionSetActive(true) to get the mic turned on for you. You should probably also look at the properties to see if audio input is even available: it’s always available on the iPhone, never on the first-gen touch, and may or may not be on the second-gen touch.

  • If you are doing anything more sophisticated than connecting a single callback to RemoteIO, you may want to use an AUGraph to manage your unit connections, rather than setting up everything with properties.

  • When creating AUs directly, you set up an AudioComponentDescription and use the audio component manager to get the AUs. With an AUGraph, you hand the description to AUGraphAddNode to get back the pointer to an AUNode. You can get the Audio Unit wrapped by this node with AUGraphNodeInfo if you need to set some properties on it.

  • Get used to providing pointers as parameters and having them filled in by function calls:

    
    AudioUnit remoteIOUnit;
    setupErr = AUGraphNodeInfo(auGraph, remoteIONode, NULL, &remoteIOUnit);
    

    Notice how the return value is an error code, not the unit you’re looking for, which instead comes back in the fourth parameter. We send the address of the remoteIOUnit local variable, and the function populates it.

  • Also notice the convention for parameter names in Apple’s functions. inSomething is input to the function, outSomething is output, and ioSomething does both. The latter two take pointers, naturally.

  • In an AUGraph, you connect nodes with a simple one-line call:

    
    setupErr = AUGraphConnectNodeInput(auGraph, mixerNode, 0, remoteIONode, 0);
    

    This connects the output of the mixer node’s only bus (0) to the input of RemoteIO’s bus 0, which goes through RemoteIO and out to hardware.

  • AUGraphs make it really easy to work with the mic input: create a RemoteIO node and connect its bus 1 to some other node.

  • RemoteIO does not have a gain or volume property. The mixer unit has volume properties on all input buses and its output bus (0). Therefore, setting the mixer’s output volume property could be a de facto volume control, if it’s the last thing before RemoteIO. And it’s somewhat more appealing than manually multiplying all your samples by a volume factor.

  • The mixer unit adds amplitudes. So if you have two sources that can hit maximum amplitude, and you mix them, you’re definitely going to clip.

  • If you want to do both input and output, note that you can’t have two RemoteIO nodes in a graph. Once you’ve created one, just make multiple connections with it. The same node will be at the front and end of the graph in your mental model or on your diagram, but it’s OK, because the captured audio comes in on bus 1, and at some point, you’ll connect that to a different bus (maybe as you pass through a mixer unit), eventually getting the audio to RemoteIO’s bus 0 input, which will go out to headphones or speakers on bus 0.
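
As promised above, here’s a sketch that pulls several of these points together: a canonical 16-bit stereo ASBD, set as the stream format on RemoteIO’s input scope / bus 0 (what you’ll supply for play-out), plus the render-callback property. remoteIOUnit is assumed to have come out of AUGraphNodeInfo() as shown earlier, and MyRenderCallback and myUserData stand in for your own callback function and context struct:

// hedged sketch: describe canonical 16-bit stereo PCM and wire up a render callback on RemoteIO
OSStatus setupErr = noErr;

AudioStreamBasicDescription myASBD;
memset (&myASBD, 0, sizeof (myASBD));		// zero out any fields we don't set
myASBD.mSampleRate       = 44100.0;
myASBD.mFormatID         = kAudioFormatLinearPCM;
myASBD.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
myASBD.mChannelsPerFrame = 2;
myASBD.mBitsPerChannel   = 16;
myASBD.mFramesPerPacket  = 1;
myASBD.mBytesPerFrame    = 4;				// 2 channels x 2 bytes
myASBD.mBytesPerPacket   = 4;

// input scope, bus 0: describe what we're going to provide for play-out
setupErr = AudioUnitSetProperty (remoteIOUnit,
				kAudioUnitProperty_StreamFormat,
				kAudioUnitScope_Input,
				0,					// bus 0
				&myASBD,
				sizeof (myASBD));

// global scope: hook up the render callback that will supply those samples
AURenderCallbackStruct callbackStruct;
callbackStruct.inputProc       = MyRenderCallback;	// an AURenderCallback you write
callbackStruct.inputProcRefCon = &myUserData;		// your own context struct
setupErr = AudioUnitSetProperty (remoteIOUnit,
				kAudioUnitProperty_SetRenderCallback,
				kAudioUnitScope_Global,
				0,
				&callbackStruct,
				sizeof (callbackStruct));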

I didn’t come up with much (any?) of this myself. It’s all about good references. Here’s what you should add to your bookmarks (or Together, where I throw any Core Audio pages I find useful):

My emerging mental media taxonomy

Back when we did the iPhone discussion on Late Night Cocoa, I made a point of distinguishing the iPhone’s media frameworks, specifically Core Audio and friends (Audio Queue Services, Audio Session, etc.), from “document-based” media frameworks like QuickTime.

This reflects some thinking I’ve been doing over the last few months, and I don’t think I’m done, but it does reflect a significant change in how I see things and invalidates some of what I’ve written in the past.

Let me explain the breakdown. In the past, I saw a dichotomy between simple media playback frameworks, and those that could do more: mix, record, edit, etc. While there are lots of media frameworks that could enlighten me (I’m admittedly pretty ignorant of both Flash and Windows’ media frameworks), I’m now organizing things into three general classes of media framework:

  • Playback-only – this is what a lot of people expect when they first envision a media framework: they’ve got some kind of audio or audio/video source and they just care about rendering to screen and speakers. As generally implemented, the source is opaque, so you don’t have to care about the contents of the “thing” you’re playing (AVI vs. MOV? MP3 vs. AAC? DKDC!), but you also can’t generally do anything with the source other than play it. Your control may be limited to play (perhaps at a variable rate), stop, jump to a time, etc.

  • Stream-based – In this kind of API, you see the media as a stream of data, meaning that you act on the media as it’s being processed or played. You generally get the ability to mix multiple streams, and add your own custom processing, with the caveat that you’re usually acting in realtime, so anything you do has to finish quickly for fear you’ll drop frames. It makes a lot of sense to think of audio this way, and this model fits two APIs I’ve done significant work with: Java Sound and Core Audio. Conceptually, video can be handled the same way: you can have a stream of A/V data that can be composited, effected, etc. Java Media Framework wanted to be this kind of API, but it didn’t really stick. I suspect there are other examples of this that work; the Slashdot story NVIDIA Releases New Video API For Linux describes a stream-based video API in much the same terms: ‘The Video Decode and Presentation API for Unix (VDPAU) provides a complete solution for decoding, post-processing, compositing, and displaying compressed or uncompressed video streams. These video streams may be combined (composited) with bitmap content, to implement OSDs and other application user interfaces.’.

  • Document-based – No surprise, in this case I’m thinking of QuickTime, though I strongly suspect that a Flash presentation uses the same model. In this model, you use a static representation of media streams and their relationships to one another: rather than mixing live at playback time, you put information about the mix into the media document (this audio stream is this loud and panned this far to the left, that video stream is transformed with this matrix and here’s its layer number in the Z-axis), and then a playback engine applies that mix at playback time. The fact that so few people have worked with such a thing recalls my example of people who try to do video overlays by trying to hack QuickTime’s render pipeline rather than just authoring a multi-layer movie like an end-user would.

I used to insist that Java needed a media API that supported the concept of “media in a stopped state”… clearly that spoke to my bias towards document-based frameworks, specifically QuickTime. Having reached this mental three-way split, I can see that a sufficiently capable stream-based media API would be powerful enough to be interesting. If you had to have a document-based API, you could write one that would then use the stream API as its playback/recording engine. Indeed, this is how things are on the iPhone for audio: the APIs offer deep opportunities for mixing audio streams and for recording, but doing something like audio editing would be a highly DIY option (you’d basically need to store edits, mix data, etc., and then perform that by calling the audio APIs to play the selected file segments, mixed as described, etc.).

But I don’t think it’s enough anymore to have a playback-only API, at least on the desktop, for the simple reason that HTML5 and the <video> tag commoditizes video playback. On JavaPosse #217, the guys were impressed by a blog claiming that a JavaFX media player had been written in just 15 lines. I submit that it should take zero lines to write a JavaFX media player: since JavaFX uses WebKit, and WebKit supports the HTML5 <video> tag (at least on Mac and Windows), then you should be able to score video playback by just putting a web view and an appropriate bit of HTML5 in your app.

One other thing that grates on me is the maxim that playback is what matters the most because that’s all that the majority of media apps are going to use. You sort of see this thinking in QTKit, the new Cocoa API for QuickTime, which currently offers very limited access to QuickTime movies as documents: you can cut/copy/paste, but you can’t access samples directly, insert modifier tracks like effects and tweens, etc.

Sure, 95% of media apps are only going to use playback, but most of them are trivial anyways. If we have 99 hobbyist imitations of WinAmp and YouTube for every one Final Cut, does that justify ignoring the editing APIs? Does it really help the platform to optimize the API for the trivialities? They can just embed WebKit, after all, so make sure that playback story is solid for WebKit views, and then please Apple, give us grownups segment-level editing already!

So, anyways, that’s the mental model I’m currently working with: playback, stream, document. I’ll try to make this distinction clear in future discussions of real and hypothetical media APIs. Thanks for indulging the big think, if anyone’s actually reading.

Death or glory, vol. 3

When we all got laid off from Worthless Piece of Crap Wireless Software Company #2, I vowed I’d never touch mobility again. When we crawled to the finish of Swing Hacks, I vowed I’d never write a book again.

So, when I say that I’m co-authoring a book on the iPhone, well, I guess I’ve got some ‘splaining to do.

The teeming dozens of you who read this blog — well, more than one dozen — have probably picked up on my interest in the iPhone OS platform and the Cocoa Touch framework. It’s one of the first new platforms we’ve seen in a while, and a highly appealing one at that, thoughtfully designed, with good documentation and tools provided, and with millions of users already out there, and tens of millions more waiting for the iPhone 3G. Apple’s App Store strategy stares straight at 10 years of carrier and device-maker restrictions on (or utter prohibition of) third-party applications and gives it a long-overdue middle finger, something that has developers swarming to the platform.

Let’s noodle with a little math. Let’s say you could sell a $3 app to 1/10 of one percent of iPhone owners. Your cut would be about $2 a copy. Assuming 20 million iPhones by the end of 2008 — the 6 million already in use, plus many more at the low price point and available in 70 countries instead of 6 — you’d multiply that $2 times 20,000 and get $40,000. Do that twice a year and it’s a decent day job. Adjust price points, popularity, frequency, and/or expectations, and there are many ways to make this work.

And that’s not even considering the likely consulting/contracting opportunities for enterprise iPhone app development. I’m available now, and my rates start at $80/hour. Contact cadamson@subfurther.com for a free initial consultation.

So, while I was teaching myself iPhone app development — I got the SDK the day it came out, and was working through the introductory docs on the plane home from the Java Posse Roundup — Daniel Steinberg of the Pragmatic Programmers contacted me about joining on with an iPhone book. The Prags’ approach is excellently suited to the topic, as their build-your-own-PDF beta program and constantly-available updates are well suited to a topic as much in flux as the iPhone SDK (anyone who has tried out all the betas can attest to this… the default Xcode template continues to be in play, even after 7 betas). Their philosophical approaches to book-writing and guidance to authors are also highly appropriate and based in real-world experiences… some of which would have made my earlier books work out better.

There are currently three authors on this as-yet untitled book. Marcel Molina, Jr. is a long-time Rails expert, and Bill Dudney is a “former Java guy” (I’m hearing that term a lot in iPhone circles) who just finished a nice book on Core Animation that I was fortunate enough to tech review (he has a chapter on QTKit integration that covers playback and capture… awesome!).

Right now, we’re cranking to have a substantive beta, and I’m enjoying the benefits of having started most of my study with the non-GUI elements of the iPhone, allowing me to dig deeply into media and networking (you may have seen hints of this work here, here, and here, among other places), and then come back to the GUI. Moreover, when I started writing my sample code, my example forced me to jump right in and do paged-based navigation, so instead of dicking around with little one-off “play with some trivial widgets” experiments, my first GUI apps ended up being big enough to give me a sense of how all the parts of UIKit relate to one another, so while it was a steep hill, stuff has come pretty easily after getting over it.

Oh, and if you’re wondering about my web radio client, ported from iPhone back to Mac, and the ADC support issue I filed on Audio Queue Services callbacks? I met the ADC engineer at WWDC. It turns out the ADC engineers all got pulled into the iPhone crunch, which has left a lot of ADC issues neglected. He did send me some code which may get me unblocked, but I haven’t had a chance to try it yet. Even if it works, I almost certainly won’t be ready for the App Store opening; I’ll probably table it until after the book’s beta. But still, it would be nice to get an app on the store before one of us has to write the chapter on how the reader can get his or her app on the store.

And that gets me to one of the really great things about the Prags’ approach. They’re strongly reader-oriented, with a concept of getting the reader to go on a journey of learning and accomplishment. When I was doing the QuickTime book, my editor encouraged me to abandon “we” and speak directly to the reader: “I” think this, “you” can do that. This was a good approach for developing a dialogue, but the Prags take it to the next level, and actually have a valid use for reintroducing “we”: it’s the things we do together, collaborating as author and reader to master this topic.

Anyways, the crunch is on. Right now I’m handling file I/O, databases, preferences, and network I/O (Cocoa and CoreFoundation are practically two topics), with a bunch of media topics to follow. My guess is that I’ll be in for 60-80 pages of the beta, and I’ve got about 30 now.

So the next couple weeks will be a little crunchy…

An arbitrary collection of hastily taken, poorly composed iPhone photos from WWDC 2008

What can I say about WWDC 2008? Well, not that much, actually: all the sessions other than the keynote were covered by NDA, so beyond the publicly available session schedules and descriptions, the content of the show remains off-limits. So, yeah, I learned about what’s coming in Snow Leopard, QuickTime X, and iPhone 2.0, but I’ll have to wait until Apple releases it before I can comment.

Granted, the keynote isn’t under NDA, since it was made available in its entirety. The blur below is Steve Jobs taking the stage at the beginning:
Steve Jobs takes the stage for WWDC 2008 keynote

Two things you wouldn’t know from watching the fixed-in-post stream or video podcast of the Stevenote:

  1. The parade of nations in which iPhone 3G will be sold was accompanied by the Disney theme park song “It’s A Small World”. Guess they didn’t want to deal with rights fees?
  2. The second time they played the ad, the sound guys forgot to bring up the ad audio. Fixed in post.

So, yeah, whatever, check out the Stevenote yourself if you must. Now onto my own impressions, starting with sessions:
WWDC 2008 Presidio room

That’s the “Presidio” room, the largest of the session rooms I attended. Actually, it’s about the front 1/3 to 1/2 of the keynote room, which was then subdivided into smaller sections for regular sessions the rest of the week. Presidio played host to what you’d call the “mainstream” iPhone sessions, stuff like intros and debugging and the various core libraries and such. The room probably holds about 2,000 people, so you can draw the conclusion from the foot traffic that WWDC was at least half iPhone developers, maybe more.

But I think that’s a little deceptive. I don’t think the Mac/iPhone distinction was particularly binary; it’s much more a continuum of people with varying levels of interest in both platforms. Frankly, from what I could tell, there weren’t very many developers there who were completely new to Apple platforms. A lot of the iPhone developers were people who’d already been working on the Mac. Some are long-time Mac developers (notice how some of the Apple Design Awards for iPhone went to Mac stalwarts like Iconfactory, and how Pangea got in the keynote), while others had been messing with the Mac by night and are interested in joining the Gold Rush (or Black Parade, if you will) on the iPhone. Actually, one thing that was uncanny was the number of people I spoke with in line who were Java programmers or IT people by day, iPhone apprentices by night. I don’t know if this augurs a mass migration of developers away from Java, or if the kind of people who would go to WWDC just happen to have such typical “Mr. Anderson” day jobs. But having gone to JavaOne a month ago, I can say the technical merits and appeal of the two platforms are like night and day. And the Sun isn’t what’s shining.

Some subjective comments about session content, avoiding NDA violations:

  • Snow Leopard’s performance technologies have fascinating potential.
  • QuickTime X is a major project that was probably inevitable given the age of the QuickTime codebase (nearly 20 years old at this point). Great potential, but surely someone’s favorite features will get left behind.
  • As I blogged on O’Reilly’s iPhone site, iPhone webapps really surprised me with their capabilities. In our demands for a real SDK, we might have overlooked some coolness here.
  • Probably the biggest contrast with Android, which I investigated for a client, is that Android’s SDK is completely decoupled from any eventual implementation, while the iPhone SDK is generally well-aware of the devices on which it runs. The purists probably balk at tight coupling, but I wonder if the Android approach isn’t making a Big Design Up Front or Waterfall-style mistake: it’s promising functionality (and a quality of service) that yet-to-be-developed hardware has to back up, or at least gracefully degrade out of. The iPhone’s approach is more, well, agile: build a device, put an SDK around its features, rev the device, rev the SDK. Android will have to prove to me that it’s not going to end up as “write once, debug everywhere.”

While I’m making contrasts with Java, here’s another point on which WWDC stomps JavaOne:
WWDC 2008 session seating power strip

These power strips weren’t everywhere, but they were in enough sessions that if you got in early enough and sought them out, you could avoid needing a second battery. They had some serious power distribution bricks under the power-stripped rows, so this wasn’t some cheesy daisy-chain thing that could trip the breaker. It was done right.

I didn’t use the hang spaces much, but got a few funny shots:
WWDC 2008 hang space whiteboard job listing

Notice the note on the right: “and we have Rock Band and Wii”. At the Java Posse Roundup, I mentioned that having a communal video game system was the most effective team-building tool for a high-tech company that I knew of, particularly good at getting QA and engineering talking. As social as Soul Calibur and EA’s sports games are, I imagine the super-social Rock Band could bring this to a whole new level.

Another WWDC 2008 hang space whiteboard

Nothing super remarkable here (the Blizzard listing is interesting), except to note how packed it is with contacts and job listings. Now on the other side, let’s take a step away from WWDC for a minute and look across 4th Street to that temple of failure and delusion, the Metreon:

Closed down Metreon shops

At this point, the Metreon is more than half empty. The fifth floor’s “Where the Wild Things Are” attraction is long gone, as is the fourth floor’s “The Way Things Work” movie and exhibits. The high-end steak place on the second floor closed sometime between JavaOne and WWDC, joining the Games Workshop and the stores above, previously a Gundam store (left) and a comic shop (right). Across the hall from them we find:

Crane Game store at Metreon

…a store space with nothing but crane games. To make it more amusing, this space was originally occupied by a Microsoft retail store. Given the collapse of everything around it in Metreon, it’s hard to hang that one on Microsoft.

Back to WWDC, here’s The Company Store:

WWDC Company Store

It’s not the selection offered by the Company Store at Apple HQ, but they had a perfectly adequate set of Apple t-shirts and a few other items. Would have been nice if they hadn’t charged me three times for my purchase, but that’s for me, their vendor, and Mastercard to sort out tomorrow.
WWDC 2008 Evening Reception

One of the evening receptions… pretty typical for a Moscone event, actually… the food lines move way too slow for the number of attendees. On Tuesday, they had free drinks (fast) to go with the free food (slow). You can guess how that worked out.

OK, a few people pictures before I call it a night:
James Duncan Davidson @ WWDC 2008

This is James Duncan Davidson, who, along with his partner Pinar, was officially photographing the Thursday night bash in Yerba Buena Gardens. They got some shots from the fifth floor of the Metreon, probably the only time recently that floor has been occupied.

Daniel Steinberg @ WWDC 2008 Party

And this is podcaster, Pragmatic Programmers editor, and original java.net editor Daniel Steinberg at the party, with the Barenaked Ladies performing behind him. The band was ideal for the event: Mac geeks who could crack wise about arcana from System 7.6 (or, for that matter, the inclusion of one of their videos on a System 8 CD without their approval). Compare (again) to JavaOne, where Smash Mouth’s contempt for corporate gigs was perfectly captured by the singer’s bored banter, in which he literally said “we’d like to thank YOUR COMPANY for having us here tonight.” Still, it’s kind of a bummer to think that I never got to WWDC in the era of the Campus Bash. My bad for not jumping into the Mac platform sooner.

Not dead, just too busy to blog

Yes, I actually am at WWDC. I’ve just been so busy doing conference stuff, keeping the java.net front page going, and blogging for O’Reilly’s Inside iPhone site, that I’ve neglected this blog.

Sorry. I am shooting iPhone pictures for a big update later. For the time being, enjoy the irony of a screenshot from my QTKit capture app taken from inside the Graphics and Media Lab at WWDC:

More later, at least what I can blog without violating NDA.