
From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary

In a July blog entry, I showed a gruesome technique for getting raw PCM samples of audio from your iPod library, by means of an easily-overlooked metadata attribute in the Media Player framework, along with the export functionality of AV Foundation. The AV Foundation stuff was the gruesome part: with no direct means for sample-level access to the song “asset”, it required an intermediate export to .m4a, which was a lossy re-encode if the source was of a different format (like MP3), and then a subsequent conversion to PCM with Core Audio.

Please feel free to forget all about that approach… except for the Core Media timescale stuff, which you’ll surely see again before too long.

iOS 4.1 added a number of new classes to AV Foundation (indeed, these were among the most significant 4.1 API diffs) to provide an API for sample-level access to media. The essential classes are AVAssetReader and AVAssetWriter. Using these, we can dramatically simplify and improve the iPod converter.

I have an example project (70 KB) that was originally meant to be part of my session at the Voices That Matter iPhone conference in Philadelphia, but didn’t come together in time. I’m going to skip the UI stuff in this blog, and leave you to a screenshot and a simple description: tap “choose song”, pick something from your iPod library, tap “done”, and tap “Convert”.

Screenshot of VTM_AViPodReader

To do the conversion, we’ll use an AVAssetReader to read from the original song file, and an AVAssetWriter to perform the conversion and write to a new file in our application’s Documents directory.

Start, as in the previous example, by using the valueForProperty:MPMediaItemPropertyAssetURL attribute to get an NSURL representing the song in a format compatible with AV Foundation.

-(IBAction) convertTapped: (id) sender {
	// set up an AVAssetReader to read from the iPod Library
	NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
	AVURLAsset *songAsset =
		[AVURLAsset URLAssetWithURL:assetURL options:nil];

	NSError *assetError = nil;
	AVAssetReader *assetReader =
		[[AVAssetReader assetReaderWithAsset:songAsset
					       error:&assetError]
		  retain];
	if (assetError) {
		NSLog (@"error: %@", assetError);
		return;
	}

Sorry about the dangling retains. I’ll explain those in a little bit (and yes, you could use the alloc/init equivalents… I’m making a point here…). Anyways, it’s simple enough to take an AVAsset and make an AVAssetReader from it.

But what do you do with that? Contrary to what you might think, you don’t just read from it directly. Instead, you create another object, an AVAssetReaderOutput, which is able to produce samples from an AVAssetReader.

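AVAssetReaderOutput *assetReaderOutput =
	[[AVAssetReaderAudioMixOutput
	  assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
				audioSettings: nil]
	  retain];
if (! [assetReader canAddOutput: assetReaderOutput]) {
	NSLog (@"can't add reader output... die!");
	return;
}
[assetReader addOutput: assetReaderOutput];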

AVAssetReaderOutput is abstract. Since we’re only interested in the audio from this asset, an AVAssetReaderAudioMixOutput will suit us fine. For reading samples from an audio/video file, like a QuickTime movie, we’d want AVAssetReaderVideoCompositionOutput instead. An important point here is that we set audioSettings to nil to get a generic PCM output. The alternative is to provide an NSDictionary specifying the format you want to receive; I ended up doing that later in the output step, so the default PCM here will be fine.

That’s all we need to worry about for now for reading from the song file. Now let’s start dealing with writing the converted file. We start by setting up an output file… the only important thing to know here is that AV Foundation won’t overwrite a file for you, so you should delete the exported.caf if it already exists.

NSArray *dirs = NSSearchPathForDirectoriesInDomains 
				(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [dirs objectAtIndex:0];
NSString *exportPath = [[documentsDirectoryPath
		stringByAppendingPathComponent:@"exported.caf"]
		retain];
if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath]) {
	[[NSFileManager defaultManager] removeItemAtPath:exportPath
						   error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];

Yeah, there’s another spurious retain here. I’ll explain later. For now, let’s take exportURL and create the AVAssetWriter:

AVAssetWriter *assetWriter =
	[[AVAssetWriter assetWriterWithURL:exportURL
		fileType:AVFileTypeCoreAudioFormat error:&assetError]
	  retain];
if (assetError) {
	NSLog (@"error: %@", assetError);
	return;
}

OK, no sweat there, but the AVAssetWriter isn’t really the important part. Just as the reader is paired with “reader output” objects, so too is the writer connected to “writer input” objects, which is what we’ll be providing samples to, in order to write them to the filesystem.

To create the AVAssetWriterInput, we provide an NSDictionary describing the format and contents we want to create… this is analogous to a step we skipped earlier to specify the format we receive from the AVAssetReaderOutput. The dictionary keys are defined in AVAudioSettings.h and AVVideoSettings.h. You may find you need to look in these header files to look for the value types to provide for these keys, and in some cases, they’ll point you to the Core Audio header files. Trial and error led me to ultimately specify all of the fields that would be encountered in a AudioStreamBasicDescription, along with an AudioChannelLayout structure, which needs to be wrapped in an NSData in order to be added to an NSDictionary

AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings =
[NSDictionary dictionaryWithObjectsAndKeys:
	[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey, 
	[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
	[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
	[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)],
		AVChannelLayoutKey,
	[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
	[NSNumber numberWithBool:NO],AVLinearPCMIsFloatKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
	nil];

With this dictionary describing 44.1 kHz, stereo, 16-bit, interleaved, little-endian integer PCM, we can create an AVAssetWriterInput to encode and write samples in this format.

AVAssetWriterInput *assetWriterInput =
	[[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
				outputSettings:outputSettings]
	  retain];
if ([assetWriter canAddInput:assetWriterInput]) {
	[assetWriter addInput:assetWriterInput];
} else {
	NSLog (@"can't add asset writer input... die!");
	return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;

Notice that we’ve set the property assetWriterInput.expectsMediaDataInRealTime to NO. This will allow our transcode to run as fast as possible; of course, you’d set this to YES if you were capturing or generating samples in real-time.
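For a sense of how much data this output format produces, here’s a minimal sketch in plain C (which also compiles inside an Objective-C file). It isn’t part of the example project: the PCMFormat struct and both function names are made up for illustration, and the arithmetic assumes packed integer samples and ignores any container overhead.

```c
#include <stdint.h>

/* Hypothetical helper type: the linear-PCM fields that matter for sizing. */
typedef struct {
    double   sample_rate;       /* frames per second, e.g. 44100.0 */
    uint32_t channels;          /* 2 for stereo */
    uint32_t bits_per_channel;  /* 16 for 16-bit integer samples */
} PCMFormat;

/* One frame holds one sample for every channel. */
static uint32_t pcm_bytes_per_frame(PCMFormat f) {
    return f.channels * (f.bits_per_channel / 8);
}

/* Approximate size of the raw audio data for a clip of the given length. */
static uint64_t pcm_bytes_for_duration(PCMFormat f, double seconds) {
    return (uint64_t)(f.sample_rate * seconds) * pcm_bytes_per_frame(f);
}
```

For the settings used here, that works out to 4 bytes per frame and 176,400 bytes per second, so a three-minute song needs roughly 31.8 MB of audio data in exported.caf.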

Now that our reader and writer are ready, we signal that we’re ready to start moving samples around:

[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];

These calls will allow us to start reading from the reader and writing to the writer… but just how do we do that? The key is the AVAssetReaderOutput method copyNextSampleBuffer. This call produces a Core Media CMSampleBufferRef, which is what we need to provide to the AVAssetWriterInput’s appendSampleBuffer: method.
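A quick aside on the CMTimeMake (0, soundTrack.naturalTimeScale) call above: a Core Media time is a rational number, a count of ticks paired with a ticks-per-second timescale. The following is only a rough sketch of that idea in plain C. RationalTime and its functions are hypothetical stand-ins, not Core Media API, and the real CMTime carries additional flags and epoch fields that are ignored here.

```c
#include <stdint.h>

/* Hypothetical stand-in for CMTime: a tick count plus ticks per second. */
typedef struct {
    int64_t value;
    int32_t timescale;
} RationalTime;

static RationalTime rational_time_make(int64_t value, int32_t timescale) {
    RationalTime t = { value, timescale };
    return t;
}

/* The time expressed in seconds. */
static double rational_time_seconds(RationalTime t) {
    return (double)t.value / (double)t.timescale;
}

/* Re-express a time in another timescale, truncating toward zero.
   Very large tick counts could overflow the multiplication. */
static RationalTime rational_time_convert(RationalTime t, int32_t new_timescale) {
    return rational_time_make(t.value * new_timescale / t.timescale,
                              new_timescale);
}
```

Seen this way, the start time passed to startSessionAtSourceTime: is simply zero ticks on the sound track’s own clock.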

But this is where it starts getting tricky. We can’t just drop into a while loop and start copying buffers over. We have to be explicitly signaled that the writer is able to accept input. We do this by providing a block to the asset writer input’s requestMediaDataWhenReadyOnQueue:usingBlock: method. Once we do this, our code will continue on, while the block will be called asynchronously and periodically by Grand Central Dispatch. This explains the earlier retains… autoreleased variables created here in convertTapped: will soon be released, while we need them to still be around when the block is executed. So we need to take care that stuff we need is available inside the block: objects need to not be released, and local primitives that the block modifies need the __block modifier.

__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue =
	dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue 
										usingBlock: ^ 
{

The block will be called repeatedly by GCD, but we still need to make sure that the writer input is able to accept new samples.

while (assetWriterInput.readyForMoreMediaData) {
	CMSampleBufferRef nextBuffer =
		[assetReaderOutput copyNextSampleBuffer];
	if (nextBuffer) {
		// append buffer
		[assetWriterInput appendSampleBuffer: nextBuffer];
		// update ui
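		convertedByteCount +=
			CMSampleBufferGetTotalSampleSize (nextBuffer);
		NSNumber *convertedByteCountNumber =
			[NSNumber numberWithUnsignedLongLong:convertedByteCount];
		[self performSelectorOnMainThread:@selector(updateSizeLabel:)
				       withObject:convertedByteCountNumber
				    waitUntilDone:NO];
		// copyNextSampleBuffer follows the Copy rule, so release the buffer
		CFRelease (nextBuffer);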

What’s happening here is that while the writer input can accept more samples, we try to get a sample from the reader output. If we get one, appending it to the writer input is a one-line call. Updating the UI is another matter: since GCD has us running on an arbitrary thread, we have to use performSelectorOnMainThread for any updates to the UI, such as updating a label with the current total byte-count. We would also have to call out to the main thread to update the progress bar, currently unimplemented because I don’t have a good way to do it yet.
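One possible way to drive that progress bar, offered as an untested assumption rather than anything in the example project: compare the presentation time stamp of the most recent sample buffer (available from CMSampleBufferGetPresentationTimeStamp) against the asset’s duration. The arithmetic is sketched below in plain C. The function name and its raw value/timescale parameters are hypothetical, standing in for the fields of the two CMTime values.

```c
#include <stdint.h>

/* Fraction of the asset processed so far, clamped to the range 0.0 to 1.0.
   Each time is passed as a tick count plus ticks per second. */
static double progress_fraction(int64_t position_value, int32_t position_timescale,
                                int64_t duration_value, int32_t duration_timescale) {
    /* Invalid or empty inputs report no progress rather than dividing by zero. */
    if (position_timescale <= 0 || duration_timescale <= 0 || duration_value <= 0) {
        return 0.0;
    }
    double position = (double)position_value / position_timescale;
    double duration = (double)duration_value / duration_timescale;
    double fraction = position / duration;
    if (fraction < 0.0) return 0.0;
    if (fraction > 1.0) return 1.0;
    return fraction;
}
```

The result would still have to be handed to the main thread, just like the byte-count label.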

If the writer is ever unable to accept new samples, we fall out of the while and the block, though GCD will continue to re-run the block until we explicitly stop the writer.

How do we know when to do that? When we don’t get a sample from copyNextSampleBuffer, which means we’ve read all the data from the reader.

} else {
	// done!
	[assetWriterInput markAsFinished];
	[assetWriter finishWriting];
	[assetReader cancelReading];
	NSDictionary *outputFileAttributes =
		[[NSFileManager defaultManager]
			attributesOfItemAtPath:exportPath
					 error:nil];
	NSLog (@"done. file size is %llu",
		    [outputFileAttributes fileSize]);
	NSNumber *doneFileSize = [NSNumber numberWithUnsignedLongLong:
			[outputFileAttributes fileSize]];
	[self performSelectorOnMainThread:@selector(updateCompletedSizeLabel:)
			       withObject:doneFileSize
			    waitUntilDone:NO];
	// release a lot of stuff
	[assetReader release];
	[assetReaderOutput release];
	[assetWriter release];
	[assetWriterInput release];
	[exportPath release];
	// leave the while loop; there is nothing left to copy
	break;

Reaching the finish state requires us to tell the writer to finish up the file by sending finish messages to both the writer input and the writer itself. After we update the UI (again, with the song-and-dance required to do so on the main thread), we release all the objects we had to retain in order that they would be available to the block.
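Stripped of the AV Foundation types, the control flow is a pull loop with back-pressure: a callback copies buffers while the writer has room, returns when it doesn’t, and finishes up when the reader runs dry. Here’s a toy model of that flow in plain C, purely as a sketch. FakeReader, FakeWriter, and every function name are invented for illustration, and the "room" counter is only an assumption about how a writer might signal readiness.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the reader output and the writer input. */
typedef struct {
    size_t buffers_left;   /* sample buffers the source can still produce */
} FakeReader;

typedef struct {
    size_t room;           /* buffers accepted before the writer must drain */
    size_t appended;       /* total buffers written so far */
    bool   finished;       /* set once the input is marked as finished */
} FakeWriter;

/* Plays the role of copyNextSampleBuffer: false means the source is exhausted. */
static bool reader_next(FakeReader *r) {
    if (r->buffers_left == 0) return false;
    r->buffers_left--;
    return true;
}

/* One invocation of the "ready for more media data" block. */
static void on_ready(FakeReader *r, FakeWriter *w) {
    while (w->room > 0) {            /* readyForMoreMediaData */
        if (reader_next(r)) {
            w->appended++;           /* appendSampleBuffer: */
            w->room--;
        } else {
            w->finished = true;      /* markAsFinished, finishWriting */
            break;
        }
    }
}

/* Re-run the callback each time the writer has drained, until the input is
   marked as finished. Returns the number of callback invocations. */
static size_t run_transfer(FakeReader *r, FakeWriter *w, size_t room_per_call) {
    size_t calls = 0;
    if (room_per_call == 0) return 0;   /* a writer with no room never progresses */
    while (!w->finished) {
        w->room = room_per_call;        /* writer drained; it is ready again */
        on_ready(r, w);
        calls++;
    }
    return calls;
}
```

With ten buffers to move and room for four per call, the callback runs three times: four buffers, four buffers, then the last two plus the finish.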

Finally, for those of you copy-and-pasting at home, I think I owe you some close braces:

			}
		}
	}];
	NSLog (@"bottom of convertTapped:");
}

Once you’ve run this code on the device (it won’t work in the Simulator, which doesn’t have an iPod Library) and performed a conversion, you’ll have converted PCM in an exported.caf file in your app’s Documents directory. In theory, your app could do something interesting with this file, like representing it as a waveform, or running it through a Core Audio AUGraph to apply some interesting effects. Just to prove that we actually have performed the desired conversion, use the Xcode Organizer to open up the “iPod Reader” application and drag its “Application Data” to your Mac:

Accessing app's documents with Xcode Organizer

The exported folder will have a Documents folder, in which you should find exported.caf. Drag it over to QuickTime Player or any other application that can show you the format of the file you’ve produced:

QuickTime Player inspector showing PCM format of exported.caf file
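As for doing something interesting with the file, such as the waveform idea mentioned above, the core of it is just reducing the samples to one peak value per horizontal pixel. Here’s a minimal sketch in plain C. It assumes the 16-bit interleaved samples have already been read into memory (getting them out of the .caf is a separate job), and pcm16_peaks is a hypothetical name, not an API from any Apple framework.

```c
#include <stddef.h>
#include <stdint.h>

/* Reduce interleaved 16-bit PCM to one peak magnitude per bucket.
   Writes bucket_count values in the range 0 to 32768 into peaks_out. */
static void pcm16_peaks(const int16_t *samples, size_t frame_count,
                        unsigned channels, uint16_t *peaks_out,
                        size_t bucket_count) {
    for (size_t b = 0; b < bucket_count; b++) {
        /* Frames [first, last) belong to this bucket. */
        size_t first = b * frame_count / bucket_count;
        size_t last  = (b + 1) * frame_count / bucket_count;
        uint16_t peak = 0;
        for (size_t i = first * channels; i < last * channels; i++) {
            int32_t s = samples[i];                 /* widen so -32768 negates safely */
            uint16_t magnitude = (uint16_t)(s < 0 ? -s : s);
            if (magnitude > peak) peak = magnitude;
        }
        peaks_out[b] = peak;
    }
}
```

Each peak can then be scaled to the height of the view and drawn as a vertical line.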

Hopefully this is going to work for you. It worked for most Amazon and iTunes albums I threw at it, but I found I had an iTunes Plus album, Ashtray Rock by the Joel Plaskett Emergency, whose songs throw an inexplicable error when opened, so I can’t presume to fully understand this API just yet:

2010-12-12 15:28:18.939 VTM_AViPodReader[7666:307] *** Terminating app
 due to uncaught exception 'NSInvalidArgumentException', reason:
 '*** -[AVAssetReader initWithAsset:error:] invalid parameter not
 satisfying: asset != ((void *)0)'

Still, the arrival of AVAssetReader and AVAssetWriter opens up a lot of new possibilities for audio and video apps on iOS. With the reader, you can inspect media samples, either in their original format or with a conversion to a form that suits your code. With the writer, you can supply samples that you receive by transcoding (as I’ve done here), by capture, or even samples you generate programmatically (such as a screen recorder class that just grabs the screen as often as possible and writes it to a movie file).

Beauty and the Box

Yesterday, I took my 5-year-old daughter, Quinn, to the Beauty and the Beast sing-a-long event. Quick summary: would have worked better with more people (we only had about 20, and most were shy), but helped to be in front of some theatre girls who knew the songs by heart and were into it. Still, one of my favorite movies, one I’ve surely seen 20 or 30 times. But let’s get back to digital media…

The event was meant to promote Tuesday’s re-release of Beauty and the Beast on home video, this time in its first HD edition. I’ve already owned B&tB on VHS and DVD (the 2003 edition cleverly contained the “work in progress” film circuit version, the original version, and the IMAX re-release that added an unneeded song). So I found myself wondering if I would be buying this release. Probably not, since I don’t own a Blu-Ray player and now that we’re many years into the Blu-Ray era, I don’t see that changing anytime soon. We don’t do a lot of movie watching anymore, as most of what we watch is DVR’ed off the DirecTV, and I didn’t fall for the “PlayStation 3 as Blu-Ray trojan horse” due to the PS3’s absurd unaffordability. And I don’t feel like we’ve missed it.

Then I thought: “wait, Blu-Ray isn’t the only form of HD.” There’s also on-demand from DirecTV, and what about iTunes? A little search there shows that yes indeed, the B&tB platinum edition will also be available on iTunes: $14.99 for SD, $19.99 for HD.

Of course, these Disney classics are usually only available for a short time before they “go back in the vault”, to enhance demand for the next re-release. So if I felt I did need to grab an HD version before it went away, which would I get?

Thinking about it, I think I’m more likely to buy an AppleTV — or at least rig a Mac Mini to a TV — before I get a Blu-Ray player. As it is, I could play the HD .m4p on a bunch of the devices I currently own (computers, iPhone, iPad), and the only thing that’s missing is connectivity to the TV. In fact, various video out cables allow for iOS devices to serve as a sort of “poor man’s” first-gen AppleTV, depending on your available connections, how many videos you’ve loaded on your iPod, and your tolerance for SD. A Blu-Ray disc would be locked to the TV the player is connected to, and wouldn’t be rippable for the iDevices (though this particular bundle may come with a digital copy… haven’t checked).

Still, I’m surprised to find that I’ve blundered into exactly what Steve Jobs purportedly told a customer in one of those alleged off-the-cuff e-mails: Blu-Ray is coming up short, and will eventually be replaced by digital downloads, just as CD successors were beaten by downloads (anybody spun up an SACD lately?).

BTW, Apple’s resolute anti-Blu stance is made all the more interesting by the fact that Apple is a board member of the Blu-Ray Disc Association.

Another note about the AppleTV: teardowns and other spelunking reveal that the new device runs iOS and has 8GB of storage, which would be suitable for apps, should Apple ever choose to deliver a third-party SDK. Clearly the UI would be different (perhaps it exists not as an “AppKit” or “UIKit” but rather a “TVKit” atop Foundation and the rest of the usual Apple stack), but there would be all sorts of interesting opportunities.

One of the most obvious would be for all the existing iOS streaming media apps to connect to the TV. This includes the sports apps: everyone knows about the MLB app, but look further and you’ll find that major events like the PGA Championship and Ryder Cup also have their own apps with live video available via in-app purchase, DirecTV’s “NFL Sunday Ticket” streams to phones, etc. There are also specialized video apps for all manner of niches. For example, as an anime fan, I use Crunchyroll’s streaming app, and might someday sign up for Anime Network Mobile. I imagine every other little video fetish has its own streaming app, or soon will.

(By the way, none of these apps can use the standard def video out cables like Apple’s iPod or Videos apps can. When you connect the composite video cable and do [UIScreen screens], you only see one screen, so these streaming apps can’t access the video out and put their player UI over there. rdar://8063058 )

By Apple fiat, essentially all of these apps need to use HTTP Live Streaming, and an AppleTV that permitted third-party apps would presumably drive even more content providers to this standard. I had previously wondered aloud about doing an HTTP Live Streaming book, but if we get an AppleTV SDK, it would make perfect sense for the HLS material to become one or two chapters of a book on AppleTV programming, along the lines of “if you’re programming for this platform, you’re almost certainly going to be streaming video to it, so here’s how the client side works, and here’s how to set up your server.”

More things in Heaven and Earth

There are more things in Heaven and Earth, Horatio, than are dreamt of in your philosophy.
Hamlet, Act 1, Scene V

There’s suddenly a lot of conventional wisdom that says the rise and eventual dominance of Android is manifest, and inevitable. Some of these claims make dubious analogies to Windows’ defeat of the Mac in the 90’s, ramming square pegs through round holes to make the analogy stick (to wit: who are the hardware manufacturers this time, the handset makers or the carriers?). It may indeed come to pass, but the reasoning behind these claims is pretty shallow thus far.

Case in point: an Appcelerator survey covered in The Apple Blog story Devs Say Android is Future-Proof. iOS? Not So Much. The reasoning for Android’s perceived advantage? This article doesn’t mention Android’s license terms and widespread hardware adoption (maybe that’s taken for granted at this point?), and instead mentions only the appeal of writing apps for GoogleTV, a product that is not even out yet (meaning Adamson’s First Law applies), to say nothing of how many purported “interactive television revolutions” we’ve suffered through over the decades (Qube, videotex, WebTV, Tru2Way, etc.). Maybe it’ll be the next big thing, but history argues otherwise.

In the 90’s, the rise of Java seemed an obvious bet. Applets would make web pages far more compelling than static pages and lengthy form submits, and application developers would surely be better off with garbage collection and strong typing than with C and C++. Java was so sure to be big that Microsoft threw the full force of its dirty tricks machine at it, while Apple exposed most of the Mac’s unique libraries to Java bindings (including, at various times, QuickTime, Cocoa, Core Audio, speech, and more). But it didn’t work out that way: Java on the browser was displaced by JavaScript/Ajax, and the early attempts to write major desktop applications in Java were unmitigated disasters, with the Netscape Navigator port abandoned, and Corel’s Java version of WordPerfect Office buried almost immediately after it was released. 1996’s sure bet was a has-been (or a never-was) by 2001.

If you think about it, the same thing happened a few years ago with AIR. With the YouTube-powered rise of Flash, AIR seemed a perfect vehicle to bring hordes of Flash developers to the desktop. Everyone knew it would be big. Except it wasn’t. AIR applications are rare today, perhaps rarer even than Java. Admittedly, I only remembered AIR’s existence because I needed to download the AIR-powered Balsamiq application for a client this week… exception that proves the rule, I guess?

My point in all this is that the conventional wisdom about platform success has a tendency to be selective in considering what factors will make or break a platform. Licensing, corporate support, community, and of course the underlying technology all play a part. Android is greatly enhanced by the fact that Google puts talented people behind it and then gives it away, but if carriers then use it to promote their own applications and crapware over third-party apps (or cripple them, as they did with JavaME), then Android’s advantage is nil. On the other hand, Apple’s iOS may have remarkable technology, but if their model requires using their corporate strength to force carriers to be dumb pipes, then they may only be able to get iPhone on weaker carriers, which will turn off consumers and retard growth of the platform.

Ultimately, it’s hard to say how this will all play out, but assuming an Android victory based on the presumed success of currently non-existent tablets and set top boxes is surely an act of faith… which probably accounts for all the evangelism.

So why am I on iOS now? Is it because I have some reason to think that it will “win”? Not at all. Mostly it’s because I like the technology. In the mid 2000’s, when user-facing Java was in terminal decline, I tried to learn Flash and Flex to give myself more options, but I just couldn’t bring myself to like it. It just didn’t click for me. But as I got into Cocoa and then the iPhone SDK, I found I liked the design patterns, and the thoughtfulness of all of it. The elegance and power appealed to me. Being a media guy, I also appreciate the platform’s extraordinary support for audio and video: iOS 4 has three major media APIs (AV Foundation, Core Audio, and Media Player), along with other points of interest throughout the stack (video out in UIKit, the low-level abstractions of Core Media, spatialized sound in OpenAL, high-performance DSP functions in the Accelerate framework, etc.). Android’s media package is quite limited by comparison, offering some canned functionality for media playback and a few other curious features (face recognition and dial tone generation, for example), but no way to go deeper. When so many media apps for Android are actually server-dependent, like speech-to-text apps that upload audio files for conversion, it says to me there’s not much of a there there, at least for the things I find interesting.

Even when I switched from journalism and failed screenwriting to programming and book-writing in the late 90’s, at the peak of the Microsoft era, I never considered for a second the option of learning Windows programming and adopting that platform. I just didn’t like their stuff, and still don’t. The point being that I, and you, don’t have to chase the market leader all the time. Go with what you like, where you’ll be the most productive and do the most interesting work.

There’s a bit in William Goldman’s Adventures in the Screen Trade (just looked in my copy, but couldn’t find the exact quote), where the famous screenwriter excuses himself from a story meeting, quitting the project by saying “Look, I am too old, and too rich, to have to put up with this shit.” I like the spirit of that. Personally, I may not be rich, but I’m certainly past the point where I’m willing to put up with someone else’s trite wisdom, or the voice of the developer mob, telling me where I should focus my skills and talents.

From iPhone Media Library to PCM Samples in Dozens of Confounding, Potentially Lossy Steps

iPhone SDK 3.0 provided limited access to the iPod Music Library on the device, allowing third party apps to search for songs (and podcasts and audiobooks, but not video), inspect the metadata, and play items, either independently or in concert with the built-in media player application. But it didn’t provide any form of write-access — you couldn’t add items or playlists, or alter metadata, from a third-party app. And it didn’t allow for third-party apps to do anything with the songs except play them… you couldn’t access the files, convert them to another format, run any kind of analysis on the samples, and so on.

So a lot of us were surprised by the WWDC keynote when iMovie for iPhone 4 was shown importing a song from the iPod library for use in a user-made video. We were even more surprised by the subsequent claim that everything in iMovie for iPhone 4 was possible with public APIs. Frankly, I was ready to call bullshit on it because of the iPod Library issue, but was intrigued by the possibility that maybe you could get at the iPod songs in iOS 4. A tweet from @ibeatmaker confirmed that it was possible, and after some clarification, I found what I needed.

About this time, a thread started on coreaudio-api about whether Core Audio could access iPod songs, so that’s what I set out to prove one way or another. So, my goal was to determine whether or not you could get raw PCM samples from songs in the device’s music library.

The quick answer is: yes. The interesting answer is: it’s a bitch, using three different frameworks, coding idioms that are all over the map, a lot of file-copying and possibly some expensive conversions.

It’s Just One Property; It Can’t Be That Hard

The big secret of how to get to the Music Library isn’t much of a secret. As you might expect, it’s in the MediaPlayer.framework that you use to interact with the library. Each song/podcast/audiobook is an MPMediaItem, and has a number of interesting properties, most of which are user-managed metadata. In iOS 4, there’s a sparkling new addition to the list of “General Media Item Property Keys”: MPMediaItemPropertyAssetURL. Here’s the docs:

A URL pointing to the media item, from which an AVAsset object (or other URL-based AV Foundation object) can be created, with any options as desired. Value is an NSURL object.

The URL has the custom scheme of ipod-library. For example, a URL might look like this:


OK, so we’re off and running. All we need to do is to pick an MPMediaItem, get this property as an NSURL, and we win.

Or not. There’s an important caveat:

Usage of the URL outside of the AV Foundation framework is not supported.

OK, so that’s probably going to suck. But let’s get started anyways. I wrote a throwaway app to experiment with all this stuff, adding to it piece by piece as stuff started working. I’m posting it here for anyone who wants to reuse my code… all my classes are marked as public domain, so copy-and-paste as you see fit.

Note that this code must be run on an iOS 4 device and cannot be run in the Simulator, which doesn’t support the Media Library APIs.

The app just starts with a “Choose Song” button. When you tap it, it brings up an MPMediaPickerController as a modal view to make you choose a song. When you do so, the -mediaPicker:didPickMediaItems: delegate method gets called. At this point, you could get the first MPMediaItem and get its MPMediaItemPropertyAssetURL media item property. I’d hoped that I could just call this directly from Core Audio, so I wrote a function to test if a URL can be opened by CA:

BOOL coreAudioCanOpenURL (NSURL* url) {
	OSStatus openErr = noErr;
	AudioFileID audioFile = NULL;
	openErr = AudioFileOpenURL((CFURLRef) url,
		 kAudioFileReadPermission ,
		 0,	// no file type hint
		 &audioFile);
	if (audioFile) {
		AudioFileClose (audioFile);
	}
	return openErr ? NO : YES;
}

Getting a NO back from this function more or less confirms the caveat from the docs: the URL is only for use with the AV Foundation framework.

AV for Vendetta

OK, so plan B: we open it with AV Foundation and see what that gives us.

AV Foundation — setting aside the simple player and recorder classes from 3.0 — is a strange and ferocious beast of a framework. It borrows from QuickTime and QTKit (the capture classes have an almost one-to-one correspondence with their QTKit equivalents), but builds on some new metaphors and concepts that will take the community a while to digest. For editing, it has a concept of a composition, which is made up of tracks, which you can create from assets. This is somewhat analogous to QuickTime’s model that “movies have tracks, which have media”, except that AVFoundation’s compositions are themselves assets. Actually, reading too much QuickTime into AV Foundation is a good way to get in trouble and get disappointed; QuickTime’s most useful functions, like AddMediaSample() and GetMediaNextInterestingTime() are antithetical to AV Foundation’s restrictive design (more on that in a later blog) and therefore don’t exist.

Back to the task at hand. The only thing we can do with the media library URL is to open it in AVFoundation and hope we can do something interesting with it. The way to do this is with an AVURLAsset.

NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];

If this were QuickTime, we’d have an object that we could inspect the samples of. But in AV Foundation, the only sample-level access afforded is a capture-time opportunity to get called back with video frames. There’s apparently no way to get to video frames in a file-based asset (except for a thumbnail-generating method that operates on one-second granularity), and no means of directly accessing audio samples at all.

What we can do is to export this URL to a file in our app’s documents directory, hopefully in a format that Core Audio can open. AV Foundation’s AVAssetExportSession has a class method exportPresetsCompatibleWithAsset: that reveals what kinds of formats we can export to. Since we’re going to burn the time and CPU of doing an export, it would be nice to be able to convert the compressed song into PCM in some kind of useful container like a .caf, or at least an .aif. But here’s what we actually get as options:

compatible presets for songAsset: (

So, no… there’s no “output to CAF”. In fact, we can’t even use AVAssetExportPresetPassthrough to preserve the encoding from the music library: we either have to convert to an AAC (in an .m4a container), or to a QuickTime movie (represented by all the presets ending in “Quality”, as well as the “640×480”).

This Deal is Getting Worse All the Time!

So, we have to export to AAC. That’s not entirely bad, since Core Audio should be able to read AAC in an .m4a container just fine. But it sucks in that it will be a lossy conversion from the source, which could be MP3, Apple Lossless, or some other encoding.

In my GUI, an “export” button appears when you pick a song, and the export is kicked off in the event-handler handleExportTapped. Here’s the UI in mid-export:

MediaLibraryExportThrowaway1 UI in mid-export

To do the export, we create an AVAssetExportSession and provide it with an outputFileType and outputURL.

AVAssetExportSession *exporter = [[AVAssetExportSession alloc]
		initWithAsset: songAsset
		presetName: AVAssetExportPresetAppleM4A];
NSLog (@"created exporter. supportedFileTypes: %@", exporter.supportedFileTypes);
exporter.outputFileType = @"";
NSString *exportFile = [myDocumentsDirectory()
		stringByAppendingPathComponent: @"exported.m4a"];
[exportURL release];
exportURL = [[NSURL fileURLWithPath:exportFile] retain];
exporter.outputURL = exportURL;	

A few notes here. The docs say that if you set the outputURL without setting outputFileType that the exporter will make a guess based on the file extension. In my experience, the exporter prefers to just throw an exception and die, so set the damn type already. You can get a list of possible values from the class method exporter.supportedFileTypes. The only supported value for the AAC export is Also note the call to a myDeleteFile() function; the export will fail if the target file already exists.

Aside: I did experiment with exporting as a QuickTime movie rather than an .m4a; the code is in the download, commented out. Practical upshot is that it sucks: if your song isn’t AAC, then it gets converted to mono AAC at 44.1 KHz. It’s also worth noting that AV Foundation doesn’t give you any means of setting export parameters (bit depths, sample rates, etc.) other than using the presets. If you’re used to the power of frameworks like Core Audio or the old QuickTime, this is a bitter, bitter pill to swallow.

Block Head

The code gets really interesting when you kick off the export. You would probably expect the export, a long-lasting operation, to be nice and asynchronous. And it is. You might also expect to register a delegate to get asynchronous callbacks as the export progresses. Not so fast, Bucky. As a new framework, AV Foundation adopts Apple’s latest technologies, and that includes blocks. When you export, you provide a completion handler, a block whose no-arg function is called when necessary by the exporter.

Here’s what mine looks like.

// do the export
[exporter exportAsynchronouslyWithCompletionHandler:^{
	int exportStatus = exporter.status;
	switch (exportStatus) {
		case AVAssetExportSessionStatusFailed: {
			// log error to text view
			NSError *exportError = exporter.error;
			NSLog (@"AVAssetExportSessionStatusFailed: %@", exportError);
			errorView.text = exportError ?
				[exportError description] : @"Unknown failure";
			errorView.hidden = NO;
			break;
		}
		case AVAssetExportSessionStatusCompleted: {
			NSLog (@"AVAssetExportSessionStatusCompleted");
			fileNameLabel.text =
				[exporter.outputURL lastPathComponent];
			// set up AVPlayer
			[self setUpAVPlayerForURL: exporter.outputURL];
			[self enablePCMConversionIfCoreAudioCanOpenURL:
				exporter.outputURL];
			break;
		}
		case AVAssetExportSessionStatusUnknown: {
			NSLog (@"AVAssetExportSessionStatusUnknown"); break;}
		case AVAssetExportSessionStatusExporting: {
			NSLog (@"AVAssetExportSessionStatusExporting"); break;}
		case AVAssetExportSessionStatusCancelled: {
			NSLog (@"AVAssetExportSessionStatusCancelled"); break;}
		case AVAssetExportSessionStatusWaiting: {
			NSLog (@"AVAssetExportSessionStatusWaiting"); break;}
		default: { NSLog (@"didn't get export status"); break;}
	}
}];

This kicks off the export, passing in a block with code to handle all the possible callbacks. The completion handler function doesn’t have to take any arguments (nor do we have to set up a “user info” object for the exporter to pass to the function), since the block allows anything in the local scope to be called from the block. That means the exporter and its state don’t need to be passed in as parameters, because the exporter is a local variable that can be accessed from the block and its state inspected via method calls.

The two statuses I handle in my block are AVAssetExportSessionStatusFailed, which dumps the error to a previously-invisible text view, and AVAssetExportSessionStatusCompleted, which sets up an AVPlayer to play the exported audio (more on that later).

After starting the export, my code runs an NSTimer to fill a UIProgressView. Since the exporter has a progress property that returns a float, it’s pretty straightforward… check the code if you haven’t already done this a bunch of times. Files that were already AAC export almost immediately, while MP3s and Apple Lossless (ALAC) took a minute or more to export. Files in the old .m4p format, from back when the iTunes Store put DRM on all the songs, fail with an error, as seen below.

The Invasion of Time

Kind of as a lark, I added a little GUI to let you play the exported file. AVPlayer was the obvious choice for this, since it should be able to play whatever kind of file you export (.m4a, .mov, whatever).

This brings up the whole issue of how to deal with the representation of time in AV Foundation, which turns out to be great for everyone who ever used the old C QuickTime API (or possibly QuickTime for Java), and all kinds of hell for everyone else.

AV Foundation uses Core Media’s CMTime struct for representing time. In turn, CMTime uses QuickTime’s brilliant but tricky concept of time scales. The idea, in a nutshell, is that your units of measurement for any particular piece of media are variable: pick one that suits the media’s own timing needs. For example, CD audio is 44.1 KHz, so it makes sense to measure time in 1/44100 second intervals. In a CMTime, you’d set the timescale to 44100, and then a given value would represent some number of these units: a single sample would have a value of 1 and would represent 1/44100 of a second, exactly as desired.
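To make the value/timescale idea concrete, here is a small C sketch. RationalTime is a simplified, hypothetical stand-in for CMTime (the real struct carries additional fields), and the function names are mine, not Core Media’s.

```c
#include <stdint.h>

/* Simplified stand-in for CMTime: a count of units plus units-per-second.
   (The real struct also carries flags and an epoch.) */
typedef struct {
    int64_t value;      /* number of time units */
    int32_t timescale;  /* units per second */
} RationalTime;

/* Convert to floating-point seconds, e.g. for display. */
double rational_time_seconds(RationalTime t) {
    return (double)t.value / (double)t.timescale;
}

/* Time of the Nth sample of CD audio: one unit per sample,
   44100 units per second. */
RationalTime cd_sample_time(int64_t sampleIndex) {
    RationalTime t = { sampleIndex, 44100 };
    return t;
}
```

With this representation, sample 44100 lands at exactly one second, and no sample position ever needs a fractional value.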

I find it’s easier to think of Core Media (and QuickTime) timescales as representing “nths of a second”. One of the clever things you can do is to choose a timescale that suits a lot of different kinds of media. In QuickTime, the default timescale is 600, as this is a common multiple of many important frame-rates: 24 fps for film, 25 fps for PAL (European) TV, 30 fps for NTSC (North America and Japan) TV, etc. Any number of frames in these systems can be evenly and exactly represented with a combination of value and timescale.
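The “common multiple” claim is easy to check with a little arithmetic. The following C sketch (units_per_frame is a hypothetical helper name) returns how many timescale units a single frame occupies, or -1 when the frame rate doesn’t divide the timescale evenly.

```c
#include <stdint.h>

/* Number of timescale units that one frame occupies, or -1 if the frame
   rate does not divide the timescale evenly. */
int32_t units_per_frame(int32_t timescale, int32_t framesPerSecond) {
    if (framesPerSecond <= 0 || timescale % framesPerSecond != 0) {
        return -1;
    }
    return timescale / framesPerSecond;
}
```

At a timescale of 600, a film frame is 25 units, a PAL frame is 24, and a 30 fps frame is 20. By contrast, a timescale of 44100 can’t represent a 24 fps frame exactly, which is why a timescale chosen for audio isn’t automatically a good fit for video.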

Where it gets tricky is when you need to work with values measured in different timescales. This comes up in AV Foundation, as your player may use a different timescale than the items it’s playing. It’s pretty easy to write out the current time label:

CMTime currentTime = player.currentTime;
UInt64 currentTimeSec = currentTime.value / currentTime.timescale;
UInt32 minutes = currentTimeSec / 60;
UInt32 seconds = currentTimeSec % 60;
playbackTimeLabel.text = [NSString stringWithFormat:
		@"%02d:%02d", minutes, seconds];

But it’s hard to update the slider position, since the AVPlayer and the AVPlayerItem it’s playing can (and do) use different time scales. Enjoy the math.

if (player && !userIsScrubbing) {
	CMTime endTime = CMTimeConvertScale (player.currentItem.asset.duration,
			currentTime.timescale,
			kCMTimeRoundingMethod_Default); // rounding method is an assumption
	if (endTime.value != 0) {
		double slideTime = (double) currentTime.value /
				(double) endTime.value;
		playbackSlider.value = slideTime;
	}
}

Basically, the key here is that I need to get the duration of the item being played, but to express that in the time scale of the player, so I can do math on them. That gets done with the CMTimeConvertScale() call. Looks simple here, but if you don’t know that you might need to do a timescale-conversion, your math will be screwy for all sorts of reasons that do not make sense.
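For anyone who wants to see what the conversion amounts to, here is a plain-C sketch of the arithmetic. These functions are hypothetical illustrations, not Core Media’s implementation: they assume positive timescales, assume that value times the new timescale fits in 64 bits, and hard-code round-half-away-from-zero.

```c
#include <stdint.h>

/* Re-express `value` (counted in units of 1/fromScale sec) in units of
   1/toScale sec, rounding half away from zero. Assumes positive scales
   and that value * toScale fits in 64 bits. */
int64_t rescale_time_value(int64_t value, int32_t fromScale, int32_t toScale) {
    int64_t scaled = value * (int64_t)toScale;
    int64_t half = fromScale / 2;
    if (scaled >= 0) {
        return (scaled + half) / fromScale;
    }
    return (scaled - half) / fromScale;
}

/* Slider position in [0, 1]: current time over duration, once both are
   expressed in the same timescale. Returns 0 for an empty duration. */
double slider_fraction(int64_t currentValue, int32_t currentScale,
                       int64_t durationValue, int32_t durationScale) {
    int64_t end = rescale_time_value(durationValue, durationScale,
                                     currentScale);
    if (end == 0) {
        return 0.0;
    }
    return (double)currentValue / (double)end;
}
```

Take a three-minute song whose duration is 7,938,000 units at a timescale of 44100, and a player reporting 54,000 units at a timescale of 600 (90 seconds). Converting first gives a duration of 108,000 and a slider position of 0.5. Dividing the raw values without converting gives roughly 0.007, which is the kind of screwy math described above.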

Oh, you can drag the slider too, which means doing the same math in reverse.

-(IBAction) handleSliderValueChanged {
	CMTime seekTime = player.currentItem.asset.duration;
	seekTime.value = seekTime.value * playbackSlider.value;
	seekTime = CMTimeConvertScale (seekTime, player.currentTime.timescale,
			kCMTimeRoundingMethod_Default); // rounding method is an assumption
	[player seekToTime:seekTime];
}

One other fun thing about all this that I just remembered from looking through my code. The time label and slider updates are called from an NSTimer. I set up the AVPlayer in the completion handler block that’s called by the exporter. This call seems not to be on the main thread, as my update timer didn’t work until I forced its creation over to the main thread with performSelectorOnMainThread:withObject:waitUntilDone:. Good times.

Final Steps

Granted, all this AVPlayer stuff is a distraction. The original goal was to get from iPod Music Library to decompressed PCM samples. We used an AVAssetExportSession to produce an .m4a file in our app’s Documents directory, something that Core Audio should be able to open. The remaining conversion is a straightforward use of CA’s Extended Audio File Services: we open an ExtAudioFileRef on the input .m4a, set a “client format” property representing the PCM format we want it to convert to, read data into a buffer, and write that data back out to a plain AudioFileID. It’s C, so the code is long, but hopefully not too hard on the eyes:

-(IBAction) handleConvertToPCMTapped {
	NSLog (@"handleConvertToPCMTapped");
	// open an ExtAudioFile
	NSLog (@"opening %@", exportURL);
	ExtAudioFileRef inputFile;
	CheckResult (ExtAudioFileOpenURL((CFURLRef)exportURL, &inputFile),
				 "ExtAudioFileOpenURL failed");
	// prepare to convert to a plain ol' PCM format
	AudioStreamBasicDescription myPCMFormat = {0};
	myPCMFormat.mSampleRate = 44100; // todo: or use source rate?
	myPCMFormat.mFormatID = kAudioFormatLinearPCM ;
	myPCMFormat.mFormatFlags =  kAudioFormatFlagsCanonical;	
	myPCMFormat.mChannelsPerFrame = 2;
	myPCMFormat.mFramesPerPacket = 1;
	myPCMFormat.mBitsPerChannel = 16;
	myPCMFormat.mBytesPerPacket = 4;
	myPCMFormat.mBytesPerFrame = 4;
	// the "client format" is what ExtAudioFile converts to on read
	CheckResult (ExtAudioFileSetProperty(inputFile,
			kExtAudioFileProperty_ClientDataFormat,
			sizeof (myPCMFormat), &myPCMFormat),
		  "ExtAudioFileSetProperty failed");

	// allocate a big buffer. size can be arbitrary for ExtAudioFile.
	// you have 64 KB to spare, right?
	UInt32 outputBufferSize = 0x10000;
	void* ioBuf = malloc (outputBufferSize);
	UInt32 sizePerPacket = myPCMFormat.mBytesPerPacket;	
	UInt32 packetsPerBuffer = outputBufferSize / sizePerPacket;
	// set up output file
	NSString *outputPath = [myDocumentsDirectory() 
			stringByAppendingPathComponent: @"export-pcm.caf"];
	NSURL *outputURL = [NSURL fileURLWithPath:outputPath];
	NSLog (@"creating output file %@", outputURL);
	AudioFileID outputFile;
	// the erase-file flag is an assumption; it overwrites any earlier export
	CheckResult (AudioFileCreateWithURL((CFURLRef)outputURL,
			kAudioFileCAFType,
			&myPCMFormat,
			kAudioFileFlags_EraseFile,
			&outputFile),
		  "AudioFileCreateWithURL failed");
	// start convertin'
	UInt32 outputFilePacketPosition = 0; //in bytes
	while (true) {
		// wrap the destination buffer in an AudioBufferList
		AudioBufferList convertedData;
		convertedData.mNumberBuffers = 1;
		convertedData.mBuffers[0].mNumberChannels = myPCMFormat.mChannelsPerFrame;
		convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
		convertedData.mBuffers[0].mData = ioBuf;

		UInt32 frameCount = packetsPerBuffer;

		// read from the extaudiofile; frameCount comes back as the
		// number of frames actually read
		CheckResult (ExtAudioFileRead(inputFile,
				&frameCount,
				&convertedData),
			 "Couldn't read from input file");
		if (frameCount == 0) {
			printf ("done reading from file\n");
			break;
		}
		// write the converted data to the output file
		// (for this PCM format, one frame is one packet)
		CheckResult (AudioFileWritePackets(outputFile,
			   false,
			   frameCount * myPCMFormat.mBytesPerPacket,
			   NULL,
			   outputFilePacketPosition / myPCMFormat.mBytesPerPacket, 
			   &frameCount,
			   convertedData.mBuffers[0].mData),
			 "Couldn't write packets to file");
		NSLog (@"Converted %lu bytes", (unsigned long) outputFilePacketPosition);

		// advance the output file write location
		outputFilePacketPosition +=
			(frameCount * myPCMFormat.mBytesPerPacket);
	}
	// clean up
	ExtAudioFileDispose (inputFile);
	AudioFileClose (outputFile);
	free (ioBuf);

	// GUI update omitted
}
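The hard-coded 4s in the format setup aren’t magic numbers; they fall out of the channel count and bit depth. Here is a rough C sketch of that arithmetic, using a hypothetical PCMLayout struct as a stand-in for the relevant AudioStreamBasicDescription fields and assuming interleaved, uncompressed PCM.

```c
#include <stdint.h>

/* Plain-C stand-in for the handful of AudioStreamBasicDescription fields
   that the listing fills in by hand. */
typedef struct {
    uint32_t channelsPerFrame;
    uint32_t bitsPerChannel;
    uint32_t framesPerPacket;
    uint32_t bytesPerFrame;
    uint32_t bytesPerPacket;
} PCMLayout;

/* Derive the byte counts for interleaved, uncompressed PCM. */
PCMLayout pcm_layout(uint32_t channels, uint32_t bitsPerChannel) {
    PCMLayout l;
    l.channelsPerFrame = channels;
    l.bitsPerChannel = bitsPerChannel;
    l.framesPerPacket = 1;  /* uncompressed PCM: one frame per packet */
    l.bytesPerFrame = channels * (bitsPerChannel / 8);
    l.bytesPerPacket = l.bytesPerFrame * l.framesPerPacket;
    return l;
}

/* How many whole packets fit in a buffer of bufferSize bytes. */
uint32_t packets_per_buffer(uint32_t bufferSize, PCMLayout l) {
    return bufferSize / l.bytesPerPacket;
}
```

For 16-bit stereo this gives 4 bytes per frame and per packet, so the 64 KB buffer holds 16,384 packets on each trip through the loop.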

Note that this uses a CheckResult() convenience function that Kevin Avila wrote for our upcoming Core Audio book… it just looks to see if the return value is noErr and tries to convert it to a readable four-char-code if it seems amenable. It’s in the example file too.
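If you haven’t seen the four-char-code trick before, here is a rough C sketch of the idea. This is not Kevin’s code; format_status is a hypothetical name, and the real CheckResult() in the download may differ in its details (including what it does after reporting the error).

```c
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Write a readable form of an OSStatus-style error code into `out`
   (needs at least 16 bytes). If all four bytes are printable, the code is
   shown as a quoted four-char-code such as 'fmt?'; otherwise as a decimal. */
void format_status(int32_t status, char out[16]) {
    uint32_t u = (uint32_t)status;
    unsigned char c[4] = {
        (unsigned char)(u >> 24), (unsigned char)(u >> 16),
        (unsigned char)(u >> 8),  (unsigned char)u
    };
    if (isprint(c[0]) && isprint(c[1]) && isprint(c[2]) && isprint(c[3])) {
        snprintf(out, 16, "'%c%c%c%c'", c[0], c[1], c[2], c[3]);
    } else {
        snprintf(out, 16, "%d", (int)status);
    }
}
```

Many Core Audio errors are four-char-codes packed into an integer, so an otherwise opaque 1718449215 turns into 'fmt?', while plain numeric errors like -50 are left alone.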

Is It Soup Yet?

Does all this work? Rather than inspecting the AudioStreamBasicDescription of the resulting file, let’s do something more concrete. With Xcode’s “Organizer”, you can access your app’s sandbox on the device. So we can just drag the Application Data to the Desktop.

In the resulting folder, open the Documents folder to find export-pcm.caf. Drag it to QuickTime Player to verify that you do, indeed, have PCM data:

So there you have it. In several hundred lines of code, we’re able to get a song from the iPod Music Library, export it into our app’s Documents directory, and convert it to PCM. With the raw samples, you could now draw an audio waveform view (something you’d think would be essential for video editors who want to match video to beats in the music, but Apple seems dead-set against letting us do so with AV Foundation or QTKit), you could perform analysis or effects on the audio, you could bring it into a Core Audio AUGraph and mix it with other sources… all sorts of possibilities open up.

Clearly, it could be a lot easier. It’s a ton of code, and two file exports (library to .m4a, and .m4a to .caf), when some apps might be perfectly happy to read from the source URL itself and never write to the filesystem… if only they could. Having spent the morning writing this blog, I may well spend the afternoon filing feature requests. I’ll update this blog with OpenRadar numbers for the following requests:

  • Allow Core Audio to open URLs provided by MediaLibrary’s MPMediaItemPropertyAssetURL
  • AV Foundation should allow passthrough export of Media Library items
  • AV Foundation export needs finer-grained control than just presets
  • Provide sample-level access for AVAsset

Still, while I’m bitching and whining, it is remarkable that iOS 4 opens up non-DRM’ed items in the iPod library for export. I never thought that would happen. Furthermore, the breadth and depth of the iOS media APIs remain astonishing. Sometimes terrifying, perhaps, but compared to the facile and trite media APIs that the other guys and girls get, we’re light-years ahead on iOS.

Have fun with this stuff!

Update: This got easier in iOS 4.1. Please forget everything you’ve read here and go read From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary instead.

Threads on the Head

Lack of posts lately… heads down on an iPod game. It’s built up of mini-games, about half of which are done. Today, I’m facing the problem of having to create a mini-game that uses some of the metadata in the iPod library that can’t be directly queried. So, I have to go over every song in the library and perform my own analysis.

Obviously, this would be death to do at startup or in the middle of the game. Walking my 700-song library takes 6-7 seconds, and users could have far more songs.

Cut to the win: NSOperation makes it easy to do stuff on threads, without having to, you know, write your own pthread stuff.

As a test, I wrote a subclass of NSOperation to perform a simple analysis on the library: count the number of songs that have “the” in the title. Here’s the -main method:

-(void) main {
   NSDate *beginDate = [NSDate date];
   NSLog (@"*** DYDeepLibraryAwarenessOperation is cogitating and ruminating");
   // test - count titles that have the word "the" in them.
   int theCount = 0;
   MPMediaQuery *allsongs = [MPMediaQuery songsQuery];
   NSLog (@"Thinking about %d songs", (int) [allsongs.items count]);
   for (MPMediaItem *item in allsongs.items) {
      NSRange theRange = [[item valueForProperty:MPMediaItemPropertyTitle]
         rangeOfString: @"the" options: NSCaseInsensitiveSearch];
      if (theRange.location != NSNotFound) {
         theCount++;
      }
   }
   NSLog (@"*** %d songs in the iPod Library contain the word \"the\".", theCount);
   NSLog (@"*** DYDeepLibraryAwarenessOperation has achieved enlightenment (in %f sec).",
         fabs ([beginDate timeIntervalSinceNow]));
}
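Stripped of the Media Library calls, the analysis itself is just a case-insensitive substring count. Here is that part as a stand-alone C sketch, handy for sanity-checking the logic off-device. The function names are hypothetical, and it only handles ASCII, unlike NSString’s search.

```c
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive substring test (ASCII only): returns 1 if `needle`
   occurs anywhere in `haystack`, else 0. */
int contains_ignoring_case(const char *haystack, const char *needle) {
    if (*needle == '\0') {
        return 1;
    }
    for (const char *h = haystack; *h != '\0'; h++) {
        const char *a = h;
        const char *b = needle;
        while (*a != '\0' && *b != '\0' &&
               tolower((unsigned char)*a) == tolower((unsigned char)*b)) {
            a++;
            b++;
        }
        if (*b == '\0') {
            return 1;   /* matched the whole needle starting at h */
        }
    }
    return 0;
}

/* Count how many of the `count` titles contain `word`. */
size_t count_titles_containing(const char *const titles[], size_t count,
                               const char *word) {
    size_t hits = 0;
    for (size_t i = 0; i < count; i++) {
        if (contains_ignoring_case(titles[i], word)) {
            hits++;
        }
    }
    return hits;
}
```

Like the rangeOfString: version, this is a substring match rather than a whole-word match, so a title like “Another Brick” counts too.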

Then, as the app starts up, the operation is run as part of an NSOperationQueue:

awarenessOperation = [[DYDeepLibraryAwarenessOperation alloc] init];
operationQueue = [[NSOperationQueue alloc] init];
[operationQueue addOperation:awarenessOperation];
NSLog (@"DYDeepLibraryAwareness set up NSOperationQueue");

Here’s the output when the code is just left to run by itself (I’ve taken out the date, classname, and line number from the output for space):

15:30:47.979 DYDeepLibraryAwareness set up NSOperationQueue
15:30:47.976 *** DYDeepLibraryAwarenessOperation is cogitating and ruminating
15:30:48.238 Thinking about 740 songs
15:30:54.586 *** 168 songs in the iPod Library contain the word "the".
15:30:54.589 *** DYDeepLibraryAwarenessOperation has achieved enlightenment (in 6.613482 sec).

Perhaps more importantly, and what I can’t show in a blog, is that this other thread does not interfere with the GUI, or with queries to the iPod library from the main thread, which are done to set up and play the first mini-game. So this means that the iPod library server can handle multiple concurrent requests (yay), and that I can do the heavy lifting to set up later games while presenting and playing the simpler ones.

iPhone 3GS vs. the World

First iPhone 3GS nit: refuses to charge when connected to the USB 2.0 port of the Bella USA Final Cut Keyboard:
Screenshot 2009.06.19 13.23.07

Can’t wait to see if it balks at connecting to the “Built for iPod / Works with iPhone” car radio I bought four months ago.

A word from Wordle

Have my first iPod / Music Library code for iPhone SDK 3.0 working… still adding a few more things before writing the chapter.

While I’m busy, I thought this was amusing: Wordle word clouds of recent activity on this blog:


and my Twitter activity:


iPod Day ’08

Apple’s Let’s Rock event starts in a few hours, and is presumably the annual iPod refresh to kick off the holiday shopping season. With everyone talking about a rumored new form-factor for the nano and a capacity bump for the Touch, my question is whether they’ll keep making the Classic for the buyer who wants to put all 2,000 of their CDs on their iPod, or if Apple’s ready to kill off the HDD-based iPod.

All I want from Santa (or Steve) this year is for the iPhone NDA to drop. The book is effectively stalled at this point, as we no longer know whether the end of the NDA is a “when, not if” proposition. With 200 pages down and completely uncompensated, the idea of writing another 200 is pretty unappealing if we might never be able to sell it.

I’d add that selling our old house in Marietta would be a nice Christmas gift, but having thrown in another $8,000 of work last week and having cut the price by $30,000, it had really better sell long before Christmas. Sigh.

Play Album “More Friends”. No, the other one.

The move has been hard, worse than expected, and I’m just back from a four-day Michigan-to-Atlanta crash trip to have some work done on the house there to try to get it to sell. My mom kindly drove down with me to share the drive, and to put miles on her lease car rather than my Cougar. Her car is a Ford Focus, with the Sync by Microsoft system that they’ve never tried using, and that my dad was interested in seeing work. So, along the way, I tried it out with my iPod Classic and 1G iPhone.

The general idea of Sync is to enable voice-activated access to your media player and your mobile phone. You connect a media device like an iPod either by an analog 1/8″ audio cable, or by a USB cable, both of which connect in the center console. Phones connect via Bluetooth, with the car as essentially a Bluetooth device that you need to pair to your phone. We didn’t try the phone features, but its promised features look useful enough, enabling hands-free voice-activated dialing to names in your address book, and answering of incoming calls by just clicking one button on the steering wheel.

We gave the media features more of a workout. When you first plug in a media device, Sync needs to index it, which takes a few minutes if you have 11,000 titles on your iPod. The 8GB iPhone indexed a lot faster.

After this, you select your audio sources with one-word commands (“LINE IN”, “USB”, “CD”, “FM”, etc.), and in USB mode, you speak straightforward commands like “PLAY ALBUM MOONDANCE” or “PLAY TRACK PORTIONS FOR FOXES”.

This is the bread-and-butter of Sync’s media mode, and like a lot of software, Microsoft’s in particular, it’s solid for the easy cases, but weak once you really get into it:

  • Album and track modes are more or less exclusive. Let’s say you start Magical Mystery Tour, but then you want to jump from the instrumentals and George songs on side one (which are good, of course, but still…) and go straight to the singles on side two by saying “PLAY TRACK HELLO GOODBYE”. As it turns out, issuing this command quietly takes you out of album mode. At the end of “Hello Goodbye”, you won’t go into “Strawberry Fields Forever”, but instead to the next “Hello” song in alphabetical order in your player’s “all songs” list. The only way to get the intended behavior would be to issue a series of “NEXT TRACK” commands, which is burdensome with the speak-response cycle (you could also just click the “next” button over on the dash a bunch of times too).
  • Sync doesn’t like it if you use non-English character sets for your metadata tags. Indexing gave me a stern lecture about supplying metadata for my music, even though I’m sure the small amount of music in the iPhone is all tagged. The problem seems to be the use of Japanese metadata in some of my albums. Sync doesn’t know the difference between エアリスのテ-マ (“Aeris’ Theme”) and アンコール 再臨:片翼の天使 (“Advent: One Winged Angel”), and they can’t be played by track, my bad Japanese pronunciation notwithstanding. More annoyingly, you can’t even play an album with a mix of Western and non-Western metadata tagging. I could play “Don’t Be Afraid” from More Friends: Music from Final Fantasy with the voice command “PLAY TRACK DON’T BE AFRAID”, but the voice command “PLAY ALBUM MORE FRIENDS MUSIC FROM FINAL FANTASY” was never accepted, and simplifying it to “PLAY ALBUM MORE FRIENDS” got me Jools Holland’s More Friends: Small World Big Band Vol. 2, without an option to disambiguate.
  • Sync has problems with Roman numerals, and sometimes with numbers. It took a few tries to get Planet P Project’s 1931 to play, having to instead use its full title with a mispronounced Roman numeral: “PLAY ALBUM NINETEEN THIRTY ONE GO OUT DANCING PART EYE”.
  • You forget some of the junk in your metadata tags until Sync makes you pronounce them, like “(Live)” tacked on to the end of every song title from a live album, or having to speak an entire junked-up title like “PLAY ALBUM THE BLACK PARADE MUSIC VIDEO VERSION”.

It’s easy to say that this feels a lot like a typical 1.0 product, specifically a Microsoft 1.0 product. They’ve missed (or chosen not to deal with) a bunch of things that come up in everyday use. But the speech recognition and synthesis in non-ambiguous cases is pretty solid. Moreover, Microsoft is notorious for sticking with things and delivering really solid 3.0s. Unless Apple’s planning on adapting iPhone OS for in-car use real soon — I hope they do, but they choose their battles carefully, and this might be ancillary to them — then Sync is something to watch, as it could be pretty cool a few versions from now. Hopefully, it’ll spread from Ford’s lame nameplates to the better ones, specifically Mazda.

Got a wish: Square Enix developing for iPod

In an O’Reilly blog a while back, I mentioned my hopes for iPhone gaming, given the suitability of the device for certain kinds of games (such as using the touch UI for menu-based RPGs), and Apple’s success in attracting great developers to its existing iPod game program, like Sega and Harmonix.

Add Square Enix to the list, who’ve just released the iPod game Song Summoner, an RPG in which you create NPC allies from the songs on your iPod, and level them up by listening to those songs (shades of the “generate monsters from CDs” gimmick in Tecmo’s Monster Rancher series).

Is it any good? We’ll find out when the download’s done:

Downloading Square Enix's Song Summoner from iTunes

More importantly, maybe we’ll get the inevitable modern-graphics do-over of Final Fantasy VII on the iPhone?