Archives for: quicktime

Brain Dump: Capturing from an iOS Device in Wirecast

So, with the book nearly done (currently undergoing copy-editing and indexing), I’m using some of my time to get my livestreaming plans together. What I’m likely to do is give the “build” section of the show over to working through examples from the book, so those will be archived as video lessons. Then, along with the interstitials of conference updates, fun videos from the anime fan community, and a read-through of the Muv-Luv visual novels, I’ll be doing a bunch of Let’s Plays of mostly iOS games.

I did this with the first two test episodes: Tanto Cuore in Test Episode 1 and Love Live! School Idol Project in Test Episode 2. To do this, I need to be able to capture video from an iOS device and ingest it into Wirecast, so I can stream it.

Over the years, I’ve used different techniques for this, and decided to take some time today to figure out which works best on Wirecast for Mac. So, after the jump, behold the results of this project, plus instructions on how to configure each approach.

Continue Reading >>

AV WWDC, part 1: Hot Dog… The AVMovie

I attended WWDC for the first time since 2011, thanks largely to the fact that working for Rev means I need to go out to the office in San Francisco every 6 weeks anyways, so why not make it that week and put my name in the ticket lottery. I probably won’t make a habit of returning to WWDC, and the short supply of tickets makes that a given anyways, but it was nice to be back just this once.

Being there for work, my first priority was making use of unique-to-attendee resources, like the one-on-one UI design reviews and the developers in the labs. The latter can be hit-or-miss based on your problem… we didn’t get any silver bullet for our graphics code, but scored a crucial answer in Core Audio. We’ve found we have to fall back to the software encoder because the hardware encoder (kAppleHardwareAudioCodecManufacturer) would cause ExtAudioFileWrite() to sometimes fail with OSStatus -66570 (kExtAudioFileError_AsyncWriteBufferOverflow). So I asked about that and was told “oh yeah, we don’t support hardware encoding anymore… the new devices don’t need it and the property is just ignored”. I Slacked this to my boss and his reaction was “would be nice if that were in the documentation!” True enough, but at least that’s one wall we can stop banging our head against.
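
For anyone hitting the same wall, the knob in question is Core Audio’s kExtAudioFileProperty_CodecManufacturer. Here’s a minimal sketch of requesting the hardware encoder and falling back to software (extFile is assumed to be an ExtAudioFileRef you’ve already created for writing, I believe the property has to be set before the client data format, and real fallback logic also has to cope with the hardware encoder only starting to fail once you’re mid-write):


// Sketch: ask for the hardware AAC encoder, fall back to the software
// encoder if setting the property fails.
UInt32 codecManufacturer = kAppleHardwareAudioCodecManufacturer;
OSStatus codecErr = ExtAudioFileSetProperty(extFile,
		kExtAudioFileProperty_CodecManufacturer,
		sizeof (codecManufacturer),
		&codecManufacturer);
if (codecErr != noErr) {
	// hardware codec unavailable (or, per the labs, silently ignored on
	// newer devices)... use the software encoder instead
	codecManufacturer = kAppleSoftwareAudioCodecManufacturer;
	codecErr = ExtAudioFileSetProperty(extFile,
			kExtAudioFileProperty_CodecManufacturer,
			sizeof (codecManufacturer),
			&codecManufacturer);
}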

Speaking of media, now that everyone’s had their fill of “Crusty” and the Protocol-Oriented Programming session, I’m going to post a few blogs about media-related sessions.

Continue Reading >>

Apple TV… Buffering…

Forgive me a little Apple armchair-quarterbacking, but I’m still puzzling over the most under-reported story from this week’s Apple Event: the $30 price cut on Apple TV.

Is this the sound of capitulation?

Continue Reading >>

AV Foundation and the void

Yesterday I streamed some WWDC sessions while driving to meet with a client. At a stop, I posted this pissy little tweet:

It got enough quizzical replies (and a couple of favorites) that I figured I should elaborate as best I can, while staying away from all things NDA.

Part of what I’m reacting to comes from a habit of mine of deliberately seeking the unseen, which I picked up either from Musashi’s Book of Five Rings or from Bastiat’s essay Ce qu’on voit et ce qu’on ne voit pas (“What is Seen and What is Unseen”), because of course with me it’s going to be either samurai or economics, right? Anyways, the idea is to seek truth not in what you encounter, but in what is obvious by its absence. It’s something I try to do when editing: don’t focus only on what’s there in the document; also figure out whether anything should be there, and isn’t.

And when I look at AV Foundation on iOS and especially on OS X, I feel like there are a lot of things missing.

Continue Reading >>

Wrap up from Voices That Matter iPhone, Spring 2011

Ugh, this is twice in a row that I’ve done a talk for the Voices That Matter: iPhone Developer Conference and been able neither to get all my demos working perfectly in time nor to cover all the important material in 75 minutes. Yeah, doing a 300-level talk will do that to you, but still…

This weekend’s talk in Seattle was “Advanced Media Manipulation with AV Foundation”, sort of a sequel to the intro talk I did at VTM:i Fall 2010 (Philly), but since the only people who would have been at both conferences are speakers and organizers, I spent about 25 minutes recapping material from the first talk: AVAssets, the AVPlayer and AVPlayerLayer, AVCaptureSession, etc.

Aside: AVPlayerLayer brings up an interesting point, given that it is a subclass of CALayer rather than UIView, which is what’s provided by the view property of the MPMoviePlayerController. What’s the big difference between a CALayer and a UIView, and why does it matter for video players? The difference is that UIView subclasses UIResponder and therefore responds to touch events (the one in the Media Player framework has its own pop-up controls after all), whereas a CALayer, and AVPlayerLayer, does not respond to touch input itself… it’s purely visual.
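
To make that concrete, here’s a minimal sketch of the usual arrangement: the AVPlayerLayer gets hosted inside an ordinary UIView, and any touch handling goes on the view, since the layer won’t do it for you. The movieURL and videoView variables and the togglePlayback: action are my own stand-ins, not anything AV Foundation hands you.


// a plain UIView hosts the AVPlayerLayer; the view handles touches,
// the layer just draws video
AVPlayer *player = [AVPlayer playerWithURL: movieURL];
AVPlayerLayer *playerLayer = [AVPlayerLayer playerLayerWithPlayer: player];
playerLayer.frame = videoView.bounds;
[videoView.layer addSublayer: playerLayer];

// touch handling belongs to the view (a UIResponder), not the layer
UITapGestureRecognizer *tap = [[[UITapGestureRecognizer alloc]
		initWithTarget:self action:@selector(togglePlayback:)] autorelease];
[videoView addGestureRecognizer: tap];

[player play];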

So anyways, on to the new stuff. What has interested me for a while in AV Foundation is the classes added in 4.1 to do sample-level access, AVAssetWriter and AVAssetReader. An earlier blog entry, From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary, exercises both of these, reading from an iPod Library song with an AVAssetReader and writing to a .caf file with an AVAssetWriter.

Before showing that, I did a new example, VTM_ScreenRecorderTest, which uses AVAssetWriter to make an iOS screen recorder for your application. Basically, it runs an onscreen clock (so that something onscreen is changing), and then uses an NSTimer to periodically do a screenshot and then write that image as a video sample to the single video track of a QuickTime .mov file. The screenshot code is copied directly from Apple’s Technical Q&A 1703, and the conversion from the resulting UIImage to the CMSampleBufferRef needed for writing raw samples is greatly simplified with the AVAssetWriterInputPixelBufferAdaptor.
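
If you haven’t used AVAssetWriter yet, the setup looks roughly like this. This is a sketch, not the actual VTM_ScreenRecorderTest source: the UIImage-to-CVPixelBufferRef conversion is omitted, and names like movieURL, frameNumber, and pixelBuffer are placeholders.


NSError *error = nil;
AVAssetWriter *writer = [[AVAssetWriter alloc] initWithURL: movieURL
		fileType: AVFileTypeQuickTimeMovie
		error: &error];

// one H.264 video track, sized to the screenshots we'll be feeding it
NSDictionary *videoSettings = [NSDictionary dictionaryWithObjectsAndKeys:
		AVVideoCodecH264, AVVideoCodecKey,
		[NSNumber numberWithInt:320], AVVideoWidthKey,
		[NSNumber numberWithInt:480], AVVideoHeightKey,
		nil];
AVAssetWriterInput *videoInput = [AVAssetWriterInput
		assetWriterInputWithMediaType: AVMediaTypeVideo
		outputSettings: videoSettings];
videoInput.expectsMediaDataInRealTime = YES;

// the adaptor lets us append CVPixelBuffers instead of CMSampleBuffers
AVAssetWriterInputPixelBufferAdaptor *adaptor = [AVAssetWriterInputPixelBufferAdaptor
		assetWriterInputPixelBufferAdaptorWithAssetWriterInput: videoInput
		sourcePixelBufferAttributes: nil];

[writer addInput: videoInput];
[writer startWriting];
[writer startSessionAtSourceTime: kCMTimeZero];

// later, in the NSTimer callback, once the screenshot UIImage has been
// converted into a CVPixelBufferRef (pixelBuffer):
if (videoInput.readyForMoreMediaData) {
	CMTime frameTime = CMTimeMake (frameNumber++, 30); // 30 units/sec timescale
	[adaptor appendPixelBuffer: pixelBuffer withPresentationTime: frameTime];
}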

In the Fall in Philly, I showed a cuts-only movie editor that just inserted segments up at the AVMutableComposition level. For this talk, I wanted to do multiple video tracks, with transitions between them and titles. I sketched out a very elaborate demo project, VTM_AVEffects, which was meant to perform the simple effects I used for the Running Start (.m4v download) movie that I often use as an example. In other words, I needed to overlay titles and do some dissolves.

About 10 hours into coding my example, I realized I was not going to finish this demo, and settled for getting the title and the first dissolve. So if you’re going to download the code, please keep in mind that this is badly incomplete code (the massive runs of commented-out misadventures should make that clear), and it is neither production-quality, nor copy-and-paste quality. And it most certainly has memory leaks and other unresolved bugs. Oh, and all the switches and text fields? They do nothing. The only things that work are tapping “perform” and then “play” (or the subsequent “pause”). Scrubbing the slider and setting the rate field mostly work, but have bugs, particularly in the range late in the movie where there are no valid video segments, but the :30 background music is still valid.

Still, I showed it and will link to it at the end of this blog because there is some interesting working code worth discussing. Let’s start with the dissolve between the first two shots. You’ll notice in the code that I go with Apple’s recommendation of working back and forth between two tracks (“A” and “B”, because I learned on analog equipment and always think of it as A/B Roll editing). The hard part — and by hard, I mean frustrating, soul-draining, why-the-frack-isn’t-this-goddamn-thing-working hard — is providing the instructions that describe how the tracks are to be composited together. In AV Foundation, you provide an AVVideoComposition that describes the compositing of every region of interest in your movie (oh, I’m sorry, in your AVComposition… which is in no way related to the AVVideoComposition). The AVVideoComposition has an array of AVVideoCompositionInstructions, each covering a specific timeRange, and each containing its own AVVideoCompositionLayerInstruction to describe the opacity and affine transform (static or animated) of each video track. Describing it like that, I probably should have included a diagram… maybe I’ll whip one up in OmniGraffle and post it later. Anyways, this is fairly difficult to get right, as your various instructions need to account for all time ranges across all tracks, with no gaps or overlaps, and timing up identically with the duration of the AVComposition. Like I said, I got exactly one fade-in working before I had to go pencils-down on the demo code and start preparing slides. Maybe I’ll be able to fix it later… but don’t hold me to that, OK?
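
In lieu of that diagram, here’s a stripped-down sketch of what a single dissolve looks like in these terms. The time values are invented, trackA and trackB stand in for the two composition video tracks, and a real composition still needs instructions covering every other time range, gap-free:


// the one-second overlap where track A fades out over track B
CMTimeRange overlapRange = CMTimeRangeMake (CMTimeMake (9, 1), CMTimeMake (1, 1));

AVMutableVideoCompositionInstruction *dissolve =
		[AVMutableVideoCompositionInstruction videoCompositionInstruction];
dissolve.timeRange = overlapRange;

AVMutableVideoCompositionLayerInstruction *aInstruction =
		[AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack: trackA];
[aInstruction setOpacityRampFromStartOpacity: 1.0 toEndOpacity: 0.0 timeRange: overlapRange];

AVMutableVideoCompositionLayerInstruction *bInstruction =
		[AVMutableVideoCompositionLayerInstruction videoCompositionLayerInstructionWithAssetTrack: trackB];
[bInstruction setOpacity: 1.0 atTime: overlapRange.start];

dissolve.layerInstructions = [NSArray arrayWithObjects: aInstruction, bInstruction, nil];

AVMutableVideoComposition *videoComposition = [AVMutableVideoComposition videoComposition];
videoComposition.frameDuration = CMTimeMake (1, 30);
videoComposition.renderSize = CGSizeMake (640, 480);
// ...plus an A-only instruction before the overlap and a B-only one after it,
// so every moment of the composition is covered exactly once
videoComposition.instructions = [NSArray arrayWithObject: dissolve];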

The other effect I knew I had to show off was titles. AVFoundation has a curious way to do this. Rather than add your titles and other overlays as new video tracks, as you’d do in QuickTime, AVF ties into Core Animation and has you do your image magic there. By using an AVSynchronizedLayer, you can create sublayers whose animations get their timing from the movie, rather than from the system clock. It’s an interesting idea, given how powerful Quartz and Core Animation are. But it’s also deeply weird to be creating content for your movie that is not actually part of the movie, but is rather just loosely coupled to the player object by way of the AVPlayerItem (and this leads to some ugliness when you want to export the movie and include the animations in the export). I also noticed that when I scrubbed past the fade-out of the title and then set the movie playback rate to a negative number to run it backward, the title did not fade back in as expected… which makes me wonder if there are assumptions in UIKit or Core Animation that time always runs forward, which is of course not true when AV Foundation controls animation time via the AVSynchronizedLayer.
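
Here’s a minimal sketch of the synchronized-layer idea: a CATextLayer title whose fade-out takes its timing from the movie rather than the system clock. playerItem and playerLayer are assumed to already exist, the times are arbitrary, and the one non-obvious detail is that a beginTime of 0.0 means “now” to Core Animation, so a start at movie-time zero has to use AVCoreAnimationBeginTimeAtZero instead.


AVSynchronizedLayer *syncLayer =
		[AVSynchronizedLayer synchronizedLayerWithPlayerItem: playerItem];
syncLayer.frame = playerLayer.bounds;

CATextLayer *titleLayer = [CATextLayer layer];
titleLayer.string = @"Running Start";
titleLayer.frame = CGRectMake (0.0, 20.0, syncLayer.bounds.size.width, 40.0);
[syncLayer addSublayer: titleLayer];

// fade the title out between 4.0 and 5.0 seconds of *movie* time
CABasicAnimation *fadeOut = [CABasicAnimation animationWithKeyPath: @"opacity"];
fadeOut.fromValue = [NSNumber numberWithFloat: 1.0];
fadeOut.toValue = [NSNumber numberWithFloat: 0.0];
fadeOut.beginTime = 4.0; // to start at time zero, use AVCoreAnimationBeginTimeAtZero
fadeOut.duration = 1.0;
fadeOut.removedOnCompletion = NO; // keep the faded-out state around
fadeOut.fillMode = kCAFillModeForwards;
[titleLayer addAnimation: fadeOut forKey: @"titleFadeOut"];

// the synchronized layer sits on top of the player layer
[playerLayer.superlayer addSublayer: syncLayer];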

My code is badly incomplete and buggy, and anyone interested in a solid demo of AV Foundation editing would do well to check out the AVEditDemo from Apple’s WWDC 2010 sample code. Still, I said I would post what I’ve got, so there you go. No complaints from you people, or the next sample code you get from me will be another goddamned webapp server written in Java.

Oh yeah, at one point, I dreamed of having enough time to write a demo that would process A/V capture data in real-time with the capture data outputs (AVCaptureVideoDataOutput and AVCaptureAudioDataOutput), maybe showing a live audio waveform or doing an FFT. But that demo didn’t even get written. Maybe next conference.

For a speaker on “advanced” AV Foundation, I find I still have a lot of unanswered questions about this framework. I’m not sure how well it supports saving an AVComposition that you’re editing — even if the AVF classes implement NSCopying / NSMutableCopying and could therefore be persisted with keyed archiving, that doesn’t address how you’d persist your associated Core Animation animations. I also would have to think hard about how to make edits undoable and redoable… I miss QuickTime’s MovieEditState already. And to roll an edit… do you dig into a track’s segments and dick with their timeRanges, or do you have to remove and reinsert the segment?

And what else can I do with AVSynchronizedLayer? I don’t see particularly compelling transitions in AVF — just dissolves and trivial push wipes (i.e., animation of the affine transform) — but if I could render whatever I like in a CALayer and pick up the timing from the synchronized layer, is that how I roll my own Quartz-powered goodness? Speaker Cathy Shive and I were wondering about this idea over lunch, trying to figure out if we would subclass CAAnimation or CALayer in hopes of getting a callback along the lines of “draw your layer for time t”, which would be awesome if only either of us were enough of a Core Animation expert to pull it off.

So, I feel like there’s a lot more for me to learn on this, which is scary because some people think I’m an expert on the topic… for my money, the experts are the people in the AV Foundation dev forums (audio, video), since they’re the ones really using it in production and providing feedback to Apple. Fortunately, these forums get a lot of attention from Apple’s engineers, particularly bford, so that sets a lot of people straight about their misconceptions. I think it’s going to be a long learning curve for all of us.

If you’re keen to start, here are the slides and demo code:

Secret APIs

Discussing Apple’s Java deprecation, Java creator James Gosling blogged about the background of Java on the Mac, saying “the biggest obstacle was their use of secret APIs. Yes, OS X has piles of secret APIs. Just like the ones that Microsoft had that contributed to their antitrust problems.”

In a recent Q&A at Google, available on YouTube, he elaborates further, around 43 minutes in (embedded YouTube clip will take you right there, otherwise read the blockquote):

Video: http://www.youtube.com/watch?v=9ei-rbULWoA (embed starts at 42:55)

At Sun, we had worked with them to try to take it over. But there were all kinds of issues, and it was mostly things like, you know, to integrate properly into the Mac OS, there were a bunch of secret APIs. And in their integration, there were all these secret APIs, and they wouldn’t tell us what they were, we just knew they were there. And then, you know, it’s sort of like half their brain wanted to give us the code, half their brain is like “no no no no no, we can’t”. So, nyah, that was all kind of spastic.

The fact that Dr. Gosling brings up “secret APIs” repeatedly when talking about the subject makes me think that he really wants to make this point that Apple’s use of secret APIs and its intransigence has been a major problem for Java on the Mac.

But… is it true? How big a deal are secret APIs in OSX and iOS anyways?

Nobody denies that there are undocumented and otherwise secret APIs throughout both OSX and iOS. They are easily found through techniques such as reverse-engineering and method swizzling. On OSX, they can be called, provided you can figure out their proper usage without documentation. Technically, this is also possible on iOS, although use of non-public APIs will get your app rejected by the App Store, so it’s largely pointless.

The benign explanation for secret APIs is that they’re used internally but haven’t been fully vetted for use by third-parties. We’ve all written code we’re not proud of and wouldn’t want others calling, or at least written utility functions and methods that were only thought through for certain uses and aren’t known to be appropriate for general use. An interesting example is iOS’ UIGetScreenImage function. As a devforums thread indicates, Apple started allowing use of this private API in 2009 because there wasn’t a good public alternative, with the proviso that its use would be disallowed once a suitable public API was released. This occurred with the arrival of AV Foundation in iOS 4.0, and direct calls to UIGetScreenImage are again grounds for App Store rejection.

Aside from technical grounds, another reason for secret APIs is legal entanglements. There was an example of this in one of my earliest blogs: Apple licensed AAC encoding for OS X and for its own apps on Windows (iTunes, QuickTime Player), but not for third-party apps on Windows. According to Technical Q&A QA1347, a developer who wanted to provide this functionality on Windows would need to license the AMR encoding separately from VoiceAge, then provide proof of that license to Apple in order to get an SDK that would allow their code to make the secret call into QuickTime’s encoder.

But what can we say about Dr. Gosling’s complaints about secret APIs and Java? Certainly it plays well to the passions and politics of the Java community, but I’m not yet convinced. We know that most of Java actually ports to the Mac pretty easily: Landon Fuller’s “Soy Latte” project ported JDK 6 to the Mac in just a few person-weekends, and was later incorporated into OpenJDK’s BSD Ports subproject. But that left out some hard parts with intense native entanglements: sound, and the UI (Soy Latte, like most vanilla Java ports, relies on X11). Gosling acknowledges this in his blog, saying of these secret APIs that “the big area (that I’m aware of) where these are used is in graphics rendering.”

However, does this seriously mean that porting the Java graphics layer — Java2D, AWT, and Swing — is impractical or impossible without access to these secret APIs? It can’t be. After all, SWT exists for Mac as well, as a third-party creation, and it does the same things as these missing pieces of OpenJDK. In fact, SWT is more tightly coupled to native code, as its whole approach is to bind Java objects to native peers (originally in Carbon, later in Cocoa), while Swing is all about avoiding native entanglements and instead painting look-alike widgets. Furthermore, I think Java’s rendering pipeline was switched over to an OpenGL implementation a while back, and that’s a public API that exists on OSX. So this raises the question: what does Java need that isn’t provided by a public API? It doesn’t seem like graphics can be the problem.

The conspiracy theorists could argue that Apple has its own APIs that are more performant than the public APIs. Maybe, but what would be the point? Microsoft was roundly criticized for this in the 90’s, but Microsoft had more cases where their own products competed directly with third parties, and therefore could have incentive for their OS team to give a secret hand to the applications team. With Apple, software is their second-smallest revenue segment, and there are fewer cases where the company competes directly with a third-party rival (though there are clearly cases of this, such as Final Cut versus Premiere). Often, Apple’s software serves a strategic role – iLife may be more useful for selling Macs than for selling itself on DVD to existing Mac owners. So sure, Apple could be using secret APIs to give itself a leg up on competitors, but it’s hard to see how that would really be in their self-interest.

Having said all this, I’m still thwarted by a private API I needed this Summer: the “suck into a point” animation isn’t exposed by a Cocoa API on OSX, and asking for help on cocoa-unbound didn’t turn up an answer. Apparently, it’s possible on iOS, but via an undocumented method. Why this isn’t public on OSX or iOS, I can’t imagine, particularly given that Apple’s apps have made it a fairly standard behavior, meaning users will expect it when you use the round close button on a free-floating view. Oversight? Not ready for public consumption? Apple just being dicks? Who knows!

Of course, that brings up the last point about secret APIs. At the end of the day, they’re almost always conveniences. If something is possible at all, you could probably just do it yourself. I don’t know exactly what transforms are involved in the suck-to-close animation, but it’s surely possible to create a reasonably close approximation with Core Animation. Similarly, instead of calling QuickTime’s secret AAC encoder on Windows, you could license some other library or framework, or write your own. It might not be easy or practical, but if Apple can move the bits in some specific way, it must at least be possible for a third-party to do the same.

A Big Bet on HTTP Live Streaming

So, Apple announced yesterday that they’ll stream today’s special event live, and everyone (myself included) immediately assumed the load would crash the stream, if not the whole internet. But then I got thinking: they wouldn’t even try it if they weren’t pretty damn sure it would work. So what makes them think this will work?

HTTP Live Streaming, that’s why. I banged out a series of tweets (1, 2, 3, 4, 5, 6, 7, 8, 9) spelling out why the nature of HTTP Live Streaming (which I worked with briefly on a fix-up job last year) makes it highly plausible for such a use.

To summarize the spec: a client retrieves a playlist (an .m3u8, which is basically a UTF-8’ed version of the old WinAmp playlist format) that lists segments of the stream as flat files (often .m4a’s for audio, and .ts for video, which is an MPEG-2 transport stream, though Apple’s payload is presumably H.264/AAC). The client downloads these flat files and sends them to its local media player, and refreshes the playlist periodically to see if there are new files to fetch. The sizing and timing is configurable, but I think the defaults are like a 60-second refresh cycle on the playlist, and segments of about 10 seconds each.
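
If you’ve never looked inside one, a live playlist is about this exciting (a made-up sketch; real hostnames, sequence numbers, and tags will differ):


#EXTM3U
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:2680

#EXTINF:10,
http://edge.example.com/event/segment-2680.ts
#EXTINF:10,
http://edge.example.com/event/segment-2681.ts
#EXTINF:10,
http://edge.example.com/event/segment-2682.ts


For a live stream there’s no #EXT-X-ENDLIST at the bottom; the client just keeps re-fetching the playlist and grabbing whatever new segments have shown up since last time.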

This can scale for a live broadcast by using edge servers, which Apple has long depended on Akamai (and others?) for. Apple vends you a playlist URL at a local edge server, and its contents are all on the edge server, so the millions of viewers don’t pound Apple with requests — the load is pushed out to the edge of the internet, and largely stays off the backbone. Also, all the local clients will be asking for the same handful of segment files at the same time, so these could be in in-memory caches on the edge servers (since they’re only 10 seconds of video each). All these are good things.

I do wonder if local 3G cells will be a point of failure, if the bandwidth on a cell gets saturated by iPhone clients receiving the files. But for wired internet and wifi LANs, I suspect this is highly viable.

One interesting point brought up by TUAW is the dearth of clients that can handle HTTP Live Streaming. So far, it’s iOS devices, and Macs with QuickTime X (i.e., running Snow Leopard). The Windows version of QuickTime doesn’t support HTTP Live Streaming (being based on the “old” 32-bit QuickTime on Mac, it may effectively be in maintenance mode). Open standard or not, there are no handy HTTP Live Streaming clients for other OSes, though MacRumors’ VNC-based workaround (which requires you to manually download the .m3u8 playlist and do the refresh yourself) suggests it would be pretty easy to get it running elsewhere, since you already have the ability to play a playlist of segments and just need to automate the playlist refresh.

Dan Leehr tweeted back that Apple has talked a good game on HTTP Live Streaming, but hasn’t really shown much. Maybe this event is meant to change that. Moreover, you can’t complain about the adoption — last December, the App Store terms added a new fiat that any streaming video app must use HTTP Live Streaming (although a February post seems to ratchet this back to apps that stream for more than 10 minutes over the cellular network), so any app you see with a video streaming feature almost certainly uses HLS. At WWDC, Apple boasted about the MLB app using HLS, and it’s a safe bet that most/all other iOS video streaming apps (Netflix, Crunchyroll, etc.) use it too.

And one more thing to think about… MLB and Netflix aren’t going to stream without DRM, right? That’s the other piece that nobody ever talks about with HTTP Live Streaming: the protocol allows for encryption of the media files. See section 5 of the spec. As much as Apple and its fanboys talk up HTML5 as a rival to and replacement for Flash, this is the thing that should really worry Adobe: commoditizing DRM’ed video streaming.

Connecting the Dots

Philip Hodgetts e-mailed me yesterday, having found my recent CocoaHeads Ann Arbor talk on AV Foundation and, from there, my blog. The first thing this brings up is that I’ve been slack about linking my various online identities and outlets… it should be easy for anyone who happens across my stuff to get to the rest of it. As a first step, behold the “More of This Stuff” box at the right, which links to my slideshare.net presentations and my Twitter feed. The former is updated less frequently than the latter, but also contains fewer obscenities and references to anime.

Philip co-hosts a podcast about digital media production, and their latest episode is chock-full of important stuff about QuickTime and QTKit that more people should know (frame rate doesn’t have to be constant!), along with wondering aloud about where the hell Final Cut stands given the QuickTime/QTKit schism on the Mac and the degree to which it is built atop the 32-bit legacy QuickTime API. FWIW, between reported layoffs on the Final Cut team and their key programmers working on iMovie for iPhone, I do not have a particularly good feeling about the future of FCP/FCE.

Philip, being a Mac guy and not an iOS guy, blogged that he was surprised my presentation wasn’t an NDA violation. Actually, AV Foundation has been around since iPhone OS 2.2, but only became a document-based audio/video editing framework in iOS 4. The only thing that’s NDA is what’s in iOS 4.1 (good stuff, BTW… hope we see it Wednesday, even though I might have to race out some code and a blog entry to revise this beastly entry).

He’s right in the podcast, though, that iPhone OS / iOS has sometimes kept some of its video functionality away from third-party developers. For example, Safari could embed a video, but through iPhone OS 3.1, the only video playback option was the MPMoviePlayerController, which takes over the entire screen when you play the movie. 3.2 provided the ability to get a separate view… but recall that 3.2 was iPad-only, and the iPad form factor clearly demands the ability to embed video in a view. In iOS 4, it may make more sense to ditch MPMoviePlayerController and leave MediaPlayer.framework for iPod library access, and instead do playback by getting an AVURLAsset and feeding it to an AVPlayer.
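
That path is short enough to sketch here; videoURL and someView are placeholders for whatever you’re playing and wherever you want it to appear:


// build up playback from a URL, then host it in any view at any size,
// instead of surrendering the whole screen to MPMoviePlayerController
AVURLAsset *asset = [AVURLAsset URLAssetWithURL: videoURL options: nil];
AVPlayerItem *item = [AVPlayerItem playerItemWithAsset: asset];
AVPlayer *player = [AVPlayer playerWithPlayerItem: item];

AVPlayerLayer *playerLayer = [AVPlayerLayer playerLayerWithPlayer: player];
playerLayer.frame = someView.bounds;
[someView.layer addSublayer: playerLayer];

[player play];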

One slide Philip calls attention to in his blog is where I compare the class and method counts of AV Foundation, android.media, QTKit, and QuickTime for Java. A few notes on how I spoke to this slide when I gave my presentation:

  • First, notice that AV Foundation is already larger than QTKit. But also notice that while it has twice as many classes, it only has about 30% more methods. This is because AV Foundation had the option of starting fresh, rather than wrapping the old QuickTime API, and thus could opt for a more hierarchical class structure. AVAssets represent anything playable, while AVCompositions are movies that are being created and edited in-process. Many of the subclasses also split out separate classes for their mutable versions. By comparison, QTKit’s QTMovie class has over 100 methods; it just has to be all things to all people.

  • Not only is android.media smaller than AV Foundation, it also represents the alpha and omega of media on that platform, so while it’s mostly provided as a media player and capture API, it also includes everything else media-related on the platform, like ringtone synthesis and face recognition. While iOS doesn’t do these, keep in mind that on iOS, there are totally different frameworks for media library access (MediaPlayer.framework), low-level audio (Core Audio), photo library access (AssetsLibrary.framework), in-memory audio clips (System Sounds), etc. By this analysis, media support on iOS is many times more comprehensive than what’s currently available in Android.

  • Don’t read too much into my inclusion of QuickTime for Java. It was deprecated at WWDC 2008, after all. I put it in this chart because its use of classes and methods offered an apples-to-apples comparison with the other frameworks. Really, it’s there as a proxy for the old C-based QuickTime API. If you counted the number of functions in QuickTime, I’m sure you’d easily top 10,000. After all, QTJ represented Apple’s last attempt to wrap all of QuickTime with an OO layer. In QTKit, there’s no such ambition to be comprehensive. Instead, QTKit feels like a calculated attempt to include the stuff that the most developers will need. This allows Apple to quietly abandon unneeded legacies like Wired Sprites and QuickTime VR. But quite a few babies are being thrown out with the bathwater — neither QTKit nor AV Foundation currently has equivalents for the “get next interesting time” functions (which could find edit points or individual samples), or the ability to read/write individual samples with GetMediaSample() / AddMediaSample().

One other point of interest is one of the last slides, which quotes a macro seen throughout AVFoundation and Core Media in iOS 4:


__OSX_AVAILABLE_STARTING(__MAC_10_7,__IPHONE_4_0);

Does this mean that AV Foundation will appear on Mac OS X 10.7 (or hell, does it mean that 10.7 work is underway)? IMHO, not enough to speculate, other than to say that someone was careful to leave the door open.

Update: Speaking of speaking on AV Foundation, I should mention again that I’m going to be doing a much more intense and detailed Introduction to AV Foundation at the Voices That Matter: iPhone Developer Conference in Philadelphia, October 16-17. $100 off with discount code PHRSPKR.

From iPhone Media Library to PCM Samples in Dozens of Confounding, Potentially Lossy Steps

iPhone SDK 3.0 provided limited access to the iPod Music Library on the device, allowing third party apps to search for songs (and podcasts and audiobooks, but not video), inspect the metadata, and play items, either independently or in concert with the built-in media player application. But it didn’t provide any form of write-access — you couldn’t add items or playlists, or alter metadata, from a third-party app. And it didn’t allow for third-party apps to do anything with the songs except play them… you couldn’t access the files, convert them to another format, run any kind of analysis on the samples, and so on.

So a lot of us were surprised by the WWDC keynote when iMovie for iPhone 4 was shown importing a song from the iPod library for use in a user-made video. We were even more surprised by the subsequent claim that everything in iMovie for iPhone 4 was possible with public APIs. Frankly, I was ready to call bullshit on it because of the iPod Library issue, but was intrigued by the possibility that maybe you could get at the iPod songs in iOS 4. A tweet from @ibeatmaker confirmed that it was possible, and after some clarification, I found what I needed.

About this time, a thread started on coreaudio-api about whether Core Audio could access iPod songs, so that’s what I set out to prove one way or another. So, my goal was to determine whether or not you could get raw PCM samples from songs in the device’s music library.

The quick answer is: yes. The interesting answer is: it’s a bitch, involving three different frameworks, coding idioms that are all over the map, a lot of file-copying, and possibly some expensive conversions.

It’s Just One Property; It Can’t Be That Hard

The big secret of how to get to the Music Library isn’t much of a secret. As you might expect, it’s in the MediaPlayer.framework that you use to interact with the library. Each song/podcast/audiobook is an MPMediaItem, and has a number of interesting properties, most of which are user-managed metadata. In iOS 4, there’s a sparkling new addition to the list of “General Media Item Property Keys”: MPMediaItemPropertyAssetURL. Here are the docs:

A URL pointing to the media item, from which an AVAsset object (or other URL-based AV Foundation object) can be created, with any options as desired. Value is an NSURL object.

The URL has the custom scheme of ipod-library. For example, a URL might look like this:

ipod-library://item/item.m4a?id=12345

OK, so we’re off and running. All we need to do is to pick an MPMediaItem, get this property as an NSURL, and we win.

Or not. There’s an important caveat:

Usage of the URL outside of the AV Foundation framework is not supported.

OK, so that’s probably going to suck. But let’s get started anyways. I wrote a throwaway app to experiment with all this stuff, adding to it piece by piece as stuff started working. I’m posting it here for anyone who wants to reuse my code… all my classes are marked as public domain, so copy-and-paste as you see fit.

MediaLibraryExportThrowaway1.zip

Note that this code must be run on an iOS 4 device and cannot be run in the Simulator, which doesn’t support the Media Library APIs.

The app just starts with a “Choose Song” button. When you tap it, it brings up an MPMediaPickerController as a modal view to make you choose a song. When you do so, the -mediaPicker:didPickMediaItems: delegate method gets called. At this point, you could get the first MPMediaItem and get its MPMediaItemPropertyAssetURL media item property. I’d hoped that I could just call this directly from Core Audio, so I wrote a function to test if a URL can be opened by CA:



BOOL coreAudioCanOpenURL (NSURL* url) {
	OSStatus openErr = noErr;
	AudioFileID audioFile = NULL;
	openErr = AudioFileOpenURL((CFURLRef) url,
		 kAudioFileReadPermission ,
		 0,
		 &audioFile);
	if (audioFile) {
		AudioFileClose (audioFile);
	}
	return openErr ? NO : YES;
}

Getting a NO back from this function more or less confirms the caveat from the docs: the URL is only for use with the AV Foundation framework.

AV for Vendetta

OK, so plan B: we open it with AV Foundation and see what that gives us.

AV Foundation — setting aside the simple player and recorder classes from 3.0 — is a strange and ferocious beast of a framework. It borrows from QuickTime and QTKit (the capture classes have an almost one-to-one correspondence with their QTKit equivalents), but builds on some new metaphors and concepts that will take the community a while to digest. For editing, it has a concept of a composition, which is made up of tracks, which you can create from assets. This is somewhat analogous to QuickTime’s model that “movies have tracks, which have media”, except that AVFoundation’s compositions are themselves assets. Actually, reading too much QuickTime into AV Foundation is a good way to get in trouble and get disappointed; QuickTime’s most useful functions, like AddMediaSample() and GetMediaNextInterestingTime() are antithetical to AV Foundation’s restrictive design (more on that in a later blog) and therefore don’t exist.

Back to the task at hand. The only thing we can do with the media library URL is to open it in AVFoundation and hope we can do something interesting with it. The way to do this is with an AVURLAsset.


NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
AVURLAsset *songAsset = [AVURLAsset URLAssetWithURL:assetURL options:nil];

If this were QuickTime, we’d have an object that we could inspect the samples of. But in AV Foundation, the only sample-level access afforded is a capture-time opportunity to get called back with video frames. There’s apparently no way to get to video frames in a file-based asset (except for a thumbnail-generating method that operates on one-second granularity), and no means of directly accessing audio samples at all.

What we can do is to export this URL to a file in our app’s documents directory, hopefully in a format that Core Audio can open. AV Foundation’s AVAssetExportSession has a class method exportPresetsCompatibleWithAsset: that reveals what kinds of formats we can export to. Since we’re going to burn the time and CPU of doing an export, it would be nice to be able to convert the compressed song into PCM in some kind of useful container like a .caf, or at least an .aif. But here’s what we actually get as options:

compatible presets for songAsset: (
 AVAssetExportPresetLowQuality,
 AVAssetExportPresetHighestQuality,
 AVAssetExportPreset640x480,
 AVAssetExportPresetMediumQuality,
 AVAssetExportPresetAppleM4A
 )

So, no… there’s no “output to CAF”. In fact, we can’t even use AVAssetExportPresetPassthrough to preserve the encoding from the music library: we either have to convert to an AAC (in an .m4a container), or to a QuickTime movie (represented by all the presets ending in “Quality”, as well as the “640×480”).

This Deal is Getting Worse All the Time!

So, we have to export to AAC. That’s not entirely bad, since Core Audio should be able to read AAC in an .m4a container just fine. But it sucks in that it will be a lossy conversion from the source, which could be MP3, Apple Lossless, or some other encoding.

In my GUI, an “export” button appears when you pick a song, and the export is kicked off in the event-handler handleExportTapped. Here’s the UI in mid-export:

MediaLibraryExportThrowaway1 UI in mid-export

To do the export, we create an AVAssetExportSession and provide it with an outputFileType and outputURL.


AVAssetExportSession *exporter = [[AVAssetExportSession alloc]
		initWithAsset: songAsset
		presetName: AVAssetExportPresetAppleM4A];
NSLog (@"created exporter. supportedFileTypes: %@", exporter.supportedFileTypes);
exporter.outputFileType = @"com.apple.m4a-audio";
NSString *exportFile = [myDocumentsDirectory()
		stringByAppendingPathComponent: @"exported.m4a"];
myDeleteFile(exportFile);
[exportURL release];
exportURL = [[NSURL fileURLWithPath:exportFile] retain];
exporter.outputURL = exportURL;	

A few notes here. The docs say that if you set the outputURL without setting outputFileType, the exporter will make a guess based on the file extension. In my experience, the exporter prefers to just throw an exception and die, so set the damn type already. You can get a list of possible values from the exporter’s supportedFileTypes property. The only supported value for the AAC export is com.apple.m4a-audio. Also note the call to a myDeleteFile() function; the export will fail if the target file already exists.

Aside: I did experiment with exporting as a QuickTime movie rather than an .m4a; the code is in the download, commented out. Practical upshot is that it sucks: if your song isn’t AAC, then it gets converted to mono AAC at 44.1 KHz. It’s also worth noting that AV Foundation doesn’t give you any means of setting export parameters (bit depths, sample rates, etc.) other than using the presets. If you’re used to the power of frameworks like Core Audio or the old QuickTime, this is a bitter, bitter pill to swallow.

Block Head

The code gets really interesting when you kick off the export. You would probably expect the export, a long-lasting operation, to be nice and asynchronous. And it is. You might also expect to register a delegate to get asynchronous callbacks as the export progresses. Not so fast, Bucky. As a new framework, AV Foundation adopts Apple’s latest technologies, and that includes blocks. When you export, you provide a completion handler, a block whose no-arg function is called when necessary by the exporter.

Here’s what mine looks like.


// do the export
[exporter exportAsynchronouslyWithCompletionHandler:^{
	int exportStatus = exporter.status;
	switch (exportStatus) {
		case AVAssetExportSessionStatusFailed: {
			// log error to text view
			NSError *exportError = exporter.error;
			NSLog (@"AVAssetExportSessionStatusFailed: %@",
				exportError);
			errorView.text = exportError ?
				[exportError description] : @"Unknown failure";
			errorView.hidden = NO;
			break;
		}
		case AVAssetExportSessionStatusCompleted: {
			NSLog (@"AVAssetExportSessionStatusCompleted");
			fileNameLabel.text =
				[exporter.outputURL lastPathComponent];
			// set up AVPlayer
			[self setUpAVPlayerForURL: exporter.outputURL];
			[self enablePCMConversionIfCoreAudioCanOpenURL:
				exporter.outputURL];
			break;
		}
		case AVAssetExportSessionStatusUnknown: {
			NSLog (@"AVAssetExportSessionStatusUnknown"); break;}
		case AVAssetExportSessionStatusExporting: {
			NSLog (@"AVAssetExportSessionStatusExporting"); break;}
		case AVAssetExportSessionStatusCancelled: {
			NSLog (@"AVAssetExportSessionStatusCancelled"); break;}
		case AVAssetExportSessionStatusWaiting: {
			NSLog (@"AVAssetExportSessionStatusWaiting"); break;}
		default: { NSLog (@"didn't get export status"); break;}
	}
}];

This kicks off the export, passing in a block with code to handle all the possible callbacks. The completion handler doesn’t have to take any arguments (nor do we have to set up a “user info” object for the exporter to pass to it), since the block can capture anything in the enclosing scope. That means the exporter and its state don’t need to be passed in as parameters: the exporter is a local variable the block can access directly, and its state can be inspected with ordinary property and method calls.

The two messages I handle in my block are AVAssetExportSessionStatusFailed, which dumps the error to a previously-invisible text view, and AVAssetExportSessionStatusCompleted, which sets up an AVPlayer to play the exported audio, which we’ll get to later.

After starting the export, my code runs an NSTimer to fill a UIProgressView. Since the exporter has a progress property that returns a float, it’s pretty straightforward… check the code if you haven’t already done this a bunch of times. Files that were already AAC export almost immediately, while MP3s and Apple Lossless (ALAC) took a minute or more to export. Files in the old .m4p format, from back when the iTunes Store put DRM on all the songs, fail with an error.
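
For completeness, the progress-polling method is about as small as it sounds. This is a sketch with my own names: updateExportProgress: is scheduled on a repeating NSTimer right after kicking off the export, progressView is the UIProgressView, and exporter is assumed to be hanging around in an instance variable.


-(void) updateExportProgress: (NSTimer*) timer {
	// progress is a float from 0.0 to 1.0
	progressView.progress = exporter.progress;
	AVAssetExportSessionStatus status = exporter.status;
	if (status == AVAssetExportSessionStatusCompleted ||
		status == AVAssetExportSessionStatusFailed ||
		status == AVAssetExportSessionStatusCancelled) {
		[timer invalidate];
	}
}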

The Invasion of Time

Kind of as a lark, I added a little GUI to let you play the exported file. AVPlayer was the obvious choice for this, since it should be able to play whatever kind of file you export (.m4a, .mov, whatever).

This brings up the whole issue of how to deal with the representation of time in AV Foundation, which turns out to be great for everyone who ever used the old C QuickTime API (or possibly QuickTime for Java), and all kinds of hell for everyone else.

AV Foundation uses Core Media’s CMTime struct for representing time. In turn, CMTime uses QuickTime’s brilliant but tricky concept of time scales. The idea, in a nutshell, is that your units of measurement for any particular piece of media are variable: pick one that suits the media’s own timing needs. For example, CD audio is 44.1 KHz, so it makes sense to measure time in 1/44100 second intervals. In a CMTime, you’d set the timescale to 44100, and then a given value would represent some number of these units: a single sample would have a value of 1 and would represent 1/44100 of a second, exactly as desired.

I find it’s easier to think of Core Media (and QuickTime) timescales as representing “nths of a second”. One of the clever things you can do is to choose a timescale that suits a lot of different kinds of media. In QuickTime, the default timescale is 600, as this is a common multiple of many important frame-rates: 24 fps for film, 25 fps for PAL (European) TV, 30 fps for NTSC (North America and Japan) TV, etc. Any number of frames in these systems can be evenly and exactly represented with a combination of value and timescale.
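
In code, a CMTime is just that value/timescale pair. A few throwaway examples:


// one sample of CD audio: 1/44100 of a second
CMTime oneSample = CMTimeMake (1, 44100);

// one frame of 24 fps film in QuickTime's classic 600 timescale:
// 25/600 of a second, which is exactly 1/24 of a second
CMTime oneFilmFrame = CMTimeMake (25, 600);

// ten whole seconds, for when frames don't matter
CMTime tenSeconds = CMTimeMake (10, 1);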

Where it gets tricky is when you need to work with values measured in different timescales. This comes up in AV Foundation, as your player may use a different timescale than the items it’s playing. It’s pretty easy to write out the current time label:


CMTime currentTime = player.currentTime;
UInt64 currentTimeSec = currentTime.value / currentTime.timescale;
UInt32 minutes = currentTimeSec / 60;
UInt32 seconds = currentTimeSec % 60;
playbackTimeLabel.text = [NSString stringWithFormat:
		@"%02d:%02d", minutes, seconds];

But it’s hard to update the slider position, since the AVPlayer and the AVPlayerItem it’s playing can (and do) use different time scales. Enjoy the math.


if (player && !userIsScrubbing) {
	CMTime endTime = CMTimeConvertScale (player.currentItem.asset.duration,
		currentTime.timescale,
		kCMTimeRoundingMethod_RoundHalfAwayFromZero);
	if (endTime.value != 0) {
		double slideTime = (double) currentTime.value /
				(double) endTime.value;
		playbackSlider.value = slideTime;
	}
}

Basically, the key here is that I need to get the duration of the item being played, but to express that in the time scale of the player, so I can do math on them. That gets done with the CMTimeConvertScale() call. Looks simple here, but if you don’t know that you might need to do a timescale-conversion, your math will be screwy for all sorts of reasons that do not make sense.

Oh, you can drag the slider too, which means doing the same math in reverse.


-(IBAction) handleSliderValueChanged {
	CMTime seekTime = player.currentItem.asset.duration;
	seekTime.value = seekTime.value * playbackSlider.value;
	seekTime = CMTimeConvertScale (seekTime, player.currentTime.timescale,
			kCMTimeRoundingMethod_RoundHalfAwayFromZero);
	[player seekToTime:seekTime];
}

One other fun thing about all this that I just remembered from looking through my code. The time label and slider updates are called from an NSTimer. I set up the AVPlayer in the completion handler block that’s called by the exporter. This call seems not to be on the main thread, as my update timer didn’t work until I forced its creation over to the main thread with performSelectorOnMainThread:withObject:waitUntilDone:. Good times.

Final Steps

Granted, all this AVPlayer stuff is a distraction. The original goal was to get from iPod Music Library to decompressed PCM samples. We used an AVAssetExportSession to produce an .m4a file in our app’s Documents directory, something that Core Audio should be able to open. The remaining conversion is a straightforward use of CA’s Extended Audio File Services: we open an ExtAudioFileRef on the input .m4a, set a “client format” property representing the PCM format we want it to convert to, read data into a buffer, and write that data back out to a plain AudioFileID. It’s C, so the code is long, but hopefully not too hard on the eyes:


-(IBAction) handleConvertToPCMTapped {
	NSLog (@"handleConvertToPCMTapped");
	
	// open an ExtAudioFile
	NSLog (@"opening %@", exportURL);
	ExtAudioFileRef inputFile;
	CheckResult (ExtAudioFileOpenURL((CFURLRef)exportURL, &inputFile),
				 "ExtAudioFileOpenURL failed");
	
	// prepare to convert to a plain ol' PCM format
	AudioStreamBasicDescription myPCMFormat;
	myPCMFormat.mSampleRate = 44100; // todo: or use source rate?
	myPCMFormat.mFormatID = kAudioFormatLinearPCM ;
	myPCMFormat.mFormatFlags =  kAudioFormatFlagsCanonical;	
	myPCMFormat.mChannelsPerFrame = 2;
	myPCMFormat.mFramesPerPacket = 1;
	myPCMFormat.mBitsPerChannel = 16;
	myPCMFormat.mBytesPerPacket = 4;
	myPCMFormat.mBytesPerFrame = 4;
	
	CheckResult (ExtAudioFileSetProperty(inputFile,
			kExtAudioFileProperty_ClientDataFormat,
			sizeof (myPCMFormat), &myPCMFormat),
		  "ExtAudioFileSetProperty failed");

	// allocate a big buffer. size can be arbitrary for ExtAudioFile.
	// you have 64 KB to spare, right?
	UInt32 outputBufferSize = 0x10000;
	void* ioBuf = malloc (outputBufferSize);
	UInt32 sizePerPacket = myPCMFormat.mBytesPerPacket;	
	UInt32 packetsPerBuffer = outputBufferSize / sizePerPacket;
	
	// set up output file
	NSString *outputPath = [myDocumentsDirectory() 
			stringByAppendingPathComponent:@"export-pcm.caf"];
	NSURL *outputURL = [NSURL fileURLWithPath:outputPath];
	NSLog (@"creating output file %@", outputURL);
	AudioFileID outputFile;
	CheckResult(AudioFileCreateWithURL((CFURLRef)outputURL,
		   kAudioFileCAFType,
		   &myPCMFormat, 
		   kAudioFileFlags_EraseFile, 
		   &outputFile),
		  "AudioFileCreateWithURL failed");
	
	// start convertin'
	UInt32 outputFilePacketPosition = 0; //in bytes
	
	while (true) {
		// wrap the destination buffer in an AudioBufferList
		AudioBufferList convertedData;
		convertedData.mNumberBuffers = 1;
		convertedData.mBuffers[0].mNumberChannels = myPCMFormat.mChannelsPerFrame;
		convertedData.mBuffers[0].mDataByteSize = outputBufferSize;
		convertedData.mBuffers[0].mData = ioBuf;

		UInt32 frameCount = packetsPerBuffer;

		// read from the extaudiofile
		CheckResult (ExtAudioFileRead(inputFile,
			  &frameCount,
			  &convertedData),
			 "Couldn't read from input file");
		
		if (frameCount == 0) {
			printf ("done reading from file");
			break;
		}
		
		// write the converted data to the output file
		CheckResult (AudioFileWritePackets(outputFile,
			   false,
			   frameCount * myPCMFormat.mBytesPerPacket, // inNumBytes, not a packet count
			   NULL,
			   outputFilePacketPosition / myPCMFormat.mBytesPerPacket, 
			   &frameCount,
			   convertedData.mBuffers[0].mData),
			 "Couldn't write packets to file");
		
		NSLog (@"Converted %ld bytes", outputFilePacketPosition);

		// advance the output file write location
		outputFilePacketPosition +=
			(frameCount * myPCMFormat.mBytesPerPacket);
	}
	
	// clean up
	ExtAudioFileDispose(inputFile);
	AudioFileClose(outputFile);

	// GUI update omitted
}

Note that this uses a CheckResult() convenience function that Kevin Avila wrote for our upcoming Core Audio book… it just looks to see if the return value is noErr and tries to convert it to a readable four-char-code if it seems amenable. It’s in the example file too.

Is It Soup Yet?

Does all this work? Rather than inspecting the AudioStreamBasicDescription of the resulting file, let’s do something more concrete. With Xcode’s “Organizer”, you can access your app’s sandbox on the device. So we can just drag the Application Data to the Desktop.

In the resulting folder, open the Documents folder to find export-pcm.caf. Drag it to QuickTime Player to verify that you do, indeed, have PCM data.

So there you have it. In several hundred lines of code, we’re able to get a song from the iPod Music Library, export it into our app’s Documents directory, and convert it to PCM. With the raw samples, you could now draw an audio waveform view (something you’d think would be essential for video editors who want to match video to beats in the music, but Apple seems dead-set against letting us do so with AV Foundation or QTKit), you could perform analysis or effects on the audio, you could bring it into a Core Audio AUGraph and mix it with other sources… all sorts of possibilities open up.

Clearly, it could be a lot easier. It’s a ton of code, and two file exports (library to .m4a, and .m4a to .caf), when some apps might be perfectly happy to read from the source URL itself and never write to the filesystem… if only they could. Having spent the morning writing this blog, I may well spend the afternoon filing feature requests on bugreport.apple.com. I’ll update this blog with OpenRadar numbers for the following requests:

  • Allow Core Audio to open URLs provided by MediaPlayer’s MPMediaItemPropertyAssetURL
  • AV Foundation should allow passthrough export of Media Library items
  • AV Foundation export needs finer-grained control than just presets
  • Provide sample-level access for AVAsset

Still, while I’m bitching and whining, it is remarkable that iOS 4 opens up non-DRM’ed items in the iPod library for export. I never thought that would happen. Furthermore, the breadth and depth of the iOS media APIs remain astonishing. Sometimes terrifying, perhaps, but compared to the facile and trite media APIs that the other guys and girls get, we’re light-years ahead on iOS.

Have fun with this stuff!

Update: This got easier in iOS 4.1. Please forget everything you’ve read here and go read From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary instead.

Video Editing with Haddocks

News.com.com.com.com, on evidence of a new Apple video format in iMovie 8.0.5

Dubbed iFrame, the new video format is based on industry standard technologies like H.264 video and AAC audio. As expected with H.264, iFrame produces much smaller file sizes than traditional video formats, while maintaining its high-quality video. Of course, the smaller file size increases import speed and helps with editing video files.

Saying smaller files are easier to edit is like saying cutting down the mightiest tree in the forest is easier with a haddock than with a chainsaw, as the former is lighter to hold.

The real flaw with this is that H.264, while a lovely end-user distribution format, uses heavy temporal compression, potentially employing both P-frames (“predicted” frames, meaning they require data from multiple earlier frames), and B-frames (“bidirectionally predicted” frames, meaning they require data from both earlier and subsequent frames). Scrubbing frame-by-frame through H.264 is therefore slowed by sometimes having to read in and decompress multiple frames of data in order to render the next one. And in my Final Cut experience, scrubbing backwards through H.264 is particularly slow; shuttle a few frames backwards and you literally have to let go of the wheel for a few seconds to let the computer catch up. For editing, you see a huge difference when you use a format with only I-frames (“intra” frames, meaning every frame has all the data it needs), such as M-JPEG or Pixlet.

You can use H.264 in an all-I-frame mode (which makes it more or less M-JPEG), but then you’re not getting small file-sizes meant for end-user distribution. I’ll bet that iFrame employs H.264 P- and B-frames, being aimed at the non-pro user whose editing consists of just a handful of cuts, and won’t mind the disk grinding as they identify the frame to cut on.

But for more sophisticated editing, having your source in H.264 would be painful.

This also speaks to a larger point of Apple seemingly turning its back on advanced media creatives in favor of everyday users with simpler needs. I’ve been surprised at CocoaHeads meetings to hear that I’m not the only one who bemoans the massive loss of functionality from the old 32-bit C-based QuickTime API to the easier-to-use but severely limited QTKit. That said, everyone else expects that we’ll see non-trivial editing APIs in QTKit eventually. I hope they’re right, but everything I see from Apple, including iFrame’s apparent use of H.264 as a capture-time and therefore edit-time format, makes me think otherwise.