Archives for : April2011

Wrap up from Voices That Matter iPhone, Spring 2011

Ugh, this is twice in a row that I’ve done a talk for the Voices That Matter: iPhone Developer Conference and been able to neither get all my demos working perfectly in time, nor to cover all the important material in 75 minutes. Yeah, doing a 300-level talk will do that to you, but still…

This weekend’s talk in Seattle was “Advanced Media Manipulation with AV Foundation”, sort of a sequel to the intro talk I did at VTM:i Fall 2010 (Philly), but since the only people who would have been at both conferences are speakers and organizers, I spent about 25 minutes recapping material from the first talk: AVAssets, the AVPlayer and AVPlayerLayer, AVCaptureSession, etc.

Aside: AVPlayerLayer brings up an interesting point, given that it is a subclass of CALayer rather than UIView, which is what’s provided by the view property of the MPMoviePlayerController. What’s the big difference between a CALayer and a UIView, and why does it matter for video players? The difference is that UIView subclasses UIResponder and therefore responds to touch events (the one in the Media Player framework has its own pop-up controls after all), whereas a CALayer, and AVPlayerLayer, does not respond to touch input itself… it’s purely visual.

So anyways, on to the new stuff. What has interested me for a while in AV Foundation is the classes added in 4.1 to do sample level access, AVAssetWriter and AVAssetReader. An earlier blog entry, From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary, exercises both of these, reading from an iPod Library song with an AVAssetReader and writing to a .caf file with an .

Before showing that, I did a new example, VTM_ScreenRecorderTest, which uses AVAssetWriter to make an iOS screen recorder for your application. Basically, it runs an onscreen clock (so that something onscreen is changing), and then uses an NSTimer to periodically do a screenshot and then write that image as a video sample to the single video track of a QuickTime .mov file. The screenshot code is copied directly from Apple’s Technical Q&A 1703, and the conversion from the resulting UIImage to the CMSampleBufferRef needed for writing raw samples is greatly simplified with the AVAssetWriterInputPixelBufferAdaptor.

In the Fall in Philly, I showed a cuts-only movie editor that just inserted segments up at the AVMutableComposition level. For this talk, I wanted to do multiple video tracks, with transitions between them and titles. I sketched out a very elaborate demo project, VTM_AVEffects, which was meant to perform the simple effects I used for the Running Start (.m4v download) movie that I often use an example. In other words, I needed to overlay titles and do some dissolves.

About 10 hours into coding my example, I realized I was not going to finish this demo, and settled for getting the title and the first dissolve. So if you’re going to download the code, please keep in mind that this is badly incomplete code (the massive runs of commented-out misadventures should make that clear), and it is neither production-quality, nor copy-and-paste quality. And it most certainly has memory leaks and other unresolved bugs. Oh, and all the switches and text fields? They do nothing. The only things that work are tapping “perform” and then “play” (or the subsequent “pause”). Scrubbing the slider and setting the rate field mostly work, but have bugs, particularly in the range late in the movie where there are no valid video segments, but the :30 background music is still valid.

Still, I showed it and will link to it at the end of this blog because there is some interesting working code worth discussing. Let’s start with the dissolve between the first two shots. You’ll notice in the code that I go with Apple’s recommendation of working back and forth between two tracks (“A” and “B”, because I learned on analog equipment and always think of it as A/B Roll editing). The hard part — and by hard, I mean frustrating, soul-draining, why-the-frack-isn’t-this-goddamn-thing-working hard — is providing the instructions that describe how the tracks are to be composited together. In AV Foundation, you provide an AVVideoComposition that describes the compositing of every region of interest in your movie (oh, I’m sorry, in your AVComposition… which is in no way related to the AVVideoComposition). The AVVideoComposition has an array of AVVideoCompositionInstructions, each covering a specific timeRange, and each containing its own AVVideoCompositionLayerInstruction to describe the opacity and affine transform (static or animated) of each video track. Describing it like that, I probably should have included a diagram… maybe I’ll whip one up in OmniGraffle and post it later. Anyways, this is fairly difficult to get right, as your various instructions need to account for all time ranges across all tracks, with no gaps or overlaps, and timing up identically with the duration of the AVComposition. Like I said, I got exactly one fade-in working before I had to go pencils-down on the demo code and start preparing slides. Maybe I’ll be able to fix it later… but don’t hold me to that, OK?

The other effect I knew I had to show off was titles. AVFoundation has a curious way to do this. Rather than add your titles and other overlays as new video tracks, as you’d do in QuickTime, AVF ties into Core Animation and has you do your image magic there. By using an AVSynchronizedLayer, you can create sublayers whose animations get their timing from the movie, rather than from the system clock. It’s an interesting idea, given how powerful Quartz and Core Animation are. But it’s also deeply weird to be creating content for your movie that is not actually part of the movie, but is rather just loosely coupled to the player object by way of the AVPlayerItem (and this leads to some ugliness when you want to export the movie and include the animations in the export). I also noticed that when I scrubbed past the fade-out of the title and then set the movie playback rate to a negative number to run it backward, the title did not fade back in as expected… which makes me wonder if there are assumptions in UIKit or Core Animation that time always runs forward, which is of course not true when AV Foundation controls animation time, via the AVSynchronizedLayer

My code is badly incomplete and buggy, and anyone interested in a solid demo of AV Foundation editing would do well to check out the AVEditDemo from Apple’s WWDC 2010 sample code. Still, I said I would post what I’ve got, so there you go. No complaints from you people, or the next sample code you get from me will be another goddamned webapp server written in Java.

Oh yeah, at one point, I dreamed of having enough time to write a demo that would process A/V capture data in real-time, using AVCaptureSessionDataOutput, maybe showing a live audio waveform or doing an FFT. But that demo didn’t even get written. Maybe next conference.

For a speaker on “advanced” AV Foundation, I find I still have a lot of unanswered questions about this framework. I’m not sure how well it supports saving an AVComposition that you’re editing — even if the AVF classes implement NSCopying / NSMutableCopying and could therefore be persisted with key-value archiving, that doesn’t address how you’d persist your associated Core Audio animations. I also would have to think hard about how to make edits undoable and redoable… I miss QuickTime’s MovieEditState already. And to roll an edit… dig into a track’s segments and dick with their timeRanges, or do you have to remove and reinsert the segment?

And what else can I do with CASynchronizedLayer? I don’t see particularly compelling transitions in AVF — just dissolves and trivial push wipes (ie, animation of the affine transform) — but if I could render whatever I like in a CALayer and pick up the timing from the synchronized layer, is that how I roll my own Quartz-powered goodness? Speaker Cathy Shive and I were wondering about this idea over lunch, trying to figure out if we would subclass CAAnimation or CALayer in hopes of getting a callback along the lines of “draw your layer for time t“, which would be awesome if only either of us were enough of a Core Animation expert to pull it off.

So, I feel like there’s a lot more for me to learn on this, which is scary because some people think I’m an expert on the topic… for my money, the experts are the people in the AV Foundation dev forums (audio, video), since they’re the ones really using it in production and providing feedback to Apple. Fortunately, these forums get a lot of attention from Apple’s engineers, particularly bford, so that sets a lot of people straight about their misconceptions. I think it’s going to be a long learning curve for all of us.

If you’re keen to start, here are the slides and demo code: