Wrap up from Voices That Matter iPhone, Spring 2011

Ugh, this is twice in a row that I’ve done a talk for the Voices That Matter: iPhone Developer Conference and been able to neither get all my demos working perfectly in time nor cover all the important material in 75 minutes. Yeah, doing a 300-level talk will do that to you, but still…

This weekend’s talk in Seattle was “Advanced Media Manipulation with AV Foundation”, sort of a sequel to the intro talk I did at VTM:i Fall 2010 (Philly), but since the only people who would have been at both conferences are speakers and organizers, I spent about 25 minutes recapping material from the first talk: AVAssets, the AVPlayer and AVPlayerLayer, AVCaptureSession, etc.

Aside: AVPlayerLayer brings up an interesting point, given that it is a subclass of CALayer rather than of UIView, which is what’s provided by the view property of MPMoviePlayerController. What’s the big difference between a CALayer and a UIView, and why does it matter for video players? The difference is that UIView subclasses UIResponder and therefore responds to touch events (the player in the Media Player framework has its own pop-up controls, after all), whereas a CALayer (and thus AVPlayerLayer) does not respond to touch input itself… it’s purely visual.

So anyways, on to the new stuff. What has interested me for a while in AV Foundation is the pair of classes added in 4.1 for sample-level access, AVAssetReader and AVAssetWriter. An earlier blog entry, From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary, exercises both of these, reading from an iPod Library song with an AVAssetReader and writing to a .caf file with an AVAssetWriter.
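For reference, the core of that read-then-write technique looks roughly like this. This is a sketch, not that post’s code verbatim: `songURL` and `cafURL` are assumed to exist, and error handling is omitted.

```objc
#import <AVFoundation/AVFoundation.h>
#import <CoreMedia/CoreMedia.h>

// Reader: decode the song's audio track to LPCM
AVURLAsset *asset = [AVURLAsset URLAssetWithURL:songURL options:nil];
NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
AVAssetTrack *track = [[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
AVAssetReaderTrackOutput *readerOutput =
    [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track
        outputSettings:[NSDictionary dictionaryWithObject:
            [NSNumber numberWithInt:kAudioFormatLinearPCM]
            forKey:AVFormatIDKey]];
[reader addOutput:readerOutput];

// Writer: pass the decoded buffers through to a .caf
AVAssetWriter *writer = [AVAssetWriter assetWriterWithURL:cafURL
                                                 fileType:AVFileTypeCoreAudioFormat
                                                    error:&error];
AVAssetWriterInput *writerInput =
    [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
                                       outputSettings:nil]; // nil = pass through
[writer addInput:writerInput];

[writer startWriting];
[reader startReading];
[writer startSessionAtSourceTime:kCMTimeZero];

// Pump sample buffers from reader to writer on a background queue
dispatch_queue_t queue = dispatch_queue_create("readerQueue", NULL);
[writerInput requestMediaDataWhenReadyOnQueue:queue usingBlock:^{
    while ([writerInput isReadyForMoreMediaData]) {
        CMSampleBufferRef buffer = [readerOutput copyNextSampleBuffer];
        if (buffer) {
            [writerInput appendSampleBuffer:buffer];
            CFRelease(buffer);
        } else {
            [writerInput markAsFinished];
            [writer finishWriting];
            break;
        }
    }
}];
```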

Before showing that, I did a new example, VTM_ScreenRecorderTest, which uses AVAssetWriter to make an iOS screen recorder for your application. Basically, it runs an onscreen clock (so that something onscreen is changing), and then uses an NSTimer to periodically do a screenshot and then write that image as a video sample to the single video track of a QuickTime .mov file. The screenshot code is copied directly from Apple’s Technical Q&A 1703, and the conversion from the resulting UIImage to the CMSampleBufferRef needed for writing raw samples is greatly simplified with the AVAssetWriterInputPixelBufferAdaptor.
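The per-frame write, in rough sketch form. Assumptions here: the writer, its input, and the adaptor (with kCVPixelFormatType_32ARGB source attributes) are already set up as ivars, `recordingStartDate` marks when recording began, and `-screenshotImage` wraps the Q&A 1703 code.

```objc
// Called by the NSTimer: grab a screenshot and append it as one video frame
- (void)writeFrame:(NSTimer *)timer {
    if (![writerInput isReadyForMoreMediaData]) return; // drop this frame

    UIImage *screenshot = [self screenshotImage]; // Technical Q&A 1703 technique

    // Get a pixel buffer from the adaptor's pool and draw the screenshot into it
    CVPixelBufferRef pixelBuffer = NULL;
    CVPixelBufferPoolCreatePixelBuffer(kCFAllocatorDefault,
                                       adaptor.pixelBufferPool, &pixelBuffer);
    CVPixelBufferLockBaseAddress(pixelBuffer, 0);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    CGContextRef context = CGBitmapContextCreate(
        CVPixelBufferGetBaseAddress(pixelBuffer),
        CVPixelBufferGetWidth(pixelBuffer), CVPixelBufferGetHeight(pixelBuffer),
        8, CVPixelBufferGetBytesPerRow(pixelBuffer), colorSpace,
        kCGImageAlphaNoneSkipFirst);
    CGContextDrawImage(context,
        CGRectMake(0, 0, screenshot.size.width, screenshot.size.height),
        screenshot.CGImage);
    CGContextRelease(context);
    CGColorSpaceRelease(colorSpace);
    CVPixelBufferUnlockBaseAddress(pixelBuffer, 0);

    // Presentation time = wall-clock time since recording started
    CMTime presentationTime = CMTimeMakeWithSeconds(
        [[NSDate date] timeIntervalSinceDate:recordingStartDate], 600);
    [adaptor appendPixelBuffer:pixelBuffer withPresentationTime:presentationTime];
    CVPixelBufferRelease(pixelBuffer);
}
```

The adaptor is the whole point here: it accepts a plain CVPixelBufferRef, so you never have to construct a CMSampleBufferRef by hand.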

In the Fall in Philly, I showed a cuts-only movie editor that just inserted segments up at the AVMutableComposition level. For this talk, I wanted to do multiple video tracks, with transitions between them and titles. I sketched out a very elaborate demo project, VTM_AVEffects, which was meant to perform the simple effects I used for the Running Start (.m4v download) movie that I often use as an example. In other words, I needed to overlay titles and do some dissolves.

About 10 hours into coding my example, I realized I was not going to finish this demo, and settled for getting the title and the first dissolve. So if you’re going to download the code, please keep in mind that this is badly incomplete code (the massive runs of commented-out misadventures should make that clear), and it is neither production-quality, nor copy-and-paste quality. And it most certainly has memory leaks and other unresolved bugs. Oh, and all the switches and text fields? They do nothing. The only things that work are tapping “perform” and then “play” (or the subsequent “pause”). Scrubbing the slider and setting the rate field mostly work, but have bugs, particularly in the range late in the movie where there are no valid video segments, but the :30 background music is still valid.

Still, I showed it and will link to it at the end of this blog because there is some interesting working code worth discussing. Let’s start with the dissolve between the first two shots. You’ll notice in the code that I go with Apple’s recommendation of working back and forth between two tracks (“A” and “B”, because I learned on analog equipment and always think of it as A/B Roll editing). The hard part — and by hard, I mean frustrating, soul-draining, why-the-frack-isn’t-this-goddamn-thing-working hard — is providing the instructions that describe how the tracks are to be composited together. In AV Foundation, you provide an AVVideoComposition that describes the compositing of every region of interest in your movie (oh, I’m sorry, in your AVComposition… which is in no way related to the AVVideoComposition). The AVVideoComposition has an array of AVVideoCompositionInstructions, each covering a specific timeRange, and each containing its own array of AVVideoCompositionLayerInstructions to describe the opacity and affine transform (static or animated) of each video track. Describing it like that, I probably should have included a diagram… maybe I’ll whip one up in OmniGraffle and post it later. Anyways, this is fairly difficult to get right, as your various instructions need to account for all time ranges across all tracks, with no gaps or overlaps, and line up exactly with the duration of the AVComposition. Like I said, I got exactly one fade-in working before I had to go pencils-down on the demo code and start preparing slides. Maybe I’ll be able to fix it later… but don’t hold me to that, OK?
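For what it’s worth, the one working dissolve boils down to something like this sketch. The track and time names (`trackA`, `trackB`, `dissolveStart`, `dissolveDuration`) are my own; a real video composition also needs instructions covering every other time range of the movie, which is exactly the part that’s so easy to get wrong.

```objc
// One instruction covering just the dissolve's time range
AVMutableVideoCompositionInstruction *instruction =
    [AVMutableVideoCompositionInstruction videoCompositionInstruction];
instruction.timeRange = CMTimeRangeMake(dissolveStart, dissolveDuration);

// Track A fades out over that range...
AVMutableVideoCompositionLayerInstruction *aInstruction =
    [AVMutableVideoCompositionLayerInstruction
        videoCompositionLayerInstructionWithAssetTrack:trackA];
[aInstruction setOpacityRampFromStartOpacity:1.0 toEndOpacity:0.0
                                   timeRange:instruction.timeRange];

// ...while track B sits underneath at full opacity
AVMutableVideoCompositionLayerInstruction *bInstruction =
    [AVMutableVideoCompositionLayerInstruction
        videoCompositionLayerInstructionWithAssetTrack:trackB];

// Order matters: the first layer instruction is the topmost layer
instruction.layerInstructions =
    [NSArray arrayWithObjects:aInstruction, bInstruction, nil];

AVMutableVideoComposition *videoComposition =
    [AVMutableVideoComposition videoComposition];
// The real array must also include instructions for before and after the dissolve
videoComposition.instructions = [NSArray arrayWithObject:instruction];
videoComposition.frameDuration = CMTimeMake(1, 30);
videoComposition.renderSize = CGSizeMake(640, 480);
```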

The other effect I knew I had to show off was titles. AV Foundation has a curious way to do this. Rather than add your titles and other overlays as new video tracks, as you’d do in QuickTime, AVF ties into Core Animation and has you do your image magic there. By using an AVSynchronizedLayer, you can create sublayers whose animations get their timing from the movie, rather than from the system clock. It’s an interesting idea, given how powerful Quartz and Core Animation are. But it’s also deeply weird to be creating content for your movie that is not actually part of the movie, but is rather just loosely coupled to the player object by way of the AVPlayerItem (and this leads to some ugliness when you want to export the movie and include the animations in the export). I also noticed that when I scrubbed past the fade-out of the title and then set the movie playback rate to a negative number to run it backward, the title did not fade back in as expected… which makes me wonder if there are assumptions in UIKit or Core Animation that time always runs forward, which is of course not true when AV Foundation controls animation time via the AVSynchronizedLayer.
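A minimal sketch of the synchronized-title idea, with names of my own invention (`playerItem`, `videoLayer`). One gotcha worth a comment: a beginTime of literally 0 means “now” to Core Animation, so for an animation starting at movie time zero you’d use the special AVCoreAnimationBeginTimeAtZero constant instead.

```objc
// A layer whose sublayer animations run on the movie's timeline
AVSynchronizedLayer *syncLayer =
    [AVSynchronizedLayer synchronizedLayerWithPlayerItem:playerItem];
syncLayer.frame = videoLayer.bounds;

CATextLayer *titleLayer = [CATextLayer layer];
titleLayer.string = @"Running Start";
titleLayer.frame = CGRectMake(0.0, 20.0, videoLayer.bounds.size.width, 40.0);
titleLayer.opacity = 0.0;

// Fade the title in at 1.0 second of *movie* time, not system time
CABasicAnimation *fadeIn = [CABasicAnimation animationWithKeyPath:@"opacity"];
fadeIn.fromValue = [NSNumber numberWithFloat:0.0];
fadeIn.toValue = [NSNumber numberWithFloat:1.0];
fadeIn.beginTime = 1.0;  // movie time; use AVCoreAnimationBeginTimeAtZero for t=0
fadeIn.duration = 1.0;
fadeIn.fillMode = kCAFillModeForwards;
fadeIn.removedOnCompletion = NO; // keep it applied for scrubbing
[titleLayer addAnimation:fadeIn forKey:@"fadeIn"];

[syncLayer addSublayer:titleLayer];
[videoLayer.superlayer addSublayer:syncLayer];
```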

My code is badly incomplete and buggy, and anyone interested in a solid demo of AV Foundation editing would do well to check out the AVEditDemo from Apple’s WWDC 2010 sample code. Still, I said I would post what I’ve got, so there you go. No complaints from you people, or the next sample code you get from me will be another goddamned webapp server written in Java.

Oh yeah, at one point, I dreamed of having enough time to write a demo that would process A/V capture data in real time, using AVCaptureVideoDataOutput and AVCaptureAudioDataOutput, maybe showing a live audio waveform or doing an FFT. But that demo didn’t even get written. Maybe next conference.

For a speaker on “advanced” AV Foundation, I find I still have a lot of unanswered questions about this framework. I’m not sure how well it supports saving an AVComposition that you’re editing — even if the AVF classes implemented NSCoding and could therefore be persisted with keyed archiving, that doesn’t address how you’d persist the associated Core Animation content. I also would have to think hard about how to make edits undoable and redoable… I miss QuickTime’s MovieEditState already. And to roll an edit, do you dig into a track’s segments and tweak their timeRanges, or do you have to remove and reinsert the segment?

And what else can I do with AVSynchronizedLayer? I don’t see particularly compelling transitions in AVF — just dissolves and trivial push wipes (i.e., animation of the affine transform) — but if I could render whatever I like in a CALayer and pick up the timing from the synchronized layer, is that how I roll my own Quartz-powered goodness? Speaker Cathy Shive and I were wondering about this idea over lunch, trying to figure out whether we would subclass CAAnimation or CALayer in hopes of getting a callback along the lines of “draw your layer for time t”, which would be awesome if only either of us were enough of a Core Animation expert to pull it off.

So, I feel like there’s a lot more for me to learn on this, which is scary because some people think I’m an expert on the topic… for my money, the experts are the people in the AV Foundation dev forums (audio, video), since they’re the ones really using it in production and providing feedback to Apple. Fortunately, these forums get a lot of attention from Apple’s engineers, particularly bford, so that sets a lot of people straight about their misconceptions. I think it’s going to be a long learning curve for all of us.

If you’re keen to start, here are the slides and demo code:

Comments (16)

  1. […] AV Foundation. Obviously, they’ll probably be a lot like what I’ve already done at Voices That Matter, except that the advanced talk I did in Seattle was about 1/3 recap, whereas this is all going to […]

  2. lucksm17

    I have a programming question about your project VTMScreenRecorderTest. I found your project to be a life saver and I really appreciate you posting your code. I need to implement a function like this in my app, only recording a video of the contents of a UIImageView instead of the whole screen. I was successful at that part, and now I am struggling with getting my application to record the microphone audio and encode it into the video. I can do this using an export session, but I just can’t figure out how to do this in real time.

    I’ve gone two days on this and am completely stumped. Help? Please?

  3. ajay123123

    hi, i am using your project to capture my app screen. In my app I am running some animations with sound. This code captures the animations perfectly & creates the video too, but not the sound that runs with the animations.

    As I am new to iPhone development, could you suggest something on how to capture the animations as well as the sound?

    Thanks in advance.

  4. ajay123123: are you responsible for the audio in your app? Theoretically, you could just add a second AVAssetWriterInput to periodically write buffers of sound samples to the AVAssetWriter.

    How practical that really is depends on how you’re processing the sound. If you’re already processing buffers of samples and pushing them through something like an Audio Queue, then this might be pretty straightforward. But if you’re not already responsible for processing audio buffers – maybe you’re playing from a file with AVAudioPlayer or from the music library with MPMusicPlayerController – then you’re going to have to have a completely different audio path just to get you to the point where you have buffers of audio, and therefore something to hand to your second AVAssetWriterInput.

    If all these APIs are bewildering you, reset by reading through Apple’s Multimedia Programming Guide for iOS.

    In case you were wondering… no, there isn’t a convenient “grab whatever audio the system is playing” API on iOS. In fact, there really isn’t one on the Mac either: stuff like SoundFlower / Audio Hijack sets itself up as a virtual audio I/O device that copies the data it’s sent and then sends it on to real audio hardware. This can’t be done on iOS because third-party apps can’t add new I/O devices (also, iOS’ model of one-and-only-one audio device makes Core Audio much easier on iOS, and it’s not every day you see the word “easy” and “Core Audio” in the same sentence).
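    To sketch what that second input might look like (untested against the screen-recorder project; the AAC settings are just plausible defaults, and `assetWriter` stands for the existing writer):

    ```objc
    // A second AVAssetWriterInput, for audio, alongside the video input.
    // You would append CMSampleBufferRefs of microphone audio to it, e.g.
    // from an AVCaptureAudioDataOutput delegate callback.
    AudioChannelLayout channelLayout;
    memset(&channelLayout, 0, sizeof(channelLayout));
    channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Mono;

    NSDictionary *audioSettings = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithInt:kAudioFormatMPEG4AAC], AVFormatIDKey,
        [NSNumber numberWithFloat:44100.0], AVSampleRateKey,
        [NSNumber numberWithInt:1], AVNumberOfChannelsKey,
        [NSData dataWithBytes:&channelLayout length:sizeof(channelLayout)],
            AVChannelLayoutKey,
        [NSNumber numberWithInt:64000], AVEncoderBitRateKey,
        nil];

    AVAssetWriterInput *audioInput =
        [AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
                                           outputSettings:audioSettings];
    audioInput.expectsMediaDataInRealTime = YES;
    [assetWriter addInput:audioInput]; // alongside the existing video input

    // Then, in the capture delegate callback:
    // if ([audioInput isReadyForMoreMediaData])
    //     [audioInput appendSampleBuffer:sampleBuffer];
    ```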

  5. ajay123123

    Hi cadamson,

    I am working on an app which is very much similar to Talking Tom. There I am playing different sound files on the basis of animations & I am using AVAudioPlayer to do that, but still not getting any idea of how to sort it out.

  6. ajay123123

    Hi cadamson,

    After creating the video when i am trying to save it in photo library, its giving me the following error :

    didFinishSavingWithError: Error Domain=ALAssetsLibraryErrorDomain Code=-3302 “Invalid data” UserInfo=0x463d5a0 {NSLocalizedFailureReason=There was a problem writing this asset because the data is invalid and cannot be viewed or played., NSLocalizedRecoverySuggestion=Try with different data, NSLocalizedDescription=Invalid data}

    Could you suggest me something on how to convert the video to MP4 or MOV format before saving it to my Photo Library.

    Thanks in advance.

  7. ajay123123: Maybe the photo library will only take MP4 instead of MOV? Try changing to AVFileTypeMPEG4 when creating the AVAssetWriter.

  8. ajay123123

    Hi cadamson,

    Still getting the same error. And also when I go to the directory of my app on my device & try to play the video, it says “The Movie Format is Not supported”.

    Still not able to figure out which format I need to save the file in; I tried both MOV and MP4.

  9. ajay123123: well, given that the movie works from the file system, I think your problem is specific to the assets library, something I’ve barely worked with, so we’ve reached the end of what I can offer in the way of help.

  10. zeusent

    I was wondering if you could help me with something. I am using an AVMutableComposition to put together some videos and photos. For the photos part I use the AVSynchronizedLayer which syncs the Core Animation times with the AVMutableComposition time.

    My problem is that I cannot add such a layer behind the AVMutableComposition video tracks. Or better yet, how can I make those tracks have a transparent background? If I set the backgroundColor of the AVMutableVideoCompositionInstruction to [[UIColor clearColor] CGColor], it doesn’t work.

    Any help would be much appreciated.

    Thank you!

  11. ajay123123

    Hi cadamson,

    Sry to ping you again :).

    I am able to change the format of the video that I am generating by using your code. Like I discussed with you earlier regarding audio capture, I want to capture my audio also, that’s running in the background.

    you suggested I need to write an AVAssetWriter to capture it. It’s my first time working with the AVAsset & AVAssetWriter classes, so I don’t have much idea how to write it. So, one small request: can you please provide some resource where I can find how to write an asset writer for audio capture, or some sample code? Wherever I am searching I am just finding how to write an asset writer to capture video, not audio.

    Thanks in advance.. 🙂

  12. […] I found were for video capture using a camera or other input. Then I stumbled across a post by Chris Adamson on his blog. In it, he provides an example of on-screen capture from OpenGL to an image file. This […]

  13. […] last few talks have gone so deep into the woods on editing (and I’m still unsatisfied with my mess of sample code on that topic that I put together for VTM:iPhone Seattle in the Spring… maybe someday […]

  14. michelle_cat

    Just wanted to thank you for VTMscreenRecorder, I learned a lot from playing with that code.

    My primary concern is developing code to play and extract frames from rtsp feeds; I have several clients that are security companies I am building apps for. Until your code I had issues with the pixelBufferAdaptor.

    I would like to share demo with you, please feel free to join us on Facebook mooncatventures-group, we are getting together a pretty active community of iPhone/android developers.

    Here’s a little test app that extracts frames from an rtsp feed, adds them to a pixel buffer, and creates a new movie from the frames.

    I am not just promoting code here; I am always careful that a first contact can look like spam. I just really found your work useful.

    Thanks again
