I’m speaking at three of the five CocoaConfs for early 2014, teaching an all-day AV Foundation Film School class and a regular session on Stupid Video Tricks, which is also all about AV Foundation. (In DC, I also reprised Get on the Audiobus to fill in for another speaker).

UPDATE: I’m also going to do “Stupid Video Tricks” at next week’s Ann Arbor CocoaHeads.

I first taught the class in Chicago, and then added one more project for DC and San Jose based on how the timing worked out. To speed things up, I created starter projects that dealt with all the storyboard connections and drudge-work, leaving big holes in the code that say // TODO: WRITE IN CLASS for the stuff we do as a code-along. The class projects are:

  1. Play back a video file from a URL
  2. Capture into a video file (and play back in another tab, with the code from 1)
  3. Edit together clips and export as a new .m4v file, first as a cuts-only edit (easy), and then with cross-dissolves (quite painful and clearly marked as an hour of outright drudgery)
  4. Process video frames at capture time with Core Image
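For a sense of scale, the first project really is just a few lines of AV Foundation once the storyboard drudge-work is out of the way. A minimal sketch, assuming a view controller with `videoURL` already in hand (the variable names are placeholders):

```objc
#import <AVFoundation/AVFoundation.h>

// Build a player for the URL and put its layer in the view hierarchy.
AVPlayer *player = [AVPlayer playerWithURL:videoURL];
AVPlayerLayer *playerLayer = [AVPlayerLayer playerLayerWithPlayer:player];
playerLayer.frame = self.view.bounds;
playerLayer.videoGravity = AVLayerVideoGravityResizeAspect;
[self.view.layer addSublayer:playerLayer];
[player play];
```

Everything after that (capture, editing, frame processing) builds on the same handful of classes, which is why the projects stack the way they do.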

The last of these is straight from the regular talk, “Stupid Video Tricks”, which hadn’t really come together in time for CocoaConf Chicago (I was pulled between getting the class ready and client work), but is in good shape now that I’ve had time to work on it more. If anyone from Chicago wants to see what you missed — specifically the Core Image stuff where I was showing off the Mac project that I didn’t manage to get ported over to iOS in time — check out the slides, which have a link to the sample code:

  • Stupid Video Tricks: slides, code (ZIP archive, 27 MB)

Applying the CIPixellate filter to capture frames

Basically, the problem that killed me the night before the Chicago presentation was when I was trying to port my OS X AV Foundation / Core Image code to iOS and I came across this little deal-killer in -[CIContext drawImage:inRect:fromRect:]:

On iOS, this method draws the CIImage object into a renderbuffer for the OpenGL ES context. Use this method only if the CIContext object is created with contextWithEAGLContext: and if you are rendering to a CAEAGLLayer.

Since my knowledge of OpenGL can charitably be described as “jack squat”, I had been doing everything on the CPU on OS X (creating my CIContext from an NSBitmapImageRep, since I needed raw access to the pixels anyways so I could then put them into a CVPixelBuffer and ultimately write them to a movie file with AVAssetWriter). Fortunately, the WWDC 2013 “Core Image Fun House for iOS” sample got me unblocked on setting up a suitable OpenGL ES context for this example.
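In case it saves anyone else a late night: the fix amounts to creating the CIContext the way the docs demand. A sketch of the GPU path, with `filteredImage` and `destRect` as placeholders, and assuming the EAGL context is current with a renderbuffer bound when you draw:

```objc
#import <CoreImage/CoreImage.h>
#import <OpenGLES/EAGL.h>

// Create a GL ES context and a CIContext that renders into it.
EAGLContext *eaglContext =
    [[EAGLContext alloc] initWithAPI:kEAGLRenderingAPIOpenGLES2];
CIContext *ciContext = [CIContext contextWithEAGLContext:eaglContext];

// Later, in the render path:
[ciContext drawImage:filteredImage
              inRect:destRect
            fromRect:[filteredImage extent]];
```

That’s the whole difference between the CPU-bound OS X approach and the iOS one: same filter chain, different backing for the context.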

Still, I’m now taking some time to start working through the iPhone 3D Programming book I won a few years ago, because my OpenGL ignorance is a liability I can no longer afford as I get into the deep, dark parts of AV Foundation. To wit, I’d like to know if I can create an offscreen OpenGL ES context at an exact size, since an app that’s going to write effected frames to a .mov file is going to want to work with exact frame sizes, and not whatever bounds a GLKView ends up with after the phone is rotated or an NSOpenGLView has when its window is resized. Jonathan Blocksom of the Big Nerd Ranch assured me in DC that this is no sweat, and yeah, he would know.

There are two demos of real-time capture processing in the talk. The first (and the one I added to the class) uses a single CIPixellate filter to provide the cute pixellation effect seen in the screenshot a few paragraphs up. A second one combines CIColorCube and CISourceOverCompositing to create a chroma key effect from a green screen capture source.
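The per-frame filtering in the first demo is conceptually simple. A sketch of the capture delegate callback (drawing omitted; the `inputScale` value is just a number I’m picking for illustration):

```objc
#import <AVFoundation/AVFoundation.h>
#import <CoreImage/CoreImage.h>

- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    // Wrap the captured frame's pixel buffer in a CIImage…
    CVPixelBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    CIImage *sourceImage = [CIImage imageWithCVPixelBuffer:pixelBuffer];

    // …run it through CIPixellate…
    CIFilter *pixellate = [CIFilter filterWithName:@"CIPixellate"];
    [pixellate setValue:sourceImage forKey:kCIInputImageKey];
    [pixellate setValue:@25.0 forKey:@"inputScale"]; // block size; tune to taste
    CIImage *result = [pixellate valueForKey:kCIOutputImageKey];

    // …then draw `result` with an EAGL-backed CIContext.
}
```

The chroma key demo has the same shape; it just swaps in a CIColorCube (to knock out the green) composited over a background with CISourceOverCompositing.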

Unprocessed green screen image

Chroma key result (via Reflector)

The other Stupid Video Trick of note has to do with directly accessing samples from a source movie via AVAssetReader. WWDC 2013 had a talk about writing subtitle tracks programmatically, which is ironic since WWDC talks haven’t had subtitle/caption tracks since 2010 (which is also a shame). My demo goes the other direction, taking the 2010 WWDC session on AV Foundation, finding its subtitle track, reading the samples one-by-one, and pulling out the text. The resulting app shows the captions in a UILabel under the video, and has a modal list that lets you scroll through all the captions and jump to one (and yes, this should be searchable).

Screenshot of subtitle-reader demo

The key point here is that while Core Media doesn’t help you pull out the data as easily as it does for video frames (see CMSampleBufferGetImageBuffer()), it’s pretty straightforward to get a CMBlockBuffer, get a void* to its data, and then parse that data in accordance with the QuickTime File Format Specification (look for “Subtitle Sample Data”, and keep in mind that I’m foolish for directly linking to Apple developer documentation, because they move their URLs every 6-12 months, so by September you can expect that link to be broken).
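In code terms, the reading loop looks something like this (a sketch; error handling and per-sample timing omitted, and the asset is assumed to actually have a subtitle track). The two-byte big-endian length prefix is straight out of the QTFF spec:

```objc
#import <AVFoundation/AVFoundation.h>

NSArray *subtitleTracks = [asset tracksWithMediaType:AVMediaTypeSubtitle];
AVAssetReader *reader = [[AVAssetReader alloc] initWithAsset:asset error:NULL];
AVAssetReaderTrackOutput *output =
    [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:subtitleTracks[0]
                                               outputSettings:nil];
[reader addOutput:output];
[reader startReading];

CMSampleBufferRef sampleBuffer;
while ((sampleBuffer = [output copyNextSampleBuffer]) != NULL) {
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length = 0;
    char *data = NULL;
    CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &length, &data);
    // Per the QTFF spec, a subtitle sample is a 16-bit big-endian text
    // length followed by that many bytes of (possibly empty) text.
    uint16_t textLength = CFSwapInt16BigToHost(*(uint16_t *)data);
    if (textLength > 0) {
        NSString *text = [[NSString alloc] initWithBytes:data + 2
                                                  length:textLength
                                                encoding:NSUTF8StringEncoding];
        NSLog(@"subtitle: %@", text);
    }
    CFRelease(sampleBuffer);
}
```

The sample’s presentation time and duration come from the usual CMSampleBuffer timing calls, which is how the demo knows when to swap the UILabel’s text.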

At some point, I should probably go back and do a new “brain dump” on AV Foundation, particularly the mess of classes that’s used to do editing. It doesn’t help that they’re all named as if the API designers had those refrigerator word magnets, and chose all the class names by re-combining [AV] [Mutable] [Video] [Composition] [Layer] and [Instruction], meaning that an AVMutableComposition and an AVMutableVideoComposition are almost entirely unrelated. It’s also a bitch to keep straight just who owns whom. I tried to write the following from memory and utterly failed until I looked up in the docs that:

  • An AVComposition that you’re building may have multiple video tracks. You don’t create relationships between them in the composition. Rather, you tell an AVPlayer, AVAssetExportSession, or AVAssetWriter that you have an:
  • AVVideoComposition to describe the video compositing of the tracks within the source composition. The video composition, not to be confused with the composition (really!), has an array of…
  • AVVideoCompositionInstructions, which provide timing information; collectively, the array’s members must properly account for every moment of time within the composition. Each composition instruction has an array of…
  • AVVideoCompositionLayerInstructions, which indicate the crop, affine transform, and opacity during the time range (possibly ramping between multiple values during that range). The array’s order implicitly specifies the z-axis layering of the layers, with index 0 on top, 1 below it, and so on.
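That ownership chain, written out in code: a sketch assuming a `composition` with two overlapping video tracks, where `firstVideoTrack`, `secondVideoTrack`, and `crossDissolveRange` are placeholders for things you’d have built earlier.

```objc
#import <AVFoundation/AVFoundation.h>

AVMutableVideoComposition *videoComposition =
    [AVMutableVideoComposition videoComposition];
videoComposition.renderSize = CGSizeMake(1280, 720);
videoComposition.frameDuration = CMTimeMake(1, 30);

// One instruction covering the whole composition's duration.
AVMutableVideoCompositionInstruction *instruction =
    [AVMutableVideoCompositionInstruction videoCompositionInstruction];
instruction.timeRange = CMTimeRangeMake(kCMTimeZero, [composition duration]);

// Layer instruction order sets the z-order: index 0 is on top.
AVMutableVideoCompositionLayerInstruction *topLayer =
    [AVMutableVideoCompositionLayerInstruction
        videoCompositionLayerInstructionWithAssetTrack:firstVideoTrack];
[topLayer setOpacityRampFromStartOpacity:1.0
                            toEndOpacity:0.0
                               timeRange:crossDissolveRange];
AVMutableVideoCompositionLayerInstruction *bottomLayer =
    [AVMutableVideoCompositionLayerInstruction
        videoCompositionLayerInstructionWithAssetTrack:secondVideoTrack];

instruction.layerInstructions = @[topLayer, bottomLayer];
videoComposition.instructions = @[instruction];

// The video composition rides along with the player item (or export
// session); the AVMutableComposition itself is never modified by it.
AVPlayerItem *item = [AVPlayerItem playerItemWithAsset:composition];
item.videoComposition = videoComposition;
```

Note that nothing in this sketch touches the composition’s tracks; the video composition only tells the renderer what to do with them.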

Easy peasy, right? Heh, right. Maybe there’s an analogy here to the video composition (the thing built of instructions all the way down) modifying how a player or exporter renders the composition, in the same way a CSS stylesheet overlays styling atop the contents of a DOM. It doesn’t change the DOM; it tells a browser how to render its contents. Still, good luck to Bob McCune when he gets to explain all this in his AV Foundation book, which is destined to be the go-to guide on this stuff once it’s done.
