AV Foundation and the void

Yesterday I streamed some WWDC sessions while driving to meet with a client. At a stop, I posted this pissy little tweet:

It got enough quizzical replies (and a couple of favorites) that I figured I should elaborate as best I can, while staying away from all things NDA.

Part of what I’m reacting to comes from a habit of mine of deliberately seeking the unseen, which I picked up either from Musashi’s Book of Five Rings, or Bastiat’s essay Ce qu’on voit et ce qu’on ne voit pas (“What is Seen and What is Unseen”), because of course with me it’s going to either be samurai or economics, right? Anyways, the idea is to seek truth not in what you encounter, but what is obvious by its absence. It’s something I try to do when editing: don’t focus only on what’s there in the document, also figure out if anything should be there, and isn’t.

And when I look at AV Foundation on iOS and especially on OS X, I feel like there are a lot of things missing.

Rip, Divvy, Watch

Let me explain this by means of a use-case. As y’all know, I watch a lot of anime. The new stuff I stream, often via Crunchyroll, but old and favorite stuff, I collect on DVD. Still, since I mostly watch on iPad, literally the first thing I do with a new DVD from Right Stuf is to rip it with Handbrake.

Some shows divvy up their episodes one-per-title on the DVD, so once I rip each of those titles, they’re ready to import into iTunes and tag. But other shows put all the episodes in one title, which means I need to do a trivial edit to split them out into individual episode files. For that, I can just use QuickTime Player 7 Pro.

Cutting up Madoka Magica DVD 3 into episodes with QuickTime Player 7 Pro

The workflow is monotonous, but trivial. Open the source movie, then:

  • Set the in- and out-points for an episode
  • Copy
  • Create a new (empty) movie document
  • Paste into the new movie
  • Export to .mp4, with audio and video tracks set to “pass through” (since I already ripped with iPad-compatible codecs)

(In the above screenshot, I’ve set in- and out-points for the last episode. The position of the playhead is irrelevant, but it makes a better screenshot than a black frame.)

So here’s a screenshot of the same frame in the AV Foundation-based QuickTime Player X:

Trying to edit rip of Madoka Magica in QuickTime Player X, only to discover it has virtually no meaningful editing abilities.

See any major differences between those screenshots? A lot of cosmetic differences, obviously, but the important one is the handles. They don’t exist in QuickTime Player X.

In fact, not one step of my workflow exists in QuickTime Player X:

  • There are no selection handles, except when trimming
  • “Copy” only copies a single video frame (and no audio) from the current playhead position
  • The “New (empty) movie” menu item doesn’t exist; the only “new movie” functionality is capture-related
  • “Paste” exists as a menu item, but I’m not sure when/if it is ever enabled; certainly not in this workflow
  • “Export” offers only a tiny number of canned presets, none of which are “pass-through” for media already encoded with the desired codecs and settings

This isn’t to say that QuickTime Player X is a deliberately crippled application. It is, instead, a direct reflection of AV Foundation and its values.

Lost in the Move

As I compose my thoughts for this post, I don’t mean for it to turn into a bitch session, or a nostalgia trip for QuickTime. Instead, AV Foundation has gaps in its functionality, relative to its direct QuickTime antecedent, so enormous and so obvious that I feel like I must be missing some crucial insight into AVF.

But let’s start with the QuickTime Movie, the struct at the heart of the two-decade-old framework. It’s a fiendishly clever thing, in that it organizes the relationships (spatial, temporal, volume, etc.) between various tracks of media, and does so in a very loosely coupled way. A Movie in memory is largely a collection of references to media in different places: this part of this sound file, that part of that video, etc. It doesn’t need to copy samples from their source locations, and generally puts off doing so until you absolutely insist. This is why you could always save your QuickTime stuff as a “reference movie”, a collection of pointers to other files that would often be only a few KB, which is great when you’re editing. Or, you can pull all the referenced media into one movie file, a process called “flattening”. Note, however, that your flattened tracks could use multiple codecs and bitrates… potentially useful in editing, but bad for end-user distribution, which is why you’d ultimately want to export, which re-encodes each track into a single codec.

Oh, did I mention that the in-memory representation of a Movie could be saved to disk and would in fact be a working reference movie? Yeah, QuickTime is way clever.

So with QuickTime, there’s this idea that movies are made of references to other files, and you can save them in this state. Keep that in mind.
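The reference-movie idea is easy to sketch in a few lines. Here’s a platform-neutral toy model in Python, not QuickTime’s actual structures (every name below is mine, not Apple’s): a track is just a list of pointers into other files, and “flattening” is the step that would actually go copy the referenced samples.

```python
from dataclasses import dataclass

# Toy model of a reference movie: a track holds references into other
# files, not copied samples. (These types are illustrative only.)

@dataclass
class SegmentRef:
    source_path: str   # file the samples actually live in
    start: float       # offset into the source, in seconds
    duration: float    # length of the referenced span, in seconds

@dataclass
class Track:
    kind: str          # "video", "sound", ...
    segments: list

def media_to_flatten(track):
    """Flattening would copy every referenced span into one file;
    here we just total up how much media that is."""
    return sum(seg.duration for seg in track.segments)

video = Track("video", [
    SegmentRef("ep1.mov", 0.0, 90.0),
    SegmentRef("ep2.mov", 30.0, 60.0),
])
print(media_to_flatten(video))  # 150.0
```

The point of the model: the Track itself is tiny no matter how much media it references, which is why a saved reference movie could be a few KB while the flattened version is gigabytes.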

QuickTime is a C API that didn’t get converted to 64-bit (too many dependencies on legacies like QuickDraw), so an Objective-C wrapper was lashed together as QTKit, which I suppose we can note was preceded by an earlier OO wrapper, QuickTime for Java, the topic of my first book. Most of the functionality of the Movie lives on in the QTMovie class, and you could always call its quickTimeMovie method to get the C structure if you needed something in the C API.

In AV Foundation, the QTMovie‘s closest replacement is a hierarchy of classes that starts at AVAsset, representing time-based media organized as tracks. For editing, we are interested in the subclasses AVComposition and AVMutableComposition. The OO design makes for cleaner code: if you’re just playing media, you work at the AVAsset level, and if you’re editing, you build up an AVMutableComposition.

So let’s say we’re happily editing away, and our user wants to save their work. Cool, we’ll just save that AVMutableComposition to disk, and just like with QuickTime’s -[QTMovie updateMovieFile], we’ll quickly and cheaply write that map of media references to disk. So where’s the equivalent method on AVAsset?

Oh, there doesn’t seem to be any sort of save method in the AVMutableComposition documentation, or its superclasses. Weird.

OK, maybe we use some more modern Cocoa technique, like NSKeyedArchiver. No, that won’t work, because AVAsset and its subclasses don’t implement NSCoding.

This surprised me, so I asked a colleague who talks about AV Foundation at conferences if I was mistaken, and he agreed that there doesn’t seem to be a way to save an AVMutableComposition. Which in turn means that if you’re writing a video editor, the composition object is only useful to you during the life of the app, and you need some other means of saving the set of edits you’ve made, and re-creating a new composition from this data on a future launch.

Reference movie versus flattening? Apparently, the choice in AVF is neither (or DIY).
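So the practical answer is to keep your own edit-decision list as the real “document”, persist that, and replay it into a fresh AVMutableComposition on the next launch. A minimal sketch of that DIY persistence, again in platform-neutral Python (the file layout and field names are my own invention, not anything AVF defines):

```python
import json

# Hypothetical edit-decision list: the thing we'd persist in place of
# the unsaveable AVMutableComposition, then replay into a fresh
# composition on the next launch.

edits = [
    {"source": "rip.m4v", "start": 0.0,    "duration": 1445.5},
    {"source": "rip.m4v", "start": 1445.5, "duration": 1444.0},
]

def save_edits(edits, path):
    with open(path, "w") as f:
        json.dump(edits, f)

def load_edits(path):
    with open(path) as f:
        return json.load(f)

save_edits(edits, "project.json")
print(load_edits("project.json") == edits)  # True
```

It works, but notice what happened: the framework’s in-memory model and your on-disk document are now two different things that you have to keep in sync by hand, which is exactly the bookkeeping the QuickTime Movie did for free.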

More Missing Menu Magic

Assuming we have (or don’t need) a means of saving and re-creating our compositions, there are a few surprisingly big holes that affect our use of a composition at runtime. The QuickTime C API (but not QTKit, apparently?) includes functions to cut, copy, and paste at the movie level, which goes across all tracks and/or creates tracks as needed. It also has APIs to insert segments from other sources, either at the movie or track level.

AV Foundation has an equivalent for segment-level editing — which indeed is the most powerful and important editing capability — but no access to cut/copy/paste. If you build your own selection UI, it’s presumably possible to do this yourself, though I haven’t thought it all the way through (assuming copy is synchronous and probably called on the main thread, could fetching the selected media and exposing it as an NSData with a UTI block on some decoding or disk I/O?)

Another feature your users will expect is undo and redo. In QuickTime, NewMovieEditState() will return an object that represents the edited movie at this time. You can put that on an undo stack, and then just pop back to it with UseMovieEditState(). Easy peasy.

I don’t see any explicit support for undo and redo in AV Foundation. The one thing I would be tempted to try is that since AVMutableComposition implements NSCopying (mutable copying, in fact), it should be possible to make a deep copy of your composition. So maybe your editing app has a larger “document” class that has the currentComposition as a property, and that your destructive edits send [currentComposition copy] over to the NSUndoManager, while a redo receives an old composition and sets that as the document’s currentComposition property. Maybe this would work, but I’d want to watch it in Instruments to see just how much memory each of these copies consumes — if it’s just media pointers, it would be nice and cheap, but if it copies samples, I’m screwed.
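The pattern I’m imagining would look something like this, sketched platform-neutrally in Python with a dict standing in for the composition (whether the real copy is cheap is exactly the question Instruments would have to answer):

```python
import copy

# Sketch of undo-by-copying: before each destructive edit, push a deep
# copy of the composition's state; undo pops it back. The dict stands
# in for an AVMutableComposition; nothing here is AVF API.

class Document:
    def __init__(self):
        self.composition = {"tracks": []}
        self.undo_stack = []

    def mutate(self, edit):
        # Snapshot current state, then apply the destructive edit.
        self.undo_stack.append(copy.deepcopy(self.composition))
        edit(self.composition)

    def undo(self):
        if self.undo_stack:
            self.composition = self.undo_stack.pop()

doc = Document()
doc.mutate(lambda c: c["tracks"].append("video"))
doc.mutate(lambda c: c["tracks"].append("sound"))
doc.undo()
print(doc.composition)  # {'tracks': ['video']}
```

(A real version would keep a redo stack too, pushing the popped state back the other way; I’ve left it out to keep the sketch short.)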

At this point, I’m starting to realize that many of the essential menu items of QuickTime Player 7 Pro — New, Save, Save As, Cut, Copy, Paste, Undo, and Redo — would be a real pain in the butt to implement in AV Foundation. No wonder so many of them are gone in QuickTime Player X.

Codec Clamp-Down

Another distinguishing trait of QuickTime is its ability to add codecs, both from Apple and third parties. Arming yourself with Flip4Mac to handle Windows Media, and Perian for all the weird crap supported by FFmpeg, gives your Mac a fighting chance of handling almost any weird format you happen to come across, playing it in QuickTime Player or iTunes, or transcoding it to something your iPad can handle.

QuickTime is meant to be extended in this way, AV Foundation apparently isn’t. So you’re all set if you stick to the playground of Apple’s preferred codecs — ProRes for editing, H.264 for end-user distribution, etc. — but you’re kind of out of luck when you get an .mkv that uses some esoteric codec. To date, lots of us have depended on QuickTime extensions to transcode to something sane, but someday that won’t be possible anymore. If AV Foundation remains closed to third-party codecs, then what? In the past when I’ve found something that even QuickTime + Perian can’t handle, I’ve turned to VLC, which always ends in tears because VLC sucks at transcoding. Really sucks. Fourth season of Battlestar Galactica suck. At least they admit as much nowadays:

VLC alert sheet acknowledging that it sucks at transcoding

Then again, I keep forgetting that Handbrake is a general purpose transcoder, and not just a DVD ripper. At some point in the future, when I get a video in some insane format used in some far-flung corner of the world, I guess it’s Handbrake… or Windows.

Sleeves Up

So here’s the thing: I’m actually starting an AV Foundation project next week, with a dive into Core Video for good measure (something I’ve needed and wanted to get deeper into for a long time). So that’s why I’ve got AVF on the mind again.

But I also have a list on my whiteboard of fun projects I’m saving up to do as livestream code-alongs someday, and one is an AV Foundation-based episode-splitter that would replace my cut-copy-paste routine from way above. Because really, it would be pretty simple to write an app that just lets me razor-slice the big file at each episode break, and then mass export them into separate files using some programmatic file-numbering system. Instead of select / copy / new / paste / export for each episode, I’d basically find four cut points and click “go”. Way easier to use, and utterly straightforward in AV Foundation (and, luckily, AVAssetExportPresetPassthrough exists as an option for the export, so I wouldn’t suffer a re-encode).
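The core of that splitter is just arithmetic: turn the cut points into (start, duration) ranges with numbered output names, then hand each range to a pass-through export. A Python sketch of that bookkeeping (the naming scheme and function are just examples I made up; times are in seconds):

```python
# Turn N cut points into N+1 (start, duration) ranges plus numbered
# output filenames. In the real app, each range would feed an export
# configured for pass-through so nothing gets re-encoded.

def split_ranges(total_duration, cut_points, stem="episode"):
    bounds = [0.0] + sorted(cut_points) + [total_duration]
    return [
        (f"{stem}-{i + 1:02d}.m4v", start, end - start)
        for i, (start, end) in enumerate(zip(bounds, bounds[1:]))
    ]

# Four cut points in a five-episode rip (two-hour disc, 24-minute episodes):
for name, start, dur in split_ranges(7200.0, [1440.0, 2880.0, 4320.0, 5760.0]):
    print(name, start, dur)
```

Four cut points in, five numbered episode files out: that’s the whole “find four cut points and click go” workflow.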

Still, this leaves me thinking of a final critical difference between QuickTime and AV Foundation. With AVF, I solve this problem as a developer. With QuickTime, there was a lot you could achieve just with authoring. I’ve often advised people to look first for an authoring solution before resorting to code. For example, to burn an image on top of video, many developers would look for ways to hack into the render pipeline and draw a graphic at just the right time, but the elegant QuickTime solution is to just add the overlay as another video track, with just one sample frame stretched out over the desired duration, at a higher z-order. QuickTime’s SMIL support also allowed for movies to be created from XML files describing tracks and their media references, so this could totally be done with text editors or script files. Throw in the AppleScript-ability of QuickTime Player 7 Pro, and you could then export these files to “burn in” the graphic permanently. And similarly, if my episode breaks were always in the same place, the AVF-based episode-splitter-and-exporter I described in the previous paragraph could probably be written as an AppleScript, no Xcode needed.

The AV Foundation world is a different world than the one that gave us QuickTime. I keep hoping it will prove to be a better world in the long run, but I don’t think we’re there yet.

Comments

  1. Thank you very much for this post! Since the mid-’80s we have preferred Mac OS because of the incredible features of QuickTime.
    Since Mac OS X 10.7 we have been working on the “transfer” from QuickTime to AVFoundation. And we would like to agree – this is major bad news.

    Apple has now stated that they expect people to convert all media to be compatible with AVFoundation (i.e., MPEG-4, ProRes, H.264…). On iOS this was understandable and maybe helpful, but for “normal” computers this is… well… ignorant? There are and will be many users who create other video formats – and in the past it was always a good pitch to say: having a Mac means being compatible with everything, thanks to QuickTime extensions like Flip4Mac and Perian! That is the past now.

    AVFoundation is very different from QuickTime: for example, it is impossible to save a modified video without decoding and re-encoding it. And only “modern” encodings will be supported, and no “format” extensions are planned for the future.

    Because an AVMutableCompositionTrack cannot be inserted into an AVMutableComposition, many features will not be possible without exporting/saving (decoding and encoding again) the video and getting a new AVAsset and AVAssetTrack. And none of these classes do or will support clipboard features (cut, copy, paste).

    With Mac OS X 10.9, Apple has given us a preview of their interests: there is a new class called AVPlayerView, and its only new editing feature is… trimming videos! This is a mass-market strategy: just think of people recording a short video, trimming it a little bit, and sending it to a provider (like YouTube, Vimeo, or Flickr).

    That’s it. Forget the good old times. Forget all your old videos (if you preferred QuickTime). Forget QuickTime VR (panorama videos). And maybe: move to cheaper operating systems and hardware, because one major reason to spend the money is gone. Welcome, Linux, VLC, and other open source projects!

    Anyway, and besides all the frustrations: we will keep working on a simple 64-bit media (editing) application which will support simple features (like deleting frames in the middle of a video, instead of just trimming it)!

