Archives for: cocoa

More things in Heaven and Earth

There are more things in Heaven and Earth, Horatio, than are dreamt of in your philosophy.
Hamlet, Act 1, Scene 5

There’s suddenly a lot of conventional wisdom that says the rise and eventual dominance of Android is manifest and inevitable. Some of these claims make dubious analogies to Windows’ defeat of the Mac in the 90’s, ramming square pegs through round holes to make the analogy stick (to wit: who are the hardware manufacturers this time, the handset makers or the carriers?). It may indeed come to pass, but the reasoning behind these claims is pretty shallow thus far.

Case in point: an Appcelerator survey covered in The Apple Blog story Devs Say Android is Future-Proof. iOS? Not So Much. The reasoning for Android’s perceived advantage? This article doesn’t mention Android’s license terms and widespread hardware adoption (maybe that’s taken for granted at this point?), and instead mentions only the appeal of writing apps for GoogleTV, a product that is not even out yet (meaning Adamson’s First Law applies), to say nothing of how many purported “interactive television revolutions” we’ve suffered through over the decades (Qube, videotex, WebTV, Tru2Way, etc.). Maybe it’ll be the next big thing, but history argues otherwise.

In the 90’s, the rise of Java seemed an obvious bet. Applets would make web pages far more compelling than static pages and lengthy form submits, and application developers would surely be better off with garbage collection and strong typing than with C and C++. Java was so sure to be big that Microsoft threw the full force of its dirty tricks machine at it, while Apple exposed most of the Mac’s unique libraries to Java bindings (including, at various times, QuickTime, Cocoa, Core Audio, speech, and more). But it didn’t work out that way: Java on the browser was displaced by JavaScript/Ajax, and the early attempts to write major desktop applications in Java were unmitigated disasters, with the Netscape Navigator port abandoned and Corel’s Java version of WordPerfect Office buried almost immediately after its release. 1996’s sure bet was a has-been (or a never-was) by 2001.

If you think about it, the same thing happened a few years ago with AIR. With the YouTube-powered rise of Flash, AIR seemed a perfect vehicle to bring hordes of Flash developers to the desktop. Everyone knew it would be big. Except it wasn’t. AIR applications are rare today, perhaps rarer even than Java. Admittedly, I only remembered AIR’s existence because I needed to download the AIR-powered Balsamiq application for a client this week… the exception that proves the rule, I guess?

My point in all this is that the conventional wisdom about platform success has a tendency to be selective in considering what factors will make or break a platform. Licensing, corporate support, community, and of course the underlying technology all play a part. Android is greatly enhanced by the fact that Google puts talented people behind it and then gives it away, but if carriers then use it to promote their own applications and crapware over third-party apps (or cripple them, as they did with JavaME), then Android’s advantage is nil. On the other hand, Apple’s iOS may have remarkable technology, but if their model requires using their corporate strength to force carriers to be dumb pipes, then they may only be able to get iPhone on weaker carriers, which will turn off consumers and retard growth of the platform.

Ultimately, it’s hard to say how this will all play out, but assuming an Android victory based on the presumed success of currently non-existent tablets and set top boxes is surely an act of faith… which probably accounts for all the evangelism.

So why am I on iOS now? Is it because I have some reason to think that it will “win”? Not at all. Mostly it’s because I like the technology. In the mid 2000’s, when user-facing Java was in terminal decline, I tried to learn Flash and Flex to give myself more options, but I just couldn’t bring myself to like it. It just didn’t click for me. But as I got into Cocoa and then the iPhone SDK, I found I liked the design patterns, and the thoughtfulness of all of it. The elegance and power appealed to me. Being a media guy, I also appreciate the platform’s extraordinary support for audio and video: iOS 4 has three major media APIs (AV Foundation, Core Audio, and Media Player), along with other points of interest throughout the stack (video out in UIKit, the low-level abstractions of Core Media, spatialized sound in OpenAL, high-performance DSP functions in the Accelerate framework, etc.). Android’s media package is quite limited by comparison, offering some canned functionality for media playback and a few other curious features (face recognition and dial tone generation, for example), but no way to go deeper. When so many media apps for Android are actually server-dependent, like speech-to-text apps that upload audio files for conversion, it says to me there’s not much of a there there, at least for the things I find interesting.

Even when I switched from journalism and failed screenwriting to programming and book-writing in the late 90’s, at the peak of the Microsoft era, I never considered for a second the option of learning Windows programming and adopting that platform. I just didn’t like their stuff, and still don’t. The point being that I, and you, don’t have to chase the market leader all the time. Go with what you like, where you’ll be the most productive and do the most interesting work.

There’s a bit in William Goldman’s Adventures in the Screen Trade (just looked in my copy, but couldn’t find the exact quote), where the famous screenwriter excuses himself from a story meeting, quitting the project by saying “Look, I am too old, and too rich, to have to put up with this shit.” I like the spirit of that. Personally, I may not be rich, but I’m certainly past the point where I’m willing to put up with someone else’s trite wisdom, or the voice of the developer mob, telling me where I should focus my skills and talents.

iPhone book touch-up

I just got word from our editor that iPhone SDK Development is “flying off shelves” and they need to rush to reprint. That’s a problem we’d like to have, of course!

Anyways, I’m taking a day off Next Exit to tend to small errata that can be fixed without major disruption… nothing that would take copy-editing or serious re-layout. There aren’t that many errata, and a few of them are my own (40984, for example), so it’s nice to have a chance to do a quick fix-up. There’s a little dust here and there where a Leopard screenshot already looks dated, but it’s nothing that should alarm anyone too badly. When Apple announces the inevitable iPhone SDK 4.0, then we’ll start sweating.

I haven’t had a lot of people asking about a Kindle version, so maybe that means the message has gotten out: if you buy the eBook and paper bundle directly from the Prags’ website, you get access to a Kindle-compatible mobi version, along with PDF and an epub for use with the lovely Stanza e-book reader for iPhone. Daniel also informed me that if you’ve bought the hard copy elsewhere, you can still upgrade to the e-bundle if you want an electronic copy, by registering it on your Prags bookshelf.

I do feel like my coding style has changed a little between writing a bigger app (Next Exit is about 7,000 LOC), and wonder how that would translate to a new edition or another book. My guess is that you’d see a lot more #defines for starters. I’ve also adopted the practice of moving the dealloc method to the top of the file — right after whichever form of init... is used — to make memory management more prominent and remind me to release those instance variables.

Revealed: “Next Exit”

Two months ago, in Bringing Your Own Maps, I went off on an atypical excursion into the realms of location-based applications and how an iPhone developer would need to license data from providers to develop apps that provide turn-by-turn directions or other routing and location-based search functionality.

That was your hint that I was up to something.

Today, spurred by the AppsFire App Star awards and its requirement of a public YouTube demo, I’m revealing the project I’ve been working on for the past two months: Next Exit.

As I’ve been blurbing it:

Next Exit is the safe, sane way to find gas, food, and lodging along US highways, with a no-fuss, one-thumb interface

Allow me to explain further:

Why Do I Need a Special-Purpose Map App?

To pin down why I wrote this app, I’ll go back to a summer trip to California with the family. We were driving back from Disneyland to San Diego and needed to get something for the insanely picky kids to eat. As it turns out, California doesn’t have those blue “services at the next exit” signs that are common elsewhere in the country (well, in every state between Michigan and Florida, at least). I-5 also had no billboards in this stretch. So, short of actually managing to see the elusive Taco Bell itself before passing the exit, there was no practical way to figure out where to get off.

So, yeah, I did the obvious thing and searched the Maps application for “Taco Bell”. While driving. Not smart.


This sucks for a couple of reasons… the most obvious being the driver distraction and the vastly increased likelihood you’ll crash into someone or something while fussing with the phone. But even if you do manage to send off a search, the results are sub-optimal: it searches where you are, not where you’re going, meaning you’re just as likely to get results five miles off your current route, or even behind you, as results you can actually use.

Most people on long freeway drives want services that are right there on the highway. This means a search needs to be a lot smarter:

  • Figure out what road the user is on and what way he or she is going
  • Figure out where that road goes
  • Find exits along that road
  • Find services of certain types within a certain distance of those exits

Next Exit is the app that provides that kind of search.

Let’s Watch the Video

At this point, let me point you to the video demo that I prepared for the App Star contest. They wanted something around 30 seconds, which I submitted as the short version, but this longer version is still under a minute and shows off more stuff:


Gee, That Doesn’t Look So Hard

This is an app that looks simple but is actually quite complex underneath. If you read the original “Bringing Your Own Maps” post, you’ll recall that the iPhone OS’ “Map Kit” provides visuals for maps, but doesn’t actually have any location data behind it. A diagonal line marked “Market St.” is just that – a bunch of pixels, and nothing more. To have any concept of streets, you need to go to third-party map services. On the iPhone, this is compounded by the fact that your code needs to be in C/Obj-C, while most of the mapping APIs are written for the popular server-side scripting languages, or JavaScript (which speaks to the larger point that the mapping companies see their developer audience as web developers, not embedded or desktop developers). The only thing that’s really practical on the iPhone is a web service or other network-oriented API that can be called from Cocoa Touch’s networking classes. The downside: lots of XML parsing on the receiving end.

Then there’s an even deeper question of how you even solve this problem. The mapping APIs are largely written from the point of view of “given a starting point and a destination, find a route.” But the question posed by this app is “given a starting point and a direction, find potential destinations.” My initial version searched ahead for exits, drawing lines between the furthest ones to account for turns in the road, but could get thrown off when the current highway meets another, as the other highway’s exits onto the current highway ended up in the search results and weren’t practical to remove. In the end, I developed a complex but more reliable system of finding road segments for the current freeway, arranging them to create a path, and then searching this path for exits. I’ll be writing more about this geo-logic in future updates.
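The find-and-arrange-segments idea can be sketched in plain C. This is a hypothetical simplification for illustration only (the Segment struct, the integer node ids, and arrangeSegments are all invented here; the real app works against MapQuest’s geographic data, not toy node numbers):

```c
#include <string.h>

/* Hypothetical sketch: a road "segment" with two endpoint node ids.
   Real data would carry coordinates; integers stand in for them here. */
typedef struct {
    int startNode;
    int endNode;
} Segment;

/* Arrange up to 64 unordered segments into a connected path beginning at
   startNode: repeatedly pick the unused segment whose start matches the
   current end of the path. Segments that never connect to the chain --
   like another highway's exits onto this one -- simply get left out.
   Returns the number of segments placed into ordered[]. */
static int arrangeSegments(const Segment *segs, int count,
                           int startNode, Segment *ordered) {
    int used[64];
    memset(used, 0, sizeof(used));
    int placed = 0, current = startNode;
    for (int pass = 0; pass < count; pass++) {
        int found = 0;
        for (int i = 0; i < count; i++) {
            if (!used[i] && segs[i].startNode == current) {
                ordered[placed++] = segs[i];
                current = segs[i].endNode;
                used[i] = 1;
                found = 1;
                break;
            }
        }
        if (!found)
            break; /* dead end: the path is as long as it gets */
    }
    return placed;
}
```

Given the out-of-order segments {3,4}, {1,2}, {2,3} and a start node of 1, this yields the ordered path 1→2, 2→3, 3→4, and anything that doesn’t chain onto the current road never makes it into the path that gets searched for exits.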

Oh, and remember: I don’t have the luxury of doing these searches on a local database. This is all back-and-forth with MapQuest’s web server, swapping and parsing XML.

MapQuest? Really?

The location data is provided by MapQuest, under a commercial license. Why them? It helps that they replied to my initial request for licensing terms (unlike some of the other companies in this space). But perhaps more importantly, MapQuest’s entire API is accessible via their web-based XML protocol, which is slowly being supplanted by a set of equally powerful web services. By comparison, a lot of the good stuff in Google Maps is only practical with JavaScript — they have an HTTP API, but it has big holes — which perhaps suits their web-centric view of the world, but doesn’t help the embedded developer.

Also, MapQuest seems eager to work with developers and to understand the iPhone App Store market. They brought an engineer to a sales call with me, and he helped me figure out the find-and-arrange-segments logic that cured the early prototypes’ tendency to turn off onto unrelated interstates. So far, I’m really liking working with them.

So When Does It Go On Sale?

I have one more major task to account for: in-app purchase. Since the use of the MapQuest service will create an ongoing cost for however long copies of the app remain in use, a free or one-time payment model is not going to work. A subscription model is more appropriate: pay as you go, stop paying if you stop using it.

The elaborate, turn-by-turn, singing-and-dancing apps like TomTom and Navigon are going for $100. Next Exit does a lot less – by design – and therefore should cost less. So I’m keeping it impulse-worthy:

  • $1.99 for the app and three months of service
  • $4.99 for each 12 months of service thereafter

Folks, that’s less than the cost of an upsized value meal… and with Next Exit, you’ll be able to find the road-side restaurants with the value meals you like, not the ones you’ll just settle for!

Anyways, with hopes of finishing in-app purchase (and an audit-trail server on my end… groan), finalizing things with MapQuest, and getting things through the App Store review process in the next few weeks, I’ve got a fighting chance of getting this out before people hit the roads for the holiday travel season. Or, if it comes out after Christmas, there’ll just be that many more new iPhones in play, looking for useful apps.

More, much more, to follow. For now, fingers are crossed that the App Star jury will like what they see. The prize in their contest is free publicity… exactly what a new app needs!

Royalty pains

Author Peter Cooper posted a blog about his experience publishing Beginning Ruby for Apress, a post that got extraordinary traffic after being featured on Slashdot with a misleading summary. Much of the piece concerns itself with royalties, how they’re calculated, and how they’re paid. In his conclusion, he advocates writing for the Pragmatic Programmers, which offers a 50%-of-profits royalty rate.

Tim O’Reilly himself saw fit to counter that advice, claiming that the Prags’ royalty isn’t what it seems, and to imply that a bigger publisher would move more books, meaning that a smaller royalty on bigger sales would cancel out the Prags’ advantage.

About a week later, the Prags’ Dave Thomas posted a blog of his own, spelling out the specifics of the Prags’ royalties.

I commented on Dave’s blog:

Having written one book with the Prags and two elsewhere, my mental taxonomy is now “writing for Prags” versus “writing for free”. Yes, between 10% royalty * coauthors * Amazon discounts * shrinking computer book market, it really is that bad.

But I have more to say, which is why I’m blogging now.

In my experience, and my understanding of the current nature of the computer book market, O’Reilly’s claim that its size gives it an advantage in moving more books is probably true to a limited degree, but not enough to make up what is effectively a four-to-five-fold difference in royalty rates. With computer book sections shrinking year after year in brick-and-mortar bookstores, a huge majority of computer books are purchased online from Amazon and its lesser rivals (or just stolen, but that’s another story), which obviates the advantages a bigger publisher like O’Reilly would have in getting its product into more stores.

I wrote one book for O’Reilly and co-wrote another, both released in 2005. QuickTime for Java: A Developer’s Notebook is clearly a niche title (and the topic API is now deprecated), and if Swing Hacks is somewhat more noticeable, it’s only because the Java development community is so large. I suspect only one in every 25 to 50 Java developers does Desktop Java, but over 5 million developers, that’s still enough to be interesting. Over their respective lifetimes, the QTJ book sold about 2,000 copies (bad), and Swing Hacks sold 10,000 (good).

The iPhone SDK Development book that I co-wrote for the Prags nearly outsold both of them, combined, just in pre-release beta sales. Granted, the topic is clearly more in demand, but still, enough people found their way to the Prags’ site to buy the beta that, before a single final copy had hit the shelves, I had already outearned four years of royalties on the two O’Reilly books several times over. About the same number of copies, but much higher royalties. It’s that simple.

Aside: one factor that complicates this apples-to-oranges comparison: since QTJ:ADN never outearned its advance, the royalties I earned on SH beyond its advance were applied to my royalty debt on QTJ until that was paid up. I’m not sure if this is standard practice.

One other factor that I think is even more interesting is that the Prags presumably can offer a higher royalty because their overhead is very low. O’Reilly has a lovely campus in business-unfriendly California; I’m not sure the highly-distributed Prags even have a formal office. The low overhead is presumably what allows the Prags to offer such a high rate, but moreover, they’re able to take a chance on niche-ier topics. O’Reilly’s upcoming Cocoa and Objective-C: Up and Running is, by my count, their fourth introductory Cocoa programming book (following the awful ADC-authored Learning Cocoa, James Duncan Davidson’s rewrite of it, and Mike Beam’s Cocoa in a Nutshell). Apparently the broad Mac programming market is big enough to be interesting to them, but not any smaller part of it. If you want a book on Core Animation or Xcode, you pretty much have to look to other publishers (notable exception: the Bonjour/Zeroconf book).

To me, that’s a bigger deal, because your royalties are zero if a publisher won’t even put your title out there. Right now, the iPhone market has ample introductory titles (and Mac is almost there, once titles are updated for Snow Leopard or Daniel Steinberg finishes his overhauled-for-SL Cocoa book), and the next step is to get deeper into topics that are too large or too difficult to cover in introductory books. But almost by definition, these titles carve up the market for the introductory book, and only publishers who can make money off niches can produce such titles. Pretty safe to say that if we see a book on, say, Core Audio, or advanced use of the Xcode toolset (IB, Instruments, etc.), it’s going to be from one of these smaller publishers.

@property blah blah

Like everyone else, I’ve tired of Obj-C’s dance of redundancy, specifically having to declare instance variables for properties if you want to run in the simulator (or the “old” Obj-C runtime on Mac). To speed things up, I create the ivars first, then copy-and-paste them outside the @interface block to set up the properties. Which still sucks, because I have to prepend every line with the @property declaration.

9 times out of 10, the property is for a UI outlet, so I’m always setting it up as @property (nonatomic, retain) IBOutlet IvarType *ivarName

To speed things up, I finally figured out how to set up an Xcode text macro for this:

    Identifier =;
    BasedOn = objc;
    IsMenuItem = YES;
    Name = "IBOutlet @property";
    TextString = "@property (nonatomic, retain) IBOutlet ";
    CompletionPrefix = "@property";
    OnlyAtBOL = YES;
    IncludeContexts = ( "xcode.lang.objc.block" );

Then I used the Keyboard system pref to add an Xcode-only keyboard shortcut to the “IBOutlet @property” menu item (right now it’s cmd-option-I… we’ll see if that sticks).

And all this makes setting up my properties suck just a little bit less.

Lazy-ass XML parsing

Late in the development of iPhone SDK Development, I added a section to the networking chapter on web services, probably the top reader request. The focus of the section was on using NSXMLParser to parse a response from a web service received over the network, in this case the Twitter public timeline.

NSXMLParser is an event-driven parser: it calls back to a delegate as it encounters the beginning or end of each element, text, comment, etc. In the final book, we use a very simplistic delegate to pick off just the elements we care about, ignoring the rest. We went with this approach because an earlier beta of the book adopted the “parse the whole tree” approach suggested by Apple’s Introduction to Event-Driven XML Programming Guide for Cocoa, and the feedback from both editor and readers was that it was too hard and too much work for the sample problem.

And it was, despite one truly nifty technique that Apple provides you: define a custom element class, and as you parse, you pass around the parser’s delegate to each element as it’s being filled in. For example, when you encounter a child element, you init a MyElement object, and then make that new element the new delegate. Similarly, when elements end, you return the delegate to the parent element.

So this is nice, but it’s still kind of heavy. At the moment, I’m parsing XML from a MapQuest result (via their XML protocols), and wanted to try something a little lighter. Moreover, I wanted to be able to get at the parsed data with KVC, so I could just provide a key-path of the form root.child.grandchild. As an experiment, I tried parsing everything into a deeply-nested NSDictionary, which easily supports KVC.

After an hour or two, the idea basically works, though I’ll be the first to tell you this is sloppy code (I’m sure I’m leaking some element-name strings, but neither I nor the Clang Static Analyzer has found them), it loses the order of siblings (which I don’t care about), and it doesn’t yet handle multiple child elements with the same name (which would get into the indexed accessor pattern). Also, the character data is kludged into a pseudo-child called value, whereas using a custom element class would allow you to more carefully distinguish an element’s text, child elements, and attributes.

Basic idea is to keep a master dictionary for the parsed doc, parsedResponseDictionary, the current path being parsed, parseElementPath, and a mutable string for the current element’s character data, currentCharacters, which can arrive over the course of multiple callbacks.

Here are the essential delegate methods:

- (void)parserDidStartDocument:(NSXMLParser *)parser {
	NSLog (@"didStartDocument");
	[parsedResponseDictionary release];
	parsedResponseDictionary = [[NSMutableDictionary alloc] init];
	parseElementPath = @"";
}

- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName 
		namespaceURI:(NSString *)namespaceURI 
		qualifiedName:(NSString *)qName
		attributes:(NSDictionary *)attributeDict {
	NSLog (@"didStartElement:%@", elementName);
	NSMutableDictionary *newElement = [[NSMutableDictionary alloc] init];
	NSMutableDictionary *parent;
	if ([parseElementPath length] == 0) {
		NSLog (@"parent is root");
		parent = parsedResponseDictionary;
	} else {
		NSLog (@"need parent %@", parseElementPath);
		parent = [parsedResponseDictionary valueForKeyPath:parseElementPath];
		// note valueForKeyPath: instead of valueForKey:
	}
	[parent setValue:newElement forKey:elementName];
	[newElement release];
	NSString *newParseElementPath = nil;
	if ([parseElementPath length] > 0) {
		newParseElementPath = [[NSString alloc] initWithFormat: @"%@.%@",
			  parseElementPath, elementName];
	} else {
		newParseElementPath = [elementName copy];
	}
	parseElementPath = newParseElementPath;
	NSLog (@"new path is %@", parseElementPath);
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName 
		namespaceURI:(NSString *)namespaceURI 
		qualifiedName:(NSString *)qName {
	NSLog (@"didEndElement:%@", elementName);
	if (currentCharacters) {
		NSMutableDictionary *elementDict =
			[parsedResponseDictionary valueForKeyPath:parseElementPath];
		[elementDict setValue: currentCharacters forKey: @"value"];
		currentCharacters = nil;
	}
	NSRange parentPathRange;
	parentPathRange.location = 0;
	NSRange dotRange = [parseElementPath
		rangeOfString:@"." options:NSBackwardsSearch];
	NSString *parentParseElementPath = nil;
	if (dotRange.location != NSNotFound) {
		parentPathRange.length = dotRange.location;
		parentParseElementPath =
			[parseElementPath substringWithRange:parentPathRange];
	} else {
		parentParseElementPath = @"";
	}
	parseElementPath = parentParseElementPath;
	NSLog (@"new path is %@", parseElementPath);
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
	NSLog (@"foundCharacters");
	if (!currentCharacters) {
		currentCharacters = [[NSMutableString alloc] 
			initWithCapacity:[string length]];
	}
	[currentCharacters appendString:string];
}

Using the sample request from MapQuest’s API docs, the parsed NSDictionary looks like this:

2009-09-29 12:34:40.263 MapQuestThrowaway1[6077:207] parsed dict:
{
    GeocodeResponse =     {
        LocationCollection =         {
            GeoAddress =             {
                AdminArea1 =                 {
                    value = US;
                };
                AdminArea3 =                 {
                    value = PA;
                };
                AdminArea4 =                 {
                    value = Lancaster;
                };
                AdminArea5 =                 {
                    value = Mountville;
                };
                LatLng =                 {
                    Lat =                     {
                        value = "40.044618";
                    };
                    Lng =                     {
                        value = "-76.412124";
                    };
                };
                PostalCode =                 {
                    value = 17554;
                };
                ResultCode =                 {
                    value = B1AAA;
                };
                SourceId =                 {
                    value = ustg;
                };
                Street =                 {
                    value = "[3701-3703] Hempland Road";
                };
            };
        };
    };
}

More importantly for current experimentation purposes, this lets me grab values from the parsed dictionary with KVC-style access:

NSLog (@"key-val test: lat long is %@, %@",
   [parsedResponseDictionary valueForKeyPath:
      @"GeocodeResponse.LocationCollection.GeoAddress.LatLng.Lat.value"],
   [parsedResponseDictionary valueForKeyPath:
      @"GeocodeResponse.LocationCollection.GeoAddress.LatLng.Lng.value"]);

That code produces the desired result:

key-val test: lat long is 40.044618, -76.412124

It’s not pretty, but it’s also not a lot of code, and allows me to get on with getting and processing the result data rather than dancing around with fancy XML parsing for a day or two.

Threads on the Head

Lack of posts lately… heads down on an iPod game. It’s built up of mini-games, about half of which are done. Today, I’m facing the problem of having to create a mini-game that uses some of the metadata in the iPod library that can’t be directly queried. So, I have to go over every song in the library and perform my own analysis.

Obviously, this would be death to do at startup or in the middle of the game. Walking my 700-song library takes 6-7 seconds, and users could have far more songs.

Cut to the win: NSOperation makes it easy to do stuff on threads, without having to, you know, write your own pthread stuff.

As a test, I wrote a subclass of NSOperation to perform a simple analysis on the library: count the number of songs that have “the” in the title. Here’s the -main method:

-(void) main {
   NSDate *beginDate = [NSDate date];
   NSLog (@"*** DYDeepLibraryAwarenessOperation is cogitating and ruminating");
   // test - count titles that have the word "the" in them.
   int theCount = 0;
   MPMediaQuery *allsongs = [MPMediaQuery songsQuery];
   NSLog (@"Thinking about %d songs", [allsongs.items count]);
   for (MPMediaItem *item in allsongs.items) {
      NSRange theRange = [[item valueForProperty:MPMediaItemPropertyTitle]
         rangeOfString: @"the" options: NSCaseInsensitiveSearch];
      if (theRange.location != NSNotFound) {
         theCount++;
      }
   }
   NSLog (@"*** %d songs in the iPod Library contain the word \"the\".", theCount);
   NSLog (@"*** DYDeepLibraryAwarenessOperation has achieved enlightenment (in %f sec).",
         fabs ([beginDate timeIntervalSinceNow]));
}

Then, as the app starts up, the operation is run as part of an NSOperationQueue:

awarenessOperation = [[DYDeepLibraryAwarenessOperation alloc] init];
operationQueue = [[NSOperationQueue alloc] init];
[operationQueue addOperation:awarenessOperation];
NSLog (@"DYDeepLibraryAwareness set up NSOperationQueue");

Here’s the output when the code is just left to run by itself (I’ve taken out the date, classname, and line number from the output for space):

15:30:47.979 DYDeepLibraryAwareness set up NSOperationQueue
15:30:47.976 *** DYDeepLibraryAwarenessOperation is cogitating and ruminating
15:30:48.238 Thinking about 740 songs
15:30:54.586 *** 168 songs in the iPod Library contain the word "the".
15:30:54.589 *** DYDeepLibraryAwarenessOperation has achieved enlightenment (in 6.613482 sec).

Perhaps more importantly, and what I can’t show in a blog, is that this other thread does not interfere with the GUI, or with queries to the iPod library from the main thread, which are done to set up and play the first mini-game. So this means that the iPod library server can handle multiple concurrent requests (yay), and that I can do the heavy lifting to set up later games while presenting and playing the simpler ones.

What’s New, Blue Q?

One-time self-described “World’s Greatest Compressionist” Ben Waggoner posts a pointed question to the quicktime-api list:

What I’d like to know is if QuickTime X is going to be available for Windows and older versions of Mac OS X.

It’s an important issue, because despite iTunes’ insistence on installing QuickTime on Windows, the future of that product seems completely unknown. For years, every question I’ve seen about the future of QuickTime on Windows has been met with absolute silence from Apple. Yeah, I know, “Apple does not comment on unannounced products,” and all… Still, Apple has left this technology in limbo for a remarkably long time. I recall asking ADC reps about QuickTime for Windows back at Leopard Tech Day Atlanta in 2006, as I was considering calling it from Java with JNI, and (as previously noted), I got no reply at all. And every other public question I’ve seen about the future of QuickTime on Windows has gone similarly unanswered, for years.

Smell that? That’s the scent of Abandoned Code Rot. We got that from QuickTime for Java for a few years before they managed to finally deprecate it (though they apparently haven’t gotten the message out).

It wouldn’t be too surprising to see QT for Windows fall by the wayside… Apple probably cares more about the popularity of its favorite formats and codecs (AAC and H.264) than of the QuickTime APIs and QuickTime’s interactive features like Wired Sprites that have been clearly and unequivocally beaten by Flash.

But if that’s true of Windows, is it also true on the Mac? QuickTime developers are right to be a little worried. The old C-based QuickTime API remains a 32-bit only option, intended to be replaced by the Objective-C QTKit. But in the four years since its introduction in Tiger, QTKit has only taken on part of the capabilities of the old QuickTime API. With Leopard, you could finally do capture and some significant editing (e.g., inserting segments at the movie or track levels), but raw sample level data was unavailable for any track type other than video, and some of the more interesting track types (like effects and especially tweens, useful for fading an audio track’s volume between specific times) are effectively useless in QTKit.

With Snow Leopard, the big news isn’t a more capable QTKit API, it’s QuickTime X. And as Apple’s QuickTime X page points out, QTX is all about a highly-optimized playback path (using decode hardware if available) and polished presentation. Great news if you’re playing 1080p movies on your computer or living room PC; not so much if you want to edit them. For editing, you’re back in the old 32-bit QuickTime (and the code is probably still written in C against the old APIs, given QTKit’s numerous limitations). You don’t see a 64-bit Final Cut Pro, now do you? (BTW, here’s a nice blog on that topic.)

When you all install Snow Leopard tomorrow and run the QTX-based QuickTime Player, you’ll immediately understand why the $30 QuickTime Pro (which bought you editing and exporting from the Player app and the plug-in) is gone. Follow up in the comments tomorrow (after the NDA drops) and we’ll discuss further.

If I were starting a major new multimedia project that wasn’t solely playback-based — imagine, say, a podcast studio that would combine the editing, exporting, and publishing tasks that you might currently perform with Garage Band, iTunes, and FTP — I would be very confused as to which technology to adopt. QuickTime’s cross-platform story seems to be finished (QTJ deprecated, QTW rotting away), and everything we hear on the Mac side is about playback. Would it be safer to assume that QuickTime doesn’t have a future as a media creation framework, and drop down to the engine level (Core Audio and Core Video)? And if not QuickTime… then what?

Oh, and as for the first question from the quicktime-api thread:

… How about Apple throwing us a bone as to what QuickTime X will offer those of us that use QT and QTSS?

From what I can tell, Apple has all but ditched QTSS in favor of HTTP Live Streaming, supported by QuickTime X and iPhone 3.0.

Fun with varargs

For reasons you don’t need to know about (yet), I wanted to get my usual crutch of a logging UITextView implemented in plain C.

I hadn’t wanted to mess with varargs, so I usually write an Obj-C method like this:

-(void) screenLog:(NSString*) s {
	textView.text = [NSString stringWithFormat:@"%@%@\n",
		textView.text, s];
}

What this does is create an autoreleased NSString built from a format that’s just two strings concatenated together (the current contents of the text view and the argument string) plus a newline character. It then sets this new string as the new text of the UITextView.

It sucks a little bit to call, because you have to pass in an NSString, not the usual varargs you’d use with NSLog. So to do:

NSLog (@"Current age: %d", 41);

you’d have to build the string up-front, like this:

[self screenLog: [NSString stringWithFormat: @"Current age: %d", 41]];

So, kind of annoying, but still useful when you want to log to the screen instead of standard out, like I’ve had to do this week while doing some Bonjour stuff between multiple devices scattered about the office, at most one of which gets to log to Xcode’s console. Yesterday’s post, with onscreen output of the two devices getting each other’s test message, shows why this is a nice crutch to have for experiments, prototypes, and throwaways.

Anyways, I actually wanted to do this with plain ol’ C, and happened across Matt Gallagher’s great write-up of varargs in Cocoa. Combining that with the realization that NSString has some method signatures that take a va_list, I was able to rewrite my screen logger in plain ol’ C:

void LogToUITextView (UITextView *view, NSString* format, ...) {
	va_list args;
	va_start (args, format);
	NSString* appendedText = [[NSString alloc]
				initWithFormat: format arguments: args];
	va_end (args);
	view.text = [NSString stringWithFormat:
				 @"%@%@\n", view.text, appendedText];
	[appendedText release];
}

Calling it feels a lot more like calling NSLog:

- (void)viewDidLoad {
    [super viewDidLoad];

	// customize point
	LogToUITextView(textView, @"Current age: %d", 41);
	LogToUITextView(textView, @"Current weight: %3.1f", 243.6);
	LogToUITextView(textView, @"Available fonts:\n %@",
				[UIFont familyNames]);
}

And check it out: it actually works.

I’ll probably adapt the varargs approach in my Obj-C logging function going forwards, but still, it’s nice to be able to make the procedural C call, especially since you could switch all NSLog calls to LogToUITextView with a single global replace.

Update: Here’s an even “more C” version that’s functionally equivalent:

void LogToUITextView (UITextView *view, NSString* format, ...) {
	va_list args;
	va_start (args, format);
	CFStringRef appendedText = CFStringCreateWithFormatAndArguments (
		kCFAllocatorDefault, NULL, (CFStringRef) format, args);
	va_end (args);
	CFStringRef newText = CFStringCreateWithFormat (
		kCFAllocatorDefault, NULL, (CFStringRef) @"%@%@\n",
		view.text, appendedText);
	view.text = (NSString*) newText;
	CFRelease (newText);
	CFRelease (appendedText);
}

Obviously wordier, and we lose a convenient autorelease, since CoreFoundation doesn’t have autoreleasing.

An iPhone Core Audio brain dump

Twitter user blackbirdmobile just wondered aloud when the Core Audio stuff I’ve been writing about is going to come out. I have no idea, as the client has been commissioning a lot of work from a lot of iPhone/Mac writers I know, but has a lengthy review/rewrite process.

Right now, I’ve moved on to writing some beginner stuff for my next book, and will be switching from that to iPhone 3.0 material for the first book later today. And my next article is going to be on OpenAL. My next chance for some CA comes whenever I get time to work on some App Store stuff I’ve got planned.

So, while the material is still a little fresh, I’m going to post a stream-of-consciousness brain-dump of stuff that I learned along the way or found important to know in the course of working on this stuff.

  • It’s hard. Jens Alfke put it thusly:

    “Easy” and “CoreAudio” can’t be used in the same sentence. 😛 CoreAudio is very powerful, very complex, and under-documented. Be prepared for a steep learning curve, APIs with millions of tiny little pieces, and puzzling things out from sample code rather than reading high-level documentation.

  • That said, tweets like this one piss me off. Media is intrinsically hard, and the typical way to make it easy is to throw out functionality, until you’re left with a play method and not much else.

  • And if that’s all you want, please go use the HTML5 <video> and <audio> tags (hey, I do).

  • Media is hard because you’re dealing with issues of hardware I/O, real-time, threading, performance, and a pretty dense body of theory, all at the same time. Webapps are trite by comparison.

  • On the iPhone, Core Audio has three levels of opt-in for playback and recording, given your needs, listed here in increasing order of complexity/difficulty:

    1. AVAudioPlayer – File-based playback of DRM-free audio in Apple-supported codecs. Cocoa classes, called with Obj-C. iPhone 3.0 adds AVAudioRecorder (wasn’t sure if this was NDA, but it’s on the WWDC marketing page).
    2. Audio Queues – C-based API for buffered recording and playback of audio. Since you supply the samples, it would work for a net radio player, and for your own formats and/or DRM/encryption schemes (decrypt in memory before handing off to the queue). Inherent latency due to the use of buffers.
    3. Audio Units – Low-level C-based API. Very low latency, as little as 29 milliseconds. Mixing, effects, near-direct access to input and output hardware.
  • Other important Core Audio APIs not directly tied to playback and recording: Audio Session Services (for communicating your app’s audio needs to the system, defining interaction with things like the background iPod player and the ring/silent switch, and getting audio H/W metadata), Audio File Services for reading/writing files, Audio File Stream Services for dealing with audio data in a network stream, Audio Converter Services for converting between PCM and compressed formats, and Extended Audio File Services for combining file and converter services (e.g., given PCM, write out to a compressed AAC file).

  • You don’t get AVAudioPlayer or AVAudioRecorder on the Mac because you don’t need them: you already have QuickTime, and the QTKit API.
  • The Audio Queue Services Programming Guide is sufficient to get you started with Audio Queues, though it is unfortunate that its code excerpts are not pulled together into a complete, runnable Xcode project.

  • Lucky for you, I wrote one for the Streaming Audio chapter of the Prags’ iPhone book. Feel free to download the book’s example code. But do so quickly — the Streaming Audio chapter will probably go away in the 3.0 rewrite, as AVAudioRecorder obviates the need for most people to go down to the Audio Queue level. We may find some way to repurpose this content, but I’m not sure what form that will take. Also, I think there’s still a bug in the download where it can record with impunity, but can only play back once.

  • The Audio Unit Programming Guide is required reading for using Audio Units, though you have to filter out the stuff related to writing your own AUs with the C++ API and testing their Mac GUIs.

  • Get comfortable with pointers, the address-of operator (&), and maybe even malloc.

  • You are going to fill out a lot of AudioStreamBasicDescription structures. It drives some people a little batty.

  • Always clear out your ASBDs, like this:

    memset (&myASBD, 0, sizeof (myASBD));

    This zeros out any fields that you haven’t set, which is important if you send an incomplete ASBD to a queue, audio file, or other object to have it filled in.

  • Use the “canonical” format — 16-bit integer PCM — between your audio units. It works, and is far easier than trying to dick around bit-shifting 8.24 fixed point (the other canonical format).

  • Audio Units achieve most of their functionality through setting properties. To set up a software renderer to provide a unit with samples, you don’t call some sort of setRenderer() method; you set the kAudioUnitProperty_SetRenderCallback property on the unit, providing an AURenderCallbackStruct as the property value.

  • Setting a property on an audio unit requires declaring the “scope” that the property applies to. Input scope is audio coming into the AU, output is going out of the unit, and global is for properties that affect the whole unit. So, if you set the stream format property on an AU’s input scope, you’re describing what you will supply to the AU.

  • Audio Units also have “elements”, which may be more usefully thought of as “buses” (at least if you’ve ever used pro audio equipment, or mixing software that borrows its terminology). Think of a mixer unit: it has multiple (perhaps infinitely many) input buses, and one output bus. A splitter unit does the opposite: it takes one input bus and splits it into multiple output buses.

  • Don’t confuse buses with channels (i.e., mono, stereo, etc.). Your ASBD describes how many channels you’re working with, and you set the input or output ASBD for a given scope-and-bus pair with the stream description property.

  • Make the RemoteIO unit your friend. This is the AU that talks to both input and output hardware. Its use of buses is atypical and potentially confusing. Enjoy the ASCII art:

                             | i                   o |
    -- BUS 1 -- from mic --> | n    REMOTE I/O     u | -- BUS 1 -- to app -->
                             | p      AUDIO        t |
    -- BUS 0 -- from app --> | u       UNIT        p | -- BUS 0 -- to speaker -->
                             | t                   u |
                             |                     t |

    Ergo, the stream properties for this unit are

                      Bus 0                                Bus 1
    Input Scope:      Set ASBD to indicate what you’re     Get ASBD to inspect the audio
                      providing for play-out               format being received from H/W
    Output Scope:     Get ASBD to inspect the audio        Set ASBD to indicate what format
                      format being sent to H/W             you want your units to receive
  • That said, the callback properties for providing samples to or getting them from a unit take global scope, as their purpose is implicit from the property names: kAudioOutputUnitProperty_SetInputCallback and kAudioUnitProperty_SetRenderCallback.

  • Michael Tyson wrote a vital blog on recording with RemoteIO that is required reading if you want to set callbacks directly on RemoteIO.

  • Apple’s aurioTouch example also shows off audio input, but is much harder to read because of its ambition (it shows an oscilloscope-type view of the sampled audio, and optionally performs an FFT to find common frequencies), and because it is written in Objective-C++, mixing C, C++, and Objective-C idioms.

  • Don’t screw around in a render callback. I had correct code that didn’t work because it also had NSLogs, which were sufficiently expensive that I missed the real-time thread’s deadlines. When I commented out the NSLog, the audio started playing. If you don’t know what’s going on, set a breakpoint and use the debugger.

  • Apple has a convention of providing a “user data” or “client” object to callbacks. You set this object when you set up the callback, and its parameter type for the callback function is void*, which you’ll have to cast back to whatever type your user data object is. If you’re using Cocoa, you can just use a Cocoa object: in simple code, I’ll have a view controller set the user data object as self, then cast back to MyViewController* on the first line of the callback. That’s OK for audio queues, but the overhead of Obj-C message dispatch is fairly high, so with Audio Units, I’ve started using plain C structs.

  • Always set up your audio session stuff. For recording, you must use kAudioSessionCategory_PlayAndRecord and call AudioSessionSetActive(true) to get the mic turned on for you. You should probably also look at the properties to see if audio input is even available: it’s always available on the iPhone, never on the first-gen touch, and may or may not be on the second-gen touch.

  • If you are doing anything more sophisticated than connecting a single callback to RemoteIO, you may want to use an AUGraph to manage your unit connections, rather than setting up everything with properties.

  • When creating AUs directly, you set up a AudioComponentDescription and use the audio component manager to get the AUs. With an AUGraph, you hand the description to AUGraphAddNode to get back the pointer to an AUNode. You can get the Audio Unit wrapped by this node with AUGraphNodeInfo if you need to set some properties on it.

  • Get used to providing pointers as parameters and having them filled in by function calls:

    AudioUnit remoteIOUnit;
    setupErr = AUGraphNodeInfo(auGraph, remoteIONode, NULL, &remoteIOUnit);

    Notice how the return value is an error code, not the unit you’re looking for, which instead comes back in the fourth parameter. We send the address of the remoteIOUnit local variable, and the function populates it.

  • Also notice the convention for parameter names in Apple’s functions. inSomething is input to the function, outSomething is output, and ioSomething does both. The latter two take pointers, naturally.

  • In an AUGraph, you connect nodes with a simple one-line call:

    setupErr = AUGraphConnectNodeInput(auGraph, mixerNode, 0, remoteIONode, 0);

    This connects the output of the mixer node’s only bus (0) to the input of RemoteIO’s bus 0, which goes through RemoteIO and out to hardware.

  • AUGraphs make it really easy to work with the mic input: create a RemoteIO node and connect its bus 1 to some other node.

  • RemoteIO does not have a gain or volume property. The mixer unit has volume properties on all input buses and its output bus (0). Therefore, setting the mixer’s output volume property could be a de facto volume control, if it’s the last thing before RemoteIO. And it’s somewhat more appealing than manually multiplying all your samples by a volume factor.

  • The mixer unit adds amplitudes. So if you have two sources that can hit maximum amplitude, and you mix them, you’re definitely going to clip.

  • If you want to do both input and output, note that you can’t have two RemoteIO nodes in a graph. Once you’ve created one, just make multiple connections with it. The same node will be at the front and end of the graph in your mental model or on your diagram, but it’s OK, because the captured audio comes in on bus 1, and at some point, you’ll connect that to a different bus (maybe as you pass through a mixer unit), eventually getting the audio to RemoteIO’s bus 0 input, which will go out to headphones or speakers on bus 0.

I didn’t come up with much (any?) of this myself. It’s all about good references. Here’s what you should add to your bookmarks (or Together, where I throw any Core Audio pages I find useful):