

Book updates: iOS 6 and Friends edition

A couple of important updates on my last two books:


iOS SDK Development

We just pushed an update that brings iOS SDK Development up to iOS 6. The biggest change, of course, was the iPhone 5, which required us to rebuild every damn project in the book with Auto Layout so it works well at both screen sizes, 3.5″ and 4″.

While iOS 6 doesn’t have the kinds of massive changes that led us to rewrite the whole book in the first place — blocks, Xcode 4, ARC — we’ve made sure we have adopted new stuff wherever possible, like the new Obj-C literals for strings and dictionaries, and the Social framework that supplants the Twitter framework and adds support for Facebook and potentially other social networks in the future.

Alas, iOS 6 seems to have a grievous -[UIDocument closeWithCompletionHandler:] bug that makes my unit testing example totally suck: instead of being able to perform the test once the completion handler signals that the document has been saved, we have to just wait 30 seconds to be sure UIDocument has done its thing. This is a regression from iOS 5. Link above goes to the Open Radar copy of the bug I filed with Apple.


Let’s trend #iphonepredictions

Come join the fun on the #iphonepredictions hashtag. Let’s see if we can trend it, or at least have a few laughs.

Having a Moment

Please indulge me a Whiny Little Bitch moment. I will try to keep it short, but it will be petulant, jealous, and immature. You have been warned.

TUAW’s “RoadAhead is a different and clever nav app” talks up the new RoadAhead app, which finds upcoming services at US freeway exits. They’ll benefit from the exposure, and being free will certainly help.

But contrary to some of the reviews, the idea of an exit finder that figures out what road you’re on and what direction you’re going and only shows you upcoming exits isn’t new. That was the whole point of the Road Tip app I did a year and a half ago.

Obviously, I think that’s the way to go, and I’m glad RoadAhead does the same thing. I think it’s lazy when apps just do a radius search for this kind of thing, because that way you’ll find stuff that’s behind you, or is hopelessly far off the freeway. The idea of “finding stuff along a freeway” is a specific concept, and this app is true to it. Plus, figuring out where the road goes, winding back and forth, and dealing with things like name changes and state boundaries is a genuinely interesting problem to solve.

One point of difference is that RoadAhead can’t resist the temptation to pile on lots of icons and pinch-zoomable maps and other bling. As soon as you go down that road, the app becomes a distraction, and like so many others in this field, they explicitly say it’s unsuitable for use while driving:

One last, important thing – USING YOUR PHONE WHILE DRIVING IS A BAD IDEA (and in some states illegal). Hand your phone to your passenger and ask them to navigate. Please use good judgment when using your phone in the car.

I agree that using your phone while driving is a bad idea, but I still designed for it because I think people will do it anyways, regardless of what’s in your app description, EULA, or whatever. Road Tip uses spartan layouts, large text and buttons, and a design philosophy that everything should be accessible via one-thumb navigation. This frustrates my colleagues who say Road Tip would sell if I added various features, like the ability to save your search locations and drill deeper into them later (like remembering what’s at the exit when you pull off, so you can find stuff after you check into your hotel). My point was to keep the app as simple as possible for use while moving; once you’re off the road, you’ll be able to use the deeper functionality of other mapping apps.

So yay, I’m true to what I want my app to be… and I’m the only one who wants it like that.

The other thing is that RoadAhead is free, so I can’t figure out how the app pays for itself. I decided early on that Road Tip couldn’t be ad-supported, because showing an ad to a driver would be unconscionably distracting. So I opted for a pay model.

And that brings up my other point about my exit-finding competitors. I’ve never downloaded them — I don’t want to get in a situation where I could willfully or inadvertently plagiarize them (this is why I also don’t read other people’s books on topics I’m covering) — but I’ve looked through their descriptions and legal terms, and I’ve never been able to figure out where apps like RoadAhead and iExit get their location data. That I use MapQuest is obvious; I’m contractually obligated to have my users accept the MapQuest TOS, and I have to put a “Powered by MapQuest” notice on any screens that result from their data.

So where are the other guys getting their data from? As I wrote in Bringing Your Own Maps, my research revealed that the map data providers’ terms of service for their free services always prohibited use in paid iPhone apps, either because they required callers be free and public web apps (Google Maps TOS, section 9.1.1(a)), or prohibited use based on GPS location (i.e., “present or alert an end user to individual maneuvers of a route in any way that is synchronized with the end-user’s sensor-based position along the route”, as in the Bing Maps TOS, section 2.i), or both, or something else.

This stuff is why I had to enter into commercial licensing with MapQuest (whose terms and API I liked best). Not sure what the competition is doing. Maybe they found some way to get free data legally, maybe they’re getting away with not… it’s not really my business I suppose. But if it does turn out I’m the only one playing by the rules, yeah, I’ll be a little pissed.

I keep calling them the competition, and I suppose that’s not accurate. I’m not competing at all… I’ve totally capitulated. I’m unlikely to put any further work into Road Tip, outside of possibly switching to the new in-app purchase (IAP) subscription model that restores across devices (good for my long-time users, bad for me because it means sinking more time into this minuscule-selling app). If Ford ever followed through with opening their Sync / MyFordTouch API to third parties, I might make Road Tip work with it, but then again, I might do so just for my own use and not release it. It’s not something that I really feel like putting any further public work into.

OK, whining done. Back to your regularly scheduled blog.

Why Does iOS Video Streaming Suck? Part I

Experiment for you… Google for iphone youtube and look how pessimistic the completions are: the first is “iphone youtube fix” and the third is “iphone youtube not working”.

Now do a search for iphone youtube slow and the completions all seem to tell a common story: that it’s slow on wifi. Moreover, there are more than 4 million hits across these search terms, with about 3.6 million just for “iphone youtube slow”.

Related searches show 630,000 hits for iphone netflix slow, 835,000 for iphone ustream slow and 9.6 million for the generic iphone video slow.

Surely something is going on.

I noticed it with my son’s YouTube habit. He often watches the various “let’s play” video game videos that people have posted, such as let’s play SSX. I kept hearing the audio start and stop, and realized that he keeps reaching the end of the buffer and starting over, just to hit the buffer again. Trying out YouTube myself, I find I often hit the same problem.

But when I was at CodeMash last week, even with a heavily loaded network, I was able to play YouTube and other videos on my iPhone and iPad much more consistently than I can at home. So this got me interested in figuring out what the problem is with my network.

Don’t get your hopes up… I haven’t figured it out. But I did manage to eliminate a lot of root causes, and make some interesting discoveries along the way.

The most common advice is to change your DNS server, usually to OpenDNS or Google Public DNS. Slow DNS is often the cause of web slowness, since many pages require lookups of many different sites for their various parts (images, ads, etc.). But this is less likely to be a problem for a pure video streaming app: you’re not hitting a bunch of different sites in the YouTube app, you’re presumably hitting the same YouTube content servers repeatedly. Moreover, I already had OpenDNS configured for my DNS lookups (which itself is a questionable practice, since it allegedly confuses Akamai).

Another suggestion that pops up in the forums is to selectively disable different bands from your wifi router. But there’s no consistency in the online advice as to whether b, g, or n is the most reliable, and dropping b and n from mine didn’t make a difference.

Furthermore, I have some old routers I need to put on craigslist, and I swapped them out to see if that would fix the problem. Replacing my D-Link DIR-644 with a Netgear WGR-614v4 or a Belkin “N Wireless Router” didn’t make a difference either.

In my testing, I focused on two sample videos, a YouTube video The Prom: Doomsday Theme from a 2008 symphonic performance of music from Doctor Who, and the Crunchyroll video The Melancholy of Haruhi Suzumiya Episode 3, played with the Crunchyroll iPhone app, so that I could try to expand the problem beyond the YouTube app and see if it applies to iOS video streaming in general.

And oh boy, does it ever. While the desktop versions of YouTube and Crunchyroll start immediately and play without pauses on my wifi laptop, their iOS equivalents are badly challenged to deliver even adequate performance. On my iPad, the “Doomsday” YouTube video takes at least 40 seconds to get enough video to start playing. Last night, it was nearly five minutes.

If anything, Crunchyroll performs worse on both iPhone and iPad. The “Haruhi” video starts almost immediately, but rarely gets more than a minute in before it exhausts the buffer and stops.

So what’s the problem? They’re all on the same network… but it turns out speeds are different. Using the speedtest.net website and the Speedtest.net Mobile Speed Test app, I found that while my laptop gets 4.5 Mbps downstream at home, the iPad only gets about 2 Mbps, and the iPhone 3GS rarely gets over 1.5 Mbps.

I took the iPhone for a little tour to see if this was consistent on other networks, and got some odd results. Take a look at these:

The top two are on my LAN, and are pretty typical. The next two after that (1/22/11, 2:45 PM) are on the public wifi at the Meijer grocery/discount store in Cascade, MI. The two on the bottom are from a Culver’s restaurant just down 28th St. Two interesting points about these results. Again, neither gives me a download speed over 1.5 Mbps, but look at the uplink speed at Culver’s: 15 Mbps! Can this possibly be right? And if it is… why? Are the people who shoot straight-to-DVD movies in Grand Rapids coming over to Culver’s to upload their dailies while they enjoy a Value Basket?

As for Meijer, the ping and upload are dreadful… but it was the only place where Crunchyroll was actually able to keep up:

See that little white area on the scrubber just to the right of the playhead? It’s something I don’t see much on iOS: buffered data.

So what really is going on here anyways? For one thing, are we looking at progressive download or streaming? I suspect that both YouTube and Crunchyroll use HTTP Live Streaming. It’s easy to use with the iOS APIs, works with the codecs that are in the iDevices’ hardware, and optionally uses encryption (presumably any commercial service is going to need a DRM story in order to license commercial content). HLS can also automatically adjust to higher or lower bandwidth as conditions demand (well, it’s supposed to…). Furthermore, the App Store terms basically require the use of HLS for video streaming.
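
For reference, playing an HLS stream from the app side takes very little code, which is part of why I assume both apps use it. Here’s a minimal sketch inside a view controller, using MPMoviePlayerController against a made-up playlist URL; it’s illustrative, not either app’s actual code:

#import <MediaPlayer/MediaPlayer.h>

	// hypothetical playlist URL; a real app points at its own .m3u8
	NSURL *streamURL =
		[NSURL URLWithString:@"http://example.com/video/prog_index.m3u8"];
	MPMoviePlayerController *player =
		[[MPMoviePlayerController alloc] initWithContentURL:streamURL];
	// keep a reference (e.g., an ivar) so the player isn't released prematurely
	player.view.frame = self.view.bounds;
	[self.view addSubview:player.view];
	[player play];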

A look at the network traffic coming off the phone during a Crunchyroll session is instructive:

whois tells us that the 65.49.43.x block is assigned to “CrunchyRoll, Inc.”, as expected, and it’s interesting to see that most of the traffic is on port 80 (which we’d expect from HLS), with the exception of one request on 443, which is presumably an https request. The fact that the phone keeps making new requests, rather than keeping one file connection open, is consistent with the workings of HLS, where the client downloads a .m3u8 playlist file that simply provides a list of 30-second segment files that are then downloaded, queued, and played by the client. Given the consistent behavior between Crunchyroll and YouTube, and Apple’s emphasis on the technology, I’m inclined to hypothesize that we’re seeing HLS used by both apps.

But oh my goodness, why does it suck so much? The experience compares very poorly with YouTube on a laptop, which starts to play almost immediately and doesn’t stop after exhausting the buffer 30 seconds later. Whether you use Flash or the HTML5 support in YouTube (I’ve opted into the latter), it always just works, which is more than can currently be said of the iOS options, at least for me (and, if the Google hit count is right, for a couple million other people).

One other thing that doesn’t wash for me right now: remember when Apple started streaming their special events again? I blogged that it was a good demo of HLS, and I watched the first one with the iPhone, iPad, and Mac Pro running side-by-side to make a point that HLS was good stuff, and it all worked. How can the live stream hold up so well on three devices, yet a single archive stream falls apart on just one of the iOS devices?

I actually really like what I’ve seen of HLS: the spec is clean and the potential is immense. I even wondered aloud about doing a book on it eventually. But I can’t do that if I can’t get it working satisfactorily for myself.

What the hell is going on with this stuff?

If I ever get a good answer, there’ll be a “Part II”.

Do Over

One thing I’m coming away from CodeMash with is a desire to clean up a lot of my old habits and dig into tools and techniques I’ve long known were available, but haven’t used. In some ways, I’m still stuck in my iPhone OS 2 ways in an iOS 4 world.

Daniel Steinberg has taken a pretty extreme position, but one that makes sense: he no longer has any private instance variables in his header files, since the current SDK allows you to put them in the implementation. Combined with the use of a class extension in the .m for helper methods, this makes it possible for the header to be exactly what it’s supposed to be: an exposure of the public interface to your class, with no clues about the implementation underneath.
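
Here’s one way to read that, sketched with hypothetical class and method names: the header exposes only the public interface, while the private ivar and helper method live entirely in the .m.

// MyController.h: nothing but the public interface
#import <UIKit/UIKit.h>

@interface MyController : UIViewController
- (void)reloadData;
@end

// MyController.m: the class extension holds the private bits
#import "MyController.h"

@interface MyController () {
	NSArray *items;			// private ivar, invisible to clients
}
- (void)fetchItems;			// private helper method
@end

@implementation MyController

- (void)reloadData {
	[self fetchItems];
}

- (void)fetchItems {
	[items release];
	items = [[NSArray alloc] init];	// placeholder for real work
}

@end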

To my mind, Daniel was also the winner of the “mobile smackdown” session, in which one presenter each from the iOS, Windows Phone 7, and Android camps was given 15 minutes to develop a trivial Twitter app that could manage a persistent list of user names and, when tapped, navigate to that user’s twitter.com page. I say Daniel won because his iPhone app was the only one to complete all the features in time (actually, Daniel needed an extra 30 seconds to finish two lines of code). The Windows Phone presenter never made it to adding new names to the list, and the Android guy didn’t get around to showing the user’s page. One of Daniel’s wins was in using the “use Core Data for storage” checkbox: by graphically designing a data model for his “Twitterer” class, he picked up persistence and his table view in one fell swoop. Now that I think of it, I don’t remember how, or if, the other platforms persisted their user lists. I don’t use Core Data often, but after this demonstration, I’m much more inclined to do so.

There was a whole session on unit testing for iOS, something I just explored on my own for my first-day tutorial (and even then, I was using it as much for illustrating the use of multiple targets in an Xcode project as for the actual testing of features). I’ve never been religious about testing, particularly given that GUIs have long proven difficult to make fully testable, but with a testing framework built into Xcode (not everyone’s favorite, but it’s a start), it’s well worth rethinking how I could use it to get some measure of test coverage and fight regressions.

All of this makes me worry about the status of the iPhone SDK Development book I wrote with Bill Dudney. That was an iPhone OS 2 book that slipped far enough to be an early iPhone OS 3 book, with the addition of new chapters for important new frameworks like Core Data and Game Kit. But with iOS 5 surely looming, some of it is starting to look pretty crusty. In particular, the arrival of Grand Central Dispatch means that it’s no longer safe to blithely ignore threads, as we did, since there are scenarios where you can have even simple code that unwittingly manages to get off the main thread, which means trouble for UIKit. Furthermore, new frameworks demand blocks for completion handlers, so that’s something that now needs to appear early (and given that the block syntax is pure C, readers will need to be acclimated to C earlier than they used to be). And I’ve long wanted to move the debugging and performance chapters (my favorites, actually) much earlier, so readers can figure out their own EXC_BAD_ACCESS problems. Not that I can currently even plan on a rev to that book – I still have four chapters to go on Core Audio, and would need a long and difficult conversation with the Prags besides. But I certainly see where my guidance to new developers has changed, significantly, in the last few years.

Between Christmas break, a week of CodeMash prep and home office reorganization, and CodeMash itself, I feel like I’ve been off for a month (and my MYOB First Edge financial status would seem to agree). I feel ready to start anew this week, and making a clean break with the past suits this mood nicely.

The Dark Depths of iOS

CodeMash starts Wednesday in Sandusky, with half-day iOS tutorials from Daniel Steinberg and myself, followed by two days of sessions. My Thursday session is The Dark Depths of iOS, and is a rapid-fire tour of the non-obvious parts of the iOS APIs.

Researching for this has proven an interesting exercise. I had the idea for the talk from the occasional dive into Core Foundation for functionality that is not exposed at higher levels of the iOS and Mac stacks. A simple example of this is CFUUID, the universally unique identifier defined by RFC 4122. More than once, I’ve needed an arbitrary unique ID (other than, say, the device ID), and am happy to use the industry standard. But it’s not defined in Foundation or Cocoa, so you need to use Core Foundation and its C API.
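
In practice that’s only a few lines. A minimal sketch (the wrapper function and its name are mine):

#import <Foundation/Foundation.h>
#import <CoreFoundation/CoreFoundation.h>

// Returns a new RFC 4122 UUID string; the caller owns it (Create rule).
NSString *CreateUUIDString (void) {
	CFUUIDRef uuid = CFUUIDCreate (kCFAllocatorDefault);
	CFStringRef uuidString = CFUUIDCreateString (kCFAllocatorDefault, uuid);
	CFRelease (uuid);
	// CFStringRef is toll-free bridged to NSString
	return (NSString *) uuidString;
}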

Another example I knew about before starting this talk was the CFNetwork sub-framework, which provides a much more complete networking stack than is available in Cocoa’s URL Loading System. CFNetwork allows you to make arbitrary socket connections, work with hosts (e.g., to do DNS lookups), accept connections, etc. Basically, if what you need from the network can’t be expressed as a URL, you need to drop down at least to this level. It has an advantage over traditional BSD sockets in that it integrates with the Core Foundation view of the world, most importantly in that its reading and writing APIs use the asynchronous callback design patterns common to Apple’s frameworks, rather than blocking as the traditional C APIs would.
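
To give a flavor of that asynchronous style, here’s a rough sketch of opening a socket connection to an arbitrary host with the CF stream APIs and a run-loop callback; the host, port, and function names are placeholders, and error handling is omitted:

#include <CoreFoundation/CoreFoundation.h>

// Run-loop callback: fires when bytes arrive or the stream ends/fails.
static void MyReadCallback (CFReadStreamRef stream,
							CFStreamEventType eventType,
							void *info) {
	if (eventType == kCFStreamEventHasBytesAvailable) {
		UInt8 buffer[1024];
		CFIndex bytesRead = CFReadStreamRead (stream, buffer, sizeof (buffer));
		// do something with buffer / bytesRead...
	} else if (eventType == kCFStreamEventEndEncountered ||
			   eventType == kCFStreamEventErrorOccurred) {
		CFReadStreamClose (stream);
	}
}

// Open an asynchronous connection to an arbitrary host and port.
void ConnectToHost (void) {
	CFReadStreamRef readStream = NULL;
	CFWriteStreamRef writeStream = NULL;
	CFStreamCreatePairWithSocketToHost (kCFAllocatorDefault,
										CFSTR("example.com"), 8080,
										&readStream, &writeStream);
	CFStreamClientContext context = {0, NULL, NULL, NULL, NULL};
	CFReadStreamSetClient (readStream,
						   kCFStreamEventHasBytesAvailable |
						   kCFStreamEventEndEncountered |
						   kCFStreamEventErrorOccurred,
						   MyReadCallback, &context);
	CFReadStreamScheduleWithRunLoop (readStream, CFRunLoopGetCurrent (),
									 kCFRunLoopCommonModes);
	CFReadStreamOpen (readStream);	// callbacks fire asynchronously from here
}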

Things like those, along with Accelerate and Keychain, were what I knew I wanted to talk about, and I did the first few slides on Core Media and Core Services by pointing out reasonably findable information from Apple’s architecture docs.

The eye-openers for me were down in the “System” level of Core OS… the various C APIs that iOS inherits from its open-source roots in FreeBSD and NetBSD (i.e., Darwin). These aren’t substantially documented in Xcode (though they do enjoy syntax highlighting and sometimes code-completion). The documentation for these is in the man pages, which of course makes sense for long-time Unix programmers, but maybe less so today… if I’m writing in Xcode for a separate iOS device, the Mac command-line is a rather counterintuitive location to look for API documentation, isn’t it?

So the trick with calling the standard C libraries is finding out just what’s available to you. Apple has links to iOS Manual Pages that collect a lot of these APIs in one place, but they are largely unordered (beyond the historical Unix man page sections), and being programmatically generated, Apple can only offer a warning that while some APIs known to be unavailable on iOS have been filtered out, the list isn’t guaranteed to be completely accurate. There’s also a consideration that some of these APIs, while they may exist, are not particularly useful given the limitations under which third-party iOS applications operate. For example, all the APIs involving processes (e.g., getting your PID and EUID) and inter-process communication are presumably only of academic interest — the iPhone is not the campus timeshare mainframe from 1988. Similarly, ncurses is probably not going to do much for you on a touch display. OK, maybe if you’re writing an ssh client. Or if you really need to prove that Angry Birds could have worked on a VT100.

Another way of figuring out what’s there — if less so how to actually call it — is to go spelunking down in <iOS_SDK>/usr/lib and <iOS_SDK>/usr/include to get an idea of the organization and packaging of the standard libraries. Had I not done this, I might not have realized that there are C APIs for regular expression matching (regex.h), XML parsing with libxml2 (both DOM and SAX), zip and tar, MD5 and SHA (in CommonCrypto/), and other interesting stuff.
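
As one example, the POSIX regex API in regex.h is plain C you can call straight from Objective-C; a minimal sketch (the wrapper function is mine, not part of the SDK):

#include <regex.h>

// Returns 1 if the string matches the POSIX extended regular expression.
int MatchesPattern (const char *string, const char *pattern) {
	regex_t re;
	if (regcomp (&re, pattern, REG_EXTENDED | REG_NOSUB) != 0) {
		return 0;	// couldn't compile the pattern
	}
	int matched = (regexec (&re, string, 0, NULL, 0) == 0);
	regfree (&re);
	return matched;
}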

On the other hand, there are cases where code libraries are available but don’t have public headers. For example, libbz2.dylib and libtidy.dylib are in usr/lib but don’t seem to have corresponding entries in usr/include. That raises the question of whether you could call into these BZip and Tidy libraries and remain “App Store safe”, given the apparent lack of a “public” API, even though you could easily get the headers for these open-source libraries from their host projects. Heck, kebernet pointed out to me that tidy.h and bzlib.h are available in the iPhoneSimulator.platform path, just not iPhoneOS.platform.

It would be nice if there were better visibility into these libraries, though their nature as unrepentant C probably scares off a lot of developers, who will be just as well off scouring Google Code for a nice Objective-C alternative (provided the license they find is compatible with their own). My takeaway is that there’s even more functionality at these low levels than I expected to find. I’ll probably at least consider using stuff like libxml2 and regex.h in the future.

From iPod Library to PCM Samples in Far Fewer Steps Than Were Previously Necessary

In a July blog entry, I showed a gruesome technique for getting raw PCM samples of audio from your iPod library, by means of an easily-overlooked metadata attribute in the Media Player framework, along with the export functionality of AV Foundation. The AV Foundation stuff was the gruesome part — with no direct means for sample-level access to the song “asset”, it required an intermediate export to .m4a, which was a lossy re-encode if the source was of a different format (like MP3), and then a subsequent conversion to PCM with Core Audio.

Please feel free to forget all about that approach… except for the Core Media timescale stuff, which you’ll surely see again before too long.

iOS 4.1 added a number of new classes to AV Foundation (indeed, these were among the most significant 4.1 API diffs) to provide an API for sample-level access to media. The essential classes are AVAssetReader and AVAssetWriter. Using these, we can dramatically simplify and improve the iPod converter.

I have an example project, VTM_AViPodReader.zip (70 KB), which was originally meant to be part of my session at the Voices That Matter iPhone conference in Philadelphia, but didn’t come together in time. I’m going to skip the UI stuff in this blog, and leave you with a screenshot and a simple description: tap “choose song”, pick something from your iPod library, tap “done”, and tap “Convert”.

Screenshot of VTM_AViPodReader

To do the conversion, we’ll use an AVAssetReader to read from the original song file, and an AVAssetWriter to perform the conversion and write to a new file in our application’s Documents directory.

Start, as in the previous example, by calling valueForProperty: with the MPMediaItemPropertyAssetURL key to get an NSURL representing the song in a format compatible with AV Foundation.



-(IBAction) convertTapped: (id) sender {
	// set up an AVAssetReader to read from the iPod Library
	NSURL *assetURL = [song valueForProperty:MPMediaItemPropertyAssetURL];
	AVURLAsset *songAsset =
		[AVURLAsset URLAssetWithURL:assetURL options:nil];

	NSError *assetError = nil;
	AVAssetReader *assetReader =
		[[AVAssetReader assetReaderWithAsset:songAsset
			   error:&assetError]
		  retain];
	if (assetError) {
		NSLog (@"error: %@", assetError);
		return;
	}

Sorry about the dangling retains. I’ll explain those in a little bit (and yes, you could use the alloc/init equivalents… I’m making a point here…). Anyways, it’s simple enough to take an AVAsset and make an AVAssetReader from it.

But what do you do with that? Contrary to what you might think, you don’t just read from it directly. Instead, you create another object, an AVAssetReaderOutput, which is able to produce samples from an AVAssetReader.


AVAssetReaderOutput *assetReaderOutput =
	[[AVAssetReaderAudioMixOutput 
	  assetReaderAudioMixOutputWithAudioTracks:songAsset.tracks
				audioSettings: nil]
	retain];
if (! [assetReader canAddOutput: assetReaderOutput]) {
	NSLog (@"can't add reader output... die!");
	return;
}
[assetReader addOutput: assetReaderOutput];

AVAssetReaderOutput is abstract. Since we’re only interested in the audio from this asset, an AVAssetReaderAudioMixOutput will suit us fine. For reading samples from an audio/video file, like a QuickTime movie, we’d want AVAssetReaderVideoCompositionOutput instead. An important point here is that we set audioSettings to nil to get a generic PCM output. The alternative is to provide an NSDictionary specifying the format you want to receive; I ended up doing that later in the output step, so the default PCM here will be fine.

That’s all we need to worry about for now for reading from the song file. Now let’s start dealing with writing the converted file. We start by setting up an output file… the only important thing to know here is that AV Foundation won’t overwrite a file for you, so you should delete the exported.caf if it already exists.


NSArray *dirs = NSSearchPathForDirectoriesInDomains 
				(NSDocumentDirectory, NSUserDomainMask, YES);
NSString *documentsDirectoryPath = [dirs objectAtIndex:0];
NSString *exportPath = [[documentsDirectoryPath
				 stringByAppendingPathComponent:EXPORT_NAME]
				retain];
if ([[NSFileManager defaultManager] fileExistsAtPath:exportPath]) {
	[[NSFileManager defaultManager] removeItemAtPath:exportPath
		error:nil];
}
NSURL *exportURL = [NSURL fileURLWithPath:exportPath];

Yeah, there’s another spurious retain here. I’ll explain later. For now, let’s take exportURL and create the AVAssetWriter:


AVAssetWriter *assetWriter =
	[[AVAssetWriter assetWriterWithURL:exportURL
		  fileType:AVFileTypeCoreAudioFormat
			 error:&assetError]
	  retain];
if (assetError) {
	NSLog (@"error: %@", assetError);
	return;
}

OK, no sweat there, but the AVAssetWriter isn’t really the important part. Just as the reader is paired with “reader output” objects, so too is the writer connected to “writer input” objects, which is what we’ll be providing samples to, in order to write them to the filesystem.

To create the AVAssetWriterInput, we provide an NSDictionary describing the format and contents we want to create… this is analogous to a step we skipped earlier to specify the format we receive from the AVAssetReaderOutput. The dictionary keys are defined in AVAudioSettings.h and AVVideoSettings.h. You may need to look in these header files for the value types to provide for these keys, and in some cases, they’ll point you to the Core Audio header files. Trial and error led me to ultimately specify all of the fields that would be encountered in an AudioStreamBasicDescription, along with an AudioChannelLayout structure, which needs to be wrapped in an NSData in order to be added to an NSDictionary.



AudioChannelLayout channelLayout;
memset(&channelLayout, 0, sizeof(AudioChannelLayout));
channelLayout.mChannelLayoutTag = kAudioChannelLayoutTag_Stereo;
NSDictionary *outputSettings =
[NSDictionary dictionaryWithObjectsAndKeys:
	[NSNumber numberWithInt:kAudioFormatLinearPCM], AVFormatIDKey, 
	[NSNumber numberWithFloat:44100.0], AVSampleRateKey,
	[NSNumber numberWithInt:2], AVNumberOfChannelsKey,
	[NSData dataWithBytes:&channelLayout length:sizeof(AudioChannelLayout)],
		AVChannelLayoutKey,
	[NSNumber numberWithInt:16], AVLinearPCMBitDepthKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsNonInterleaved,
	[NSNumber numberWithBool:NO], AVLinearPCMIsFloatKey,
	[NSNumber numberWithBool:NO], AVLinearPCMIsBigEndianKey,
	nil];

With this dictionary describing 44.1 KHz, stereo, 16-bit, non-interleaved, little-endian integer PCM, we can create an AVAssetWriterInput to encode and write samples in this format.


AVAssetWriterInput *assetWriterInput =
	[[AVAssetWriterInput assetWriterInputWithMediaType:AVMediaTypeAudio
				outputSettings:outputSettings]
	retain];
if ([assetWriter canAddInput:assetWriterInput]) {
	[assetWriter addInput:assetWriterInput];
} else {
	NSLog (@"can't add asset writer input... die!");
	return;
}
assetWriterInput.expectsMediaDataInRealTime = NO;

Notice that we’ve set the property assetWriterInput.expectsMediaDataInRealTime to NO. This will allow our transcode to run as fast as possible; of course, you’d set this to YES if you were capturing or generating samples in real-time.

Now that our reader and writer are ready, we signal that we’re ready to start moving samples around:


[assetWriter startWriting];
[assetReader startReading];
AVAssetTrack *soundTrack = [songAsset.tracks objectAtIndex:0];
CMTime startTime = CMTimeMake (0, soundTrack.naturalTimeScale);
[assetWriter startSessionAtSourceTime: startTime];

These calls will allow us to start reading from the reader and writing to the writer… but just how do we do that? The key is the AVAssetReaderOutput method copyNextSampleBuffer. This call produces a Core Media CMSampleBufferRef, which is what we need to provide to the AVAssetWriterInput’s appendSampleBuffer: method.

But this is where it starts getting tricky. We can’t just drop into a while loop and start copying buffers over. We have to be explicitly signaled that the writer is able to accept input. We do this by providing a block to the asset writer input’s requestMediaDataWhenReadyOnQueue:usingBlock:. Once we do this, our code will continue on, while the block will be called asynchronously by Grand Central Dispatch periodically. This explains the earlier retains… autoreleased variables created here in convertTapped: will soon be released, while we need them to still be around when the block is executed. So we need to take care that stuff we need is available inside the block: objects need to not be released, and local primitives that the block needs to modify (like our byte counter) need the __block modifier.


__block UInt64 convertedByteCount = 0;
dispatch_queue_t mediaInputQueue =
	dispatch_queue_create("mediaInputQueue", NULL);
[assetWriterInput requestMediaDataWhenReadyOnQueue:mediaInputQueue 
										usingBlock: ^ 
 {

The block will be called repeatedly by GCD, but we still need to make sure that the writer input is able to accept new samples.


while (assetWriterInput.readyForMoreMediaData) {
	CMSampleBufferRef nextBuffer =
		[assetReaderOutput copyNextSampleBuffer];
	if (nextBuffer) {
		// append buffer
		[assetWriterInput appendSampleBuffer: nextBuffer];
		// update ui
		convertedByteCount +=
			CMSampleBufferGetTotalSampleSize (nextBuffer);
		NSNumber *convertedByteCountNumber =
			[NSNumber numberWithUnsignedLongLong:convertedByteCount];
		[self performSelectorOnMainThread:@selector(updateSizeLabel:)
			withObject:convertedByteCountNumber
		waitUntilDone:NO];
		// we own nextBuffer (copyNextSampleBuffer follows the Copy rule),
		// so release it now that it has been appended
		CFRelease (nextBuffer);

What’s happening here is that while the writer input can accept more samples, we try to get a sample from the reader output. If we get one, appending it to the writer input is a one-line call. Updating the UI is another matter: since GCD has us running on an arbitrary thread, we have to use performSelectorOnMainThread for any updates to the UI, such as updating a label with the current total byte count. We would also have to call out to the main thread to update the progress bar, currently unimplemented because I don’t have a good way to do it yet.

If the writer is ever unable to accept new samples, we fall out of the while and the block, though GCD will continue to re-run the block until we explicitly stop the writer.

How do we know when to do that? When we don’t get a sample from copyNextSampleBuffer, which means we’ve read all the data from the reader.


} else {
	// done!
	[assetWriterInput markAsFinished];
	[assetWriter finishWriting];
	[assetReader cancelReading];
	NSDictionary *outputFileAttributes =
		[[NSFileManager defaultManager]
			  attributesOfItemAtPath:exportPath
			  error:nil];
	NSLog (@"done. file size is %ld",
		    [outputFileAttributes fileSize]);
	NSNumber *doneFileSize = [NSNumber numberWithLong:
			[outputFileAttributes fileSize]];
	[self performSelectorOnMainThread:@selector(updateCompletedSizeLabel:)
			withObject:doneFileSize
			waitUntilDone:NO];
	// release a lot of stuff
	[assetReader release];
	[assetReaderOutput release];
	[assetWriter release];
	[assetWriterInput release];
	[exportPath release];
	break;
}

Reaching the finish state requires us to tell the writer to finish up the file by sending finish messages to both the writer input and the writer itself. After we update the UI (again, with the song-and-dance required to do so on the main thread), we release all the objects we had to retain in order that they would be available to the block.

Finally, for those of you copy-and-pasting at home, I think I owe you some close braces:


		}
	 }];
	NSLog (@"bottom of convertTapped:");
}

Once you’ve run this code on the device (it won’t work in the Simulator, which doesn’t have an iPod Library) and performed a conversion, you’ll have converted PCM in an exported.caf file in your app’s Documents directory. In theory, your app could do something interesting with this file, like representing it as a waveform, or running it through a Core Audio AUGraph to apply some interesting effects. Just to prove that we actually have performed the desired conversion, use the Xcode Organizer to open up the “iPod Reader” application and drag its “Application Data” to your Mac:

Accessing app's documents with Xcode Organizer

The exported folder will have a Documents folder, in which you should find exported.caf. Drag it over to QuickTime Player or any other application that can show you the format of the file you’ve produced:

QuickTime Player inspector showing PCM format of exported.caf file

Hopefully this is going to work for you. It worked for most Amazon and iTunes albums I threw at it, but I found I had an iTunes Plus album, Ashtray Rock by the Joel Plaskett Emergency, whose songs throw an inexplicable error when opened, so I can’t presume to fully understand this API just yet:


2010-12-12 15:28:18.939 VTM_AViPodReader[7666:307] *** Terminating app
 due to uncaught exception 'NSInvalidArgumentException', reason:
 '*** -[AVAssetReader initWithAsset:error:] invalid parameter not
 satisfying: asset != ((void *)0)'

Still, the arrival of AVAssetReader and AVAssetWriter opens up a lot of new possibilities for audio and video apps on iOS. With the reader, you can inspect media samples, either in their original format or with a conversion to a form that suits your code. With the writer, you can supply samples that you receive by transcoding (as I’ve done here), by capture, or even samples you generate programmatically (such as a screen recorder class that just grabs the screen as often as possible and writes it to a movie file).

This Thing I Do!

It’s been an interesting couple of months with all these versions of iOS. Until last week, most of us had to juggle one legacy version (3.x), a production version (4.0 or 4.1), and the important 4.2 beta, which ushered the iPad into the modern age. If you’re working on code for production, you likely needed to focus on at least two of these, possibly all three.

The Xcode tools have a reasonable means of letting you do this, by installing to locations other than the default /Developer, with a few caveats: only a single set of the System Tools and the UNIX tools can be present on a volume, and they must be in specific locations.

My standard is to put the legacy SDK (currently 3.2) at /Developer_Old, the current production release at /Developer, and the latest beta at /Developer_New. So far so good. But what happens when you double-click an .xcodeproj or .xib… which version do you get? Do you notice or care? Probably not until you find out that the target SDK in the project file is unsupported by the version you launched. And what if you run multiple copies at once… what version of IB or the Simulator is that on your dock?

After fooling myself one time too many, I adopted a system of changing the icons of the core tools to give myself a visual indication of which SDK I’m working with. This way, I can start a specific version of Xcode right from my dock:
Aliases to different versions of Xcode

Here’s a quick look at how you can set this up for yourself.

Basically, what you need to do is to create alternate versions of the .icns files of the apps you’re interested in. For me, that’s Xcode, Interface Builder, and the iOS Simulator. I’ve created two folders inside /Developer/Xcode and simulator icons to store these .icns files permanently:

Futuristic versions of icons
Old-timey versions of icons

As you can see, I’ve got one folder called Futuristic for the icons for betas, and another called Old Timey for the legacy SDKs.

To make these icons, start by visiting an application’s installed location in the Finder:

  • Xcode – /Developer/Applications/Xcode.app
  • Interface Builder – /Developer/Applications/Interface Builder.app
  • iOS Simulator – /Developer/Platforms/iPhoneSimulator.platform/Developer/iOS Simulator.app

Now, peek into the application by right-clicking/ctrl-clicking the app and choosing “Show package contents”, then open the “Resources” folder. You need to find the appropriate .icns file:

  • Xcode – appicon.icns
  • Interface Builder – InterfaceBuilder_App.icns
  • iOS Simulator – simulator.icns

Don’t double-click the .icns; that will probably just open Preview. Instead, make a copy and use “Open With…” (via File menu or right-click/ctrl-click contextual menu) to open up Icon Composer.
Interface Builder icon in Icon Composer

This shows the icon at the various canned sizes: 512×512, 256×256, etc. Select one of these and copy with cmd-c.

Now open an image editor… I’ve used both Pixelmator and Acorn for this kind of stuff. You don’t have to buy Photoshop. For this blog, let’s assume you’re using Acorn. Over there, do “New from clipboard” to create a new document of the image you’ve copied from the icon. Now choose an image effect that will colorize the icon in some meaningful way, without changing its size or shape. In my case, for the old-timey look of legacy SDKs, I applied the “Sepia Tone” effect:
Applying Sepia Tone effect to Interface Builder icon

And for the futuristic look to distinguish beta SDKs, I used the Quartz Composer “thermal camera” effect:
Interface Builder icon with thermal camera effect

Copy this image, close it, go back to Icon Composer, and paste it in. Repeat for all the other sizes. When finished, use “Save As…” to save the .icns file to some safe place (I use /Developer/Xcode and simulator icons).

Now, re-visit the legacy or beta applications whose icons you want to change, and option-drag your hacked .icns file into the Resources folder, replacing the default icon (you might want to do this on a copy of the application the first few times, just to make sure you’re doing it right).

Now, when you launch these apps, you’ll be able to tell immediately which is which in the Dock.
Three different versions of Xcode running side-by-side
You can also collect aliases to each version in a single folder, then put that in the Dock (as I did with Xcode in the first screenshot) for a one-stop shop to launch the version you need.

The Other iOS Programming Language

My latest contract project has me doing a bunch of custom work with a UIWebView: we have XHTML content that we want to render in our app, but with some fairly extensive changes to its presentation, such as paginating the content like a book, and intercepting taps on links. Given the option of using and customizing the built-in WebKit rendering, versus parsing the XHTML myself, laying it out, etc., the choice was a no-brainer.

The trick, then, is in how to extend and customize the functionality. As a long-time curly-brace application developer, my natural instinct is to impose control from the Cocoa side, perhaps by subclassing UIWebView to achieve custom behavior (although this is specifically discouraged by the documentation), tying in delegates where possible, perhaps even employing some render hackery (like using an offscreen UIWebView and then blitting its pixels into some visible view). But this really isn’t the right way to do it: for starters, it still gives you no access to the DOM, which is where nearly all the value of your HTML content is.

I suspect younger readers already know what the right answer is: insert your own JavaScript, and work inside the UIWebView. This is pretty straightforward to do — you can load the HTML source into a string and then munge it as necessary, such as by strategically adding references to your own CSS stylesheets or JavaScript (.js) files, and then load that modified source into the UIWebView, along with an appropriate base URL to help resolve relative paths. I say this would be a natural conclusion for younger developers because I suspect that most young developers start with either Flash or JavaScript, as these environments deliver immediate visual results and aren’t hard to get into (plus, JavaScript is free). Developers my age sometimes wish that computers still came with an introductory programming environment like a flavor of BASIC or HyperCard, overlooking the fact that today’s dominant starter language is included with every browser.
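
As a rough sketch of that munging step (the web view, URLs, and file names here are placeholders, not the actual project’s code):

	NSString *source = [NSString stringWithContentsOfURL:contentURL
										encoding:NSUTF8StringEncoding
										   error:NULL];
	// splice our own stylesheet and script in just before </head>
	NSString *injection =
		@"<link rel=\"stylesheet\" href=\"pagination.css\" />"
		@"<script src=\"pagination.js\"></script></head>";
	NSString *modified =
		[source stringByReplacingOccurrencesOfString:@"</head>"
										  withString:injection];
	// baseURL lets relative references (including our own .css and .js,
	// if they live alongside the content) resolve correctly
	[webView loadHTMLString:modified baseURL:baseURL];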

What makes JavaScript programming practical in an iOS app is the method -[UIWebView stringByEvaluatingJavaScriptFromString:], which does exactly what the wordy method name says: it tells a UIWebView to execute arbitrary JavaScript contained in an NSString parameter, and return the result as an NSString. You can easily try it out on a UIWebView in your own application like so:


	[myWebView stringByEvaluatingJavaScriptFromString:
		@"alert('hello JavaScript');"];

With this door opened between the JavaScript and Cocoa worlds, you have two-way access to the DOM and how it is rendered. For example, you can pull out the web view’s selected text by writing a simple JavaScript function, and calling it from Cocoa. Or slurp it all in by walking the DOM, appending all the textContent, and returning one big NSString. Or collect all the links with document.getElementsByTagName('a') and index them in Cocoa.
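
Here’s roughly what those round trips can look like; myWebView and the scripts themselves are illustrative, not from the project:

	// grab the current selection out of the DOM...
	NSString *selection =
		[myWebView stringByEvaluatingJavaScriptFromString:
			@"window.getSelection().toString();"];

	// ...or collect every link's href as one newline-separated string
	NSString *linkScript =
		@"var hrefs = [];"
		@"var anchors = document.getElementsByTagName('a');"
		@"for (var i = 0; i < anchors.length; i++) { hrefs.push(anchors[i].href); }"
		@"hrefs.join('\\n');";
	NSString *joinedLinks =
		[myWebView stringByEvaluatingJavaScriptFromString:linkScript];
	NSArray *links = [joinedLinks componentsSeparatedByString:@"\n"];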

But don’t stop there. With the richness of JavaScript, you can employ all the tricks you see in rich web applications, such as rewriting the DOM on the fly, scrolling around programmatically and finding element coordinates, etc. Plus, since you’re only running against one pseudo-browser (iOS’ built-in WebKit framework), you don’t have to work around the incompatibilities and quirks of multiple browsers like regular web developers do.

As I’ve worked on this project, I’ve settled into a “Render unto Caesar…” strategy. Meaning that anytime I need to access contents of the DOM, or change how it is rendered, I know I’m writing a JavaScript function, because that’s what owns the web content and its rendering. For the rest of the app, it’s still Cocoa.

There are hard parts, not the least of which is the fact that I’m still pretty green when it comes to JavaScript and messing around with the DOM. Mozilla Dev Center is a very useful resource for this, despite some search and link breakage, far more so than Apple’s Safari Dev Center.

The other problem is that debugging JavaScript in a UIWebView is notoriously difficult. Unlike desktop browsers, there is no “developer mode” you can drop into in order to inspect elements, see the results of your actions, or even log messages efficiently. Worse, the smallest syntax error will break execution of your function (and major syntax errors can break your whole .js file), so there is often little recourse but to jam alert() calls into your code just to see how far the interpreter gets before quietly dying. That said, iOS Safari behaves very much like its desktop equivalent, so it is possible to do some amount of development and debugging in desktop Safari’s developer mode, or the WebKit nightly build, before switching back to the UIWebView. Even so, I only do this for extreme cases of debugging — I wouldn’t want to develop a bunch of new code against WebKit nightly only to find it doesn’t work the same way on the production version of iOS.

Anyone with a browser knows how rich web applications can be, and so you implicitly know that anything you can do in Safari can also be done inside a UIWebView. Sometimes it makes sense to do exactly that.

Slides and stuff from Voices That Matter talk

I wish I hadn’t been so crunched in the week leading up to the Voices That Matter: iPhone Developer Conference last weekend in Philadelphia, and had gotten a few of the super-advanced AV Foundation features working for demos, but since I went over my time by 10 minutes, I guess the talk was already chock full o’ content.

Anyways, I promised materials would be on my blog after a code clean-up, and here they are:

Title: Mastering Media with AV Foundation

  • Presentation slides (PDF)
  • VTM_Player.zip – Illustrates basic playback functionality, with local and remote files and streams (the included URLs cover .m4a, .mov, Shoutcast, and HTTP Live Streaming)
  • VTM_AVRecPlay.zip – Performs A/V capture from camera/mic and playback of captured movie
  • VTM_AVEditor.zip – Simple cuts-only editor for in/out editing, with addition of an audio track at export time.