I’m getting a lot of hits on Sunday’s App Rejections Are A Lousy Way to Communicate Policy Changes, courtesy of a link from Daring Fireball, which is pretty magnanimous considering I kind of slam Gruber halfway through. That post argued that suddenly rejecting apps for accessing the UDID is a terrible way to handle a policy change, that Apple is usually far better about controlling the message, and that developers shouldn’t hear about this stuff from Twitter and TechCrunch.
Unfortunately, a lot of the conversation here and elsewhere has been hijacked by a focus on the UDID specifically and privacy issues in general, largely by uninformed readers who are quick to blame everyone who didn’t see this coming.
I’m going to use this post to explain what the UDID is and the problems around it, from the POV of a developer who has both chosen to use it and inherited code that uses it.
Take this with the grain of salt that I Am Not A Security Expert (IANASE), or perhaps I Am Not Graham Lee (IANGL).
What Is the UDID?
The Unique Device Identifier is exactly what it says on the tin: a long string of characters that uniquely identifies one iOS device. It only identifies the device, not the person using it (I believe it stays the same after a device wipe, but I don’t feel like wiping my iPad to test that). Any user can find their UDID by connecting the device to iTunes and option-clicking the Serial Number.
Since the first public SDK in iPhone OS 2.0, the UDID has been available to developers by calling -[UIDevice uniqueIdentifier]. The reason you would want to do so is typically to identify one instance of your app running in the wild, or perhaps as a stub to create other unique IDs (for example, a document-creating app that IDs each document as UDID plus the time it was created).
Imagine you want to know how often your users use your app. You could set up a URL to log startups, and then hit it from your app. But if you got 30 hits, you wouldn’t know if that was 30 devices running your app once, or one device running it 30 times. If each call sends the UDID, then you’d be able to tell.
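To make that concrete, here is a minimal sketch of the server-side bookkeeping, written in Python purely for illustration. The function name, the log format, and the fake identifier strings are all my own assumptions; nothing here is an Apple API.

```python
from collections import Counter

def summarize_launches(hits):
    """hits: one device-identifier string per logged launch (hypothetical format)."""
    per_device = Counter(hits)
    return {
        "total_launches": sum(per_device.values()),
        "distinct_devices": len(per_device),
    }

# Made-up logs: 30 launches from one device vs. 30 devices launching once each.
one_device = ["udid-aaaa"] * 30
many_devices = [f"udid-{n:04d}" for n in range(30)]

print(summarize_launches(one_device))
print(summarize_launches(many_devices))
```

Both logs contain 30 hits. Only the identifier attached to each hit lets you tell the two situations apart.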
So What’s The Problem?
Most of us would agree that this scenario is pretty benign. And in fact, many apps gather a lot more metrics than this — what features are used most, which ones are used together, etc.
Let’s set aside for the moment the question of whether gathering usage metrics is OK (with or without the user’s permission). So far, this doesn’t seem bad. In fact, we still don’t know who’s running the app: no matter how much information we collect on a given UDID, all we know is that one device exists that is being used in that way.
Let’s say we have 9 apps that collect metrics like this, each phoning home metrics to the developer’s or some third party’s server. Now what if a 10th app does the same thing, but for some reason is also able to collect personal information (maybe it’s subscription based, maybe it uses a Facebook login, whatever)? That app is clearly able to correlate metrics to a given user. But since it shares the same UDID with the other 9 apps, anyone who can combine the data sets can now associate the activities in those apps with a specific individual.
Can you say “unintended consequences”? This is, as I understand it, the problem with the UDID, and why Apple has deprecated its use.
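Here is a rough sketch of that correlation, again in Python and again with entirely made-up data. The app names, events, record layout, and the `correlate` function are hypothetical. The point is only that a shared identifier works as a join key.

```python
def correlate(metrics_by_app, identity_by_udid):
    """Attach a known identity to every metrics record that shares its UDID."""
    linked = []
    for app, records in metrics_by_app.items():
        for record in records:
            person = identity_by_udid.get(record["udid"])
            if person is not None:
                linked.append((person, app, record["event"]))
    return linked

# Anonymous metrics phoned home by two of the nine "harmless" apps.
metrics_by_app = {
    "recipes": [{"udid": "udid-aaaa", "event": "viewed dessert"}],
    "game": [
        {"udid": "udid-aaaa", "event": "played level 3"},
        {"udid": "udid-bbbb", "event": "played level 1"},
    ],
}
# What the tenth app knows: which person goes with which device.
identity_by_udid = {"udid-aaaa": "alice@example.com"}

for row in correlate(metrics_by_app, identity_by_udid):
    print(row)
```

None of the first nine apps did anything beyond logging anonymous events. The linkage only appears once the data sets end up in the same place.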
They Should Have Known!
No, the issue is that no one should have thought it was ok to use information that could identify or otherwise compromise another person’s privacy. It should have been obvious that once this got out Apple would need to do something about it. It should have been obvious that you should not have been doing it at all.
Icouldseeitcoming, commenting on App Rejections Are A Lousy Way To Communicate Policy Changes
My point in the above story is that the potential for misuse of the UDID does not come from the string itself. Remember, it conveys no personally identifying information. The problem has arisen over time, in an unexpected way, based on many apps working together (not necessarily with their explicit intent or permission).
Was Apple negligent in offering access to the string in the first place? All indications are that they actually gave privacy some serious thought as they developed the iPhone SDK. Consider the Address Book. No, not the Path debacle (we’ll get to that). Over on Mac OS X, there is a method called -[ABAddressBook me] (also the C function ABGetMe()) that returns the user’s address book card. In the C-based iPhone Address Book API, there is no equivalent call to get the user’s record. Clearly, Apple did put some thought into what third-party developers should and shouldn’t have access to, and decided that being able to personally identify the user of the phone was a bad idea. Despite the benign uses this prevents, many of us would consider this a good decision.
To that end, let me tell you another story…
Apple’s guidance in the iOS 5 deprecation statement for the -[UIDevice uniqueIdentifier] call is to generate a CFUUID and persist that in the NSUserDefaults. I’d be inclined to use the Keychain so it survives app deletion and reinstalls, but same difference. The upshot is, the one app that just needs to log app launches still has what it needs: a unique identifier of one instance of the app. When each app creates its own CFUUID, the 10 apps in our above example phone home with 10 different unique identifiers, so one can’t be used to compromise the identity of another. So far, so good.
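The generate-once-and-persist pattern looks roughly like this. This is a Python stand-in, not the real thing: `uuid.uuid4()` plays the role of CFUUID, a JSON file per app plays the role of NSUserDefaults, and the `install_id` helper and file names are my own invention.

```python
import json
import os
import tempfile
import uuid

def install_id(prefs_path):
    """Return this app's identifier, creating and saving one on first launch."""
    if os.path.exists(prefs_path):
        with open(prefs_path) as f:
            return json.load(f)["install_id"]
    new_id = str(uuid.uuid4())
    with open(prefs_path, "w") as f:
        json.dump({"install_id": new_id}, f)
    return new_id

# Ten hypothetical apps, each with its own private preferences file.
sandbox = tempfile.mkdtemp()
paths = [os.path.join(sandbox, f"app{n}.json") for n in range(10)]

first_launch = [install_id(p) for p in paths]
second_launch = [install_id(p) for p in paths]

print(first_launch == second_launch)  # each app keeps its own ID across launches
print(len(set(first_launch)))         # number of distinct IDs across the ten apps
```

Each app still gets a stable identifier for its own metrics, but there is no longer a value common to all ten that could serve as a join key.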
So what is a CFUUID? It’s Apple’s C API for working with Universally Unique Identifiers (UUIDs). From Apple’s docs:
UUIDs (Universally Unique Identifiers), also known as GUIDs (Globally Unique Identifiers) or IIDs (Interface Identifiers), are 128-bit values guaranteed to be unique. A UUID is made unique over both space and time by combining a value unique to the computer on which it was generated—usually the Ethernet hardware address—and a value representing the number of 100-nanosecond intervals since October 15, 1582 at 00:00:00.
So that’s good, we can’t trace the IDs back to the… hey, wait a minute, what was that? Combining the network hardware address (aka the MAC address) with the time the UUID was created? Doesn’t that mean that a group of UUIDs created on the same device would all have the same MAC address? So if you can get that MAC address out of the UUID, then even though the 10 apps phone home 10 different UUIDs, we could get the MAC address that’s common to all of them, identify the user, and we’re right back where we started! What the hell? Rabble! Rabble!
Well, as it turns out, Apple’s documentation is wrong. Look back at the Wikipedia entry again and notice that UUIDs have versions. The description above is for version 1, which was abandoned for exactly this reason:
This scheme has been criticized in that it is not sufficiently “opaque”; it reveals both the identity of the computer that generated the UUID and the time at which it did so.
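You can see the lack of opacity for yourself with Python’s standard uuid module, which can still produce version 1 UUIDs. This is only an illustration of the UUID format, not of anything CFUUID does, and the hardware address below is a made-up value.

```python
import uuid

FAKE_MAC = 0x001B63A1B2C3  # made-up 48-bit hardware address, for illustration only

# Two version 1 UUIDs generated on the "same machine".
a = uuid.uuid1(node=FAKE_MAC)
b = uuid.uuid1(node=FAKE_MAC)

print(a)
print(b)
print(a.version, b.version)      # both report version 1
print(hex(a.node), hex(b.node))  # the hardware address is recoverable from each
print(a.time, b.time)            # so is the creation timestamp (100 ns ticks)
```

The two UUIDs differ, but the last 12 hex digits of each are the hardware address, so anyone holding both can tell they came from the same machine.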
Read on and we find that there is a version digit in the UUID, the character after the second hyphen. Now look at the screenshot below, from a sample app I showed at CocoaConf Chicago, which generates hundreds or thousands of UUIDs a second in a (futile) search for duplicates:
Notice that the digit after the second hyphen is always a 4, meaning these are version 4 UUIDs, which are just really huge pseudo-random numbers, and therefore don’t reveal anything about who or what created them. Problem solved.
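If you want to repeat the experiment without an iOS device, a rough equivalent in Python looks like this. It uses Python’s own `uuid.uuid4()` rather than CFUUID, and the `version_digit` helper is something I made up for the sketch, so treat it as an analogy and not as a test of Apple’s implementation.

```python
import uuid

def version_digit(uuid_string):
    """Return the character right after the second hyphen."""
    return uuid_string.split("-")[2][0]

seen = set()
versions = set()
for _ in range(100_000):
    s = str(uuid.uuid4())
    versions.add(version_digit(s))
    seen.add(s)

print(versions)   # the set of version digits observed
print(len(seen))  # how many of the generated UUIDs were distinct
```

Every UUID generated this way carries the version digit 4, and the hunt for duplicates is just as futile as it was in the sample app.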
The reason I bring this up is that it was not at all “obvious” that putting the MAC address in the UUID was a bad idea, at least not so bad that it held up ratification of the standard (or, alternately, it wasn’t a problem until UUIDs started being used for purposes where revealing the creator’s identity was an issue). Lots of smart people worked on this stuff; it wasn’t thought to be a problem until later.
Like bugs, security problems are not things people create on purpose, and it’s insulting to insinuate that. They show up later, after reconsideration, after systems evolve, after third-party attacks.
You Want Evil? I’ll Show You Evil!
Yeah, you guys are right. How could anyone have expected that something called a “unique device identifier” might be used to track people.
Icouldseeitcoming, commenting on App Rejections Are A Lousy Way To Communicate Policy Changes
The UDID is neither necessary nor sufficient to track users. An app could use a hand-rolled UUID to track a device, as is Apple’s recommendation, and is in a position to log every tap and keystroke and phone it home to the analytics server. Heck, I had an AV Foundation demo a few years ago that encoded screen-grabs to a QuickTime movie — it wouldn’t take much more to provide a live video stream of a user’s interaction with my app back to an IP address of my choice, all without the user’s knowledge or assent.
That’s assuming I’m being evil on purpose, of course. What removing the UDID is meant to fix is the case where apps inadvertently provide a way to correlate activity across different apps, which in turn could be linked to a real person if any of the apps capture personally identifying information. Even now, there are other ways this could be done: apps could share hand-rolled UUIDs via URLs, document exchange, or the Keychain (if the apps were all signed with the same credentials and share a bundle id stub), though this requires a far greater degree of deliberate cooperation than the inadvertent UDID case.
There are other APIs with privacy implications too, such as free access to the Address Book (as epitomized by the Path case a few weeks back, and Facebook years earlier). An app that incorporates a
So to make a big deal out of the UDID as this obvious privacy problem seems badly ignorant. Its problematic aspects come from unintended consequences, not the nature of what it is.
And since any app can itself collect any information on your interactions with it, UDID or not, and phone it home with a network connection, the only perfect way to avoid being tracked at all is to go to the Settings application, turn on Airplane Mode, and never turn it off.