Archives for: September 2011

Three Audio Game Proposals

So, I was reading Scott Steinberg’s Music Games Rock (free PDF and $3 for Kindle or iBooks… how can you not?), and it rekindled a bunch of memories, not only of great games of my youth and adulthood, but also a few ideas that had been sitting in the dusty cobwebs of memory, set aside to think about someday.

Some of these might be viable, some not, but I’m never going to get around to doing them myself, so why not let them out. Ideas are cheap, execution is everything. Besides, there are one or two novelties in here that I would be pissed to see someone patent — the premise of patenting loose ideas being sickening enough already — so I seldom pass up the opportunity to post some “prior art” when I can.

The common thread here: using the microphone for new gaming experiences. The mic is criminally underutilized, and can do more than just convey insults and slander to fellow gamers across the ether. So here goes…


Jeopardy!

No, not my idea obviously. The game show dates back to the 60’s, and to the early 80’s in the Alex Trebek incarnation. And since the late 80’s, there have been electronic game versions for computers and game systems. And in all that time, none of them has gotten the one defining trait of the game right: they don’t allow for free spoken-word response.

I get that this hasn’t been practical before, and so the UI had to cope. The first Jeopardy! I played was on the Sega Genesis, where you had to punitively spell out your response one letter at a time with the D-pad and action buttons, trying to remember which button accepted a letter and which entered the whole response. In the early 90’s, the CD-i version (of all things!) introduced a superior UI, where you’d begin to compose a response from a grid of letters on the left side of the screen, and get a list of completions (some irrelevant, and some clearly meant as red herrings) on the right. It’s a good UI scheme: the search function on my DirecTV DVR and Apple TV works exactly this way. And so it’s strange that some subsequent versions of Jeopardy! have backslid from this sensible approach.

But that was 1995. The CD-i was a 16 MHz machine with 1 MB of RAM. Our phones and consoles are hundreds of times more powerful today. So why in the name of Moore’s Law can nobody release this game in a format that allows the player who rings in to simply speak their answer into a microphone? If the current versions can match partial D-pad answers to plausible completions, and if dictation products can transcribe speech with a high level of accuracy, why can’t these things be combined to take the transcribed speech and match it against the answer set? Sure, it’s harder than that, but we have lots of smart people and lots of CPU cycles.
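To make the matching idea concrete, here’s a hedged sketch (all names are mine, and no real speech API is involved): once dictation hands you a transcript, you can score it against the expected answer with plain case-insensitive edit distance and accept anything close enough.

```c
#include <ctype.h>
#include <string.h>

/* Sketch: classic Levenshtein edit distance, compared case-insensitively.
 * Sized for short answers only; a real version would allocate. */
static int edit_distance(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    if (la >= 64 || lb >= 64) return 999;   /* out of sketch range */
    int d[64][64];
    for (size_t i = 0; i <= la; i++) d[i][0] = (int)i;
    for (size_t j = 0; j <= lb; j++) d[0][j] = (int)j;
    for (size_t i = 1; i <= la; i++) {
        for (size_t j = 1; j <= lb; j++) {
            int subst = tolower((unsigned char)a[i-1]) != tolower((unsigned char)b[j-1]);
            int best = d[i-1][j-1] + subst;                 /* substitution */
            if (d[i-1][j] + 1 < best) best = d[i-1][j] + 1; /* deletion */
            if (d[i][j-1] + 1 < best) best = d[i][j-1] + 1; /* insertion */
            d[i][j] = best;
        }
    }
    return d[la][lb];
}

/* Accept the response if it's within roughly 20% edit error of the answer. */
int answer_matches(const char *transcribed, const char *answer) {
    size_t la = strlen(transcribed), lb = strlen(answer);
    int longest = (int)(la > lb ? la : lb);
    return edit_distance(transcribed, answer) * 5 <= longest;
}
```

A shippable matcher would also strip the “what is a…” phrasing and accept alternate acceptable answers, but the point stands: the hard part (transcription) is a solved problem, and the matching is cheap.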

The Wii version of Jeopardy! apparently does use the optional Wii microphone, but reviews point out that in this mode, the answers are multiple choice. That takes away the risk and wonder of free response, which is the whole point of the game, and completely changes its nature.

Maybe the smart people who write Kinect games will figure this out, since they seem to be among the most able and willing to advance gaming right now. If they do, I hope they learn one other lesson from the CD-i version: write out the used questions to permanent storage and don’t use those questions again. A single game of Jeopardy uses up 60 questions, so if you start with a database of 2,000 questions, getting repeats after a few games is highly likely unless you’re smart enough to code defensively.
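The repeat problem is just the birthday paradox in disguise. Here’s a back-of-the-envelope model of my own (it treats every draw as uniform over the whole database, which is a simplification) showing why naive random selection fails fast:

```c
/* Probability that `draws` uniform random picks from a pool of `pool`
 * questions are all distinct -- i.e., the player never sees a repeat.
 * Same math as the birthday problem. */
double prob_all_distinct(int pool, int draws) {
    double p = 1.0;
    for (int i = 0; i < draws; i++) {
        p *= (double)(pool - i) / (double)pool;
    }
    return p;
}
```

With a 2,000-question pool, prob_all_distinct(2000, 180), which is three games’ worth of draws, comes out well under 1%. That’s why writing used questions to permanent storage isn’t optional.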

Anyways, getting back to audio…

Code Geass

Lelouch, a young outcast prince of Britannia, possesses two great powers. One of them is “geass”, the absolute ability to compel any person to do whatever he commands…

So begins the prologue to episode 9 of Code Geass, an entirely over-the-top action anime show whose best and worst moments are often one and the same.

The anti-hero is given this ability, “geass”, by which he’s able to use a sort of magical instant hypnosis to force anyone to do his bidding. For example, when he’s running around his school carrying the mask of his alter ego, Zero, and is encountered by students who recognize what they’ve seen, he can say “forget what you’ve just seen” and they do. The limit on this ability is that it can only ever be used once on a given individual.

Now imagine you had an RPG or sneak-em-up action style video game that gave you this ability, via your microphone, to give orders. Cornered by a guard, you could hit the “geass” button and say aloud “return to your post” or even “kill yourself” and have the NPC do exactly that. Now imagine designers getting clever with this ability: you solve a puzzle by telling an enemy who has a key you need “give me the key”. But maybe that leaves you on the wrong side of the level, or sets off an alarm, so instead you need to tell him “unlock this door from the other side”. But maybe you need to have him do two things for you, and you can only use the ability once on him, so, hmmm…

Again, surely a big technical challenge, and not unlike the old Infocom games in needing to parse natural language in a way that won’t seem utterly dense, but now with the added challenge of needing to pick the command out of an audio stream. But big challenges are what make this industry interesting.
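As a toy illustration of how low the bar could start: once speech-to-text gives you a transcript, even naive substring matching against the phrases a level understands (using the example commands from above) would get a prototype moving. A hedged sketch, all names mine:

```c
#include <stddef.h>
#include <string.h>

/* Sketch of a first-pass "geass" parser: scan the transcribed speech for
 * the verb phrases this level knows how to act on. The phrase list here
 * is purely illustrative. */
const char *match_command(const char *transcript) {
    static const char *commands[] = {
        "return to your post",
        "give me the key",
        "unlock this door",
        "forget what you",
    };
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++) {
        if (strstr(transcript, commands[i]) != NULL) {
            return commands[i];
        }
    }
    return NULL;   /* the NPC shrugs: no known command heard */
}
```

Real designers would want something far more forgiving of phrasing, but the Infocom lesson applies: a parser that handles the common cases gracefully buys you a lot.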

Interactive Musicals

True story, and a long one. Back in college, my friend Mike Stemmle wrote his own adventure games, rich in comic book references and Stanford Band in-jokes, using a Mac and an application called World Builder. This ended up leading to him getting a job at LucasArts, back when they were cool and didn’t just whore out Star Wars all day. As part of that process, they called me for a reference on him, and that led to my interviewing there too. I obviously didn’t end up working there, but in interviewing there on two occasions, I distinctly remember two interesting conversations.

The first is when I was talking with Kelly Flock, who headed up the group then (and later got prominent enough at Sony to merit thrashing from Penny Arcade, so that’s saying something…), and he had an interview question about plans they had at that point for doing an Indiana Jones adventure that involved a quest for the philosopher’s stone. My response was that I thought quest stories were usually boring as hell because the object of the quest was usually abstract, unsatisfying, and sometimes an utter macguffin anyways, which meant that the success or failure of the story depended on what happened along the way, what happened in spite of the putative purpose of the quest. Given the premise of getting the philosopher’s stone, I said that the player should actually be able to get it halfway through the game, literally adding it to their inventory, and to use it to solve some puzzle (e.g., to use its power of transmutation to create an item needed to get out of a locked room or something), and perhaps then to lose it again. Not that this was particularly creative of me: using the quest object directly is exactly what happens in Indiana Jones and the Last Crusade when Indy uses the grail to cure his father’s gunshot wounds. But hell, if there was ever a time to steal-don’t-borrow from the greats, this was it.

The other thing that came up in this interview was a concept I had for something called an “interactive musical”. Mike and I had both been writers for Stanford’s Big Game Gaieties student musical, and we always had theatre on the brain. Somehow, it seemed like there was a way to capture the opportunities and the importance of the theatre, and make a player directly experience that. But we didn’t know how to do it then, and over the years we’d occasionally come back to it and say “was this ever something that could work?”

And then today, reading that book on music games, I think I finally figured it out. It’s a simple equation:

Visual Novel + Karaoke Revolution = Interactive Musical

In other words, an interactive musical is a VN where you sing the branch points.

Visualize Karaoke Revolution, or SingStar, or Rock Band for a moment. The pitch and words you’re supposed to sing are on the screen. Well, what if sometimes there was more than one set of words on the screen that fit the music? And you could pick whichever one suited the way you wanted to play the character, just the way you can pick the key lines of dialogue in a VN? And whatever you picked changed the direction of the story? You could woo the girl or tell her off. Your “I Want” song could be heartfelt yearning or bitter disillusionment. You couldn’t have infinitely many options, just enough to make for some different paths through the story, as in VNs.

There are details to work out, like how you know the tune in advance without spoiling the novelty of picking your branch in the moment (I have some ideas about this). And obviously the whole story needs to be something interesting enough to want to play into, since singing demands a real mental and emotional commitment from the player. High school drama nerds notwithstanding, it’s tough to get people to let loose and break into song. This is why karaoke bars sell beer, after all.

This wouldn’t be everyone’s cup of tea… the rest of you are welcome to keep playing Call Of Duty MCMXVII. But if you’re like us theatre geeks, the idea of becoming your character is ever so irresistible. It’s peculiar, but I think in the right hands, the experience could be extraordinary.

So there you have it, three new uses for the microphone: game show free-responses, magical hypnosis of NPCs, and singing for your story. Even if these never pan out, let’s hope more game makers start doing creative things with audio capture. It’s not just there for in-game chat.

Reverse Q&A

I’m a half-day late with my iDevBlogADay post… sorry.

So I was thinking about conference panels recently, something I don’t often attend or participate in. Panels to me seem like something that should work better than they usually do. You have smart, interesting people, but unless they know to “play ball”, to go out of their way to find ways to dig deeper or draw out conflicts and differences between each other, you tend to end up with a lot of head-nodding and personal pet theories that the rest of the panel doesn’t really have a stake in.

It’s not clear that the audience gets a lot out of it either. At Cocoaconf, I was on an iOS developer panel and the first question we got was the hopelessly played out “how do I get my app noticed” one. Ugh. You don’t need a panel for that, we’ve all been griping about that for three damn years now, and if we don’t have good answers yet, we’re never going to. Moreover, I’m not sure that attendees have a good sense of the potential of panels and how they can draw that out.

So here’s a solution. It comes to us by way of the fine folks at Harmonix, makers of Rock Band, Dance Central, the new iOS novelty VidRhythm, the rare iPod nano/Classic game Phase, etc. At their last two panels at PAX, they did a “Reverse Q&A”, which works like this: instead of taking questions, the panelists ask them. Sometimes that means big poll-type questions of the whole room, with followups and shouted-out responses from the crowd; sometimes it means “man on the street” style questions put to whoever is at the front of the line for the mic. Either way, the panelists then pursue the topic together with whoever from the crowd happens to be at the mic.

It still seems like a work-in-progress on the Harmonix podcasts, but there is a gem of a great idea here. Anyone who’s working in iOS and attending conferences has something interesting to say, and probably some unique real-world perspectives that wouldn’t necessarily be obvious to the kind of people that get picked for panels. We’re all self-employed hipster indies and authors, so we likely have little if any idea how iOS is playing out in big enterprises, how well or poorly it rubs shoulders with other technologies, etc. So in a Reverse Q&A Panel, I could ask these kinds of questions of whoever is first at the mic: “what do you use iOS for… how’s that working out… what’s missing that you think should be there…”

The responses we would get from the attendees would drive panel discussion, and in a sense, the person at the front of the line for the mic becomes a temporary member of the panel. In this, it’s a lot like the “open chair panel” that I’ve seen pulled off only once (at the Java Mobility conference in January 2008, where I saw the last gasp of the old world prior to the iPhone SDK announcement a few weeks later).

And I still like the format of both the Reverse Q&A and the Open Chair Panel more than I like straight-up open spaces, which at the end of the day are just chats, and chatting is best done over food and drink, like at the end of Cocoaconf where Bill Dudney, Scott Ruth and I grabbed two guys from Ohio U. that Bill had met and headed down to Ted’s for some bison burgers. That’s chatting. If you’re going to schedule a time and a room, it’s already more formal, and a structure helps set expectations.

I’m inclined to talk up Reverse Q&A as a format to the Cocoaconf and CodeMash organizers… would like to give this a try in the next few months.

And speaking of which, let’s practice. Here are some questions I’d like to ask of Reverse Q&A attendees. Feel free to answer any of them in the comments. I’d like to know what you guys and girls are thinking:

  • Do you learn new platforms, languages, and frameworks from books, blogs, official docs, or what? (I want to know so I can figure out whether I should bother writing books anymore… signs point to no)
  • What do other platforms do better than iOS?
  • What’s the one App Store policy that pisses you off the most?
  • Do you sell your own apps, write apps for someone else (employer, contract clients, etc.) or something else? Which of these do you think makes the most sense for you?
  • Do you want more or fewer webapps in your life?

OK, you guys and girls talk for a while…

Core Audio: First Draft

Yuna:book cadamson$ svn commit
Sending        book/ch11/Ch11.txt
Sending        book/ch12/Ch12.txt
Transmitting file data ..
Committed revision 237.
Yuna:book cadamson$ 

Folks, that’s the svn commit that marks the completion of the first draft of Core Audio.

How long has it taken to get here? Subversion’s got that too:

Yuna:coreaudiobook cadamson$ svn log -r1 .
r1 | invalidname | 2010-01-06 17:18:06 -0500 (Wed, 06 Jan 2010) | 2 lines

Created the usual trunk, tags, branches.


Yeah, almost two years from when I took over after original author-fade… probably will actually be two years by the time we handle rewrites and comments from reviewers, Lion fixes (AudioComponentInstance instead of ComponentInstance, etc.), and get through the production process of copy-edit, layout, and printing.

Oh, and will you be getting your money’s worth?

Yuna:book cadamson$ wc ch??/Ch??.txt
     208    3972   24817 ch01/ch01.txt
     295    5440   32883 ch02/Ch02.txt
     371    5344   34641 ch03/Ch03.txt
     623    6674   44278 ch04/Ch04.txt
     318    4052   26261 ch05/Ch05.txt
     515    5161   37540 ch06/Ch06.txt
     884   10837   72025 ch07/Ch07.txt
     740    9156   60837 ch08/Ch08.txt
     738    8070   54086 ch09/Ch09.txt
     910    9949   68737 ch10/Ch10.txt
     498    6233   40981 ch11/Ch11.txt
      77    2080   12570 ch12/Ch12.txt
    6177   76968  509656 total
Yuna:book cadamson$ 

At a standard estimate of 250 words per page, this 80,000-word book should clock in around 320 pages. We have a few figures, so maybe it’ll be more like 350. We’ll see. Enough to cover what we thought was crucial, not so big that it could stop a charging bison.
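The page math is simple division; as a throwaway sketch:

```c
/* Back-of-the-envelope page estimate: word count over a standard
 * words-per-page figure, rounded up. */
int estimated_pages(int words, int words_per_page) {
    return (words + words_per_page - 1) / words_per_page;
}
```

By the same math, the actual word count from the wc output (76,968) comes out around 308 pages, so 320 is a fair round-up before figures are added.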

More to do, but nice to have finally reached this milestone. Thanks to everyone who’s waited so patiently for it.

Messin’ with MIDI

I hopped in on the MIDI chapter of the nearly-finished Core Audio book because what we’ve got now is a little obscure, and really needs to address the most obvious questions, like “how do I hook up my MIDI hardware and work with it in code?” I haven’t taken MIDI really seriously in the past, so this was a good chance to catch up.

To keep our focus on iOS for this blog, let’s talk about MIDI support in iOS. iOS 4.2 added CoreMIDI, which is responsible for connecting to MIDI devices via physical cables (through the dock connector) or wifi (on OSX… don’t know if it works on iOS).

Actually getting the connection to work can be touchy. Start with the Camera Connection Kit’s USB connector. While Apple reps are typically quick to tell you that this is not a general-purpose USB adapter, it’s well-known to support USB-to-MIDI adapters, something officially blessed (with slides!) in Session 411 (“Music in iOS and Lion”) at WWDC 2011.

The catch is that the iPad supplies a tiny amount of power out the dock connector, not necessarily enough to power a given adapter. iOS MIDI keeps an updated list of known-good and known-bad adapters. Price is not a good guide here: a $60 cable from Best Buy didn’t work for me, but the $5 HDE cable works like a charm. The key really is power draw: powered USB devices shouldn’t need to draw from the iPad and will tend to work, while stand-alone cables will work if and only if they eschew pretty lights and other fancy power-draws. The other factor to consider is drivers: iOS doesn’t have them, so compatible devices need to be “USB MIDI Class”, meaning they need to follow the USB spec for driver-less MIDI devices. Again, the iOS MIDI Devices List linked above is going to help you out.

For keys, I used the Rock Band 3 keyboard, half off at Best Buy as they clear out their music game inventory (man, I need to get Wii drums cheap before they become collector’s items). This is only an input device, not an actual synthesizer, so it has only one MIDI port.

Once you’ve got device, cable, and camera connection kit, try playing your keys in GarageBand to make sure everything works.

If things are cool, let’s turn our attention to the Core MIDI API. There’s not a ton of sample code for it, but if you’ve installed Xcode enough times, you likely have Examples/CoreAudio/MIDI/SampleTools/Echo.cpp, which has a simple example of discovering connected MIDI devices. That’s where I started for my example (zip at the bottom of this blog).

You set up a MIDI session with MIDIClientCreate(), and create a port to receive incoming MIDI data with MIDIInputPortCreate(). Both of these offer callback functions that you set up with a function pointer and a user-info / context that is passed back to your function in the callbacks. You can, of course, provide an Obj-C object for this, though those of you in NDA-land working with iOS 5 and ARC will have extra work to do (the term __bridge void* should not be unfamiliar to you at this point). The first callback will let you know when devices connect, disconnect, or change, while the second delivers the MIDI packets themselves.

You can then discover the number of MIDI sources with MIDIGetNumberOfSources(), get them as MIDIEndpointRefs with MIDIGetSource(), and connect to them with MIDIPortConnectSource(). This connects your input port (from the previous graf) to the MIDI endpoint, meaning the callback function specified for the input port will get called with packets from the device.

MIDIPackets are tiny things. The struct only includes a time-stamp, length, and byte array of data. The semantics fall outside of CoreMIDI’s responsibilities; they’re summarized in the MIDI Messages spec. For basic channel voice messages, data is 2 or 3 bytes long. The first byte, “status”, has a high nybble with the command, and a low nybble indicating which MIDI channel (0-15) sent the event. The remaining bytes depend on the status and the length. For my example, I’m interested in the NOTE-ON message (status 0x9n, where n is the channel). For this message, the next two bytes are called “data 1” and “data 2” and represent the rest of the message. The bottom 7 bits of data 1 identify the note as a number (the high bit is always 0), while the bottom 7 bits of data 2 represent velocity, i.e., how hard the key was hit.

So, a suitable callback that only cares about NOTE-ON might look like this:

static void MyMIDIReadProc (const MIDIPacketList *pktlist,
                           void *refCon,
                           void *connRefCon) {
   // walk every packet in the list, not just the first
   MIDIPacket *packet = (MIDIPacket *)pktlist->packet;
   for (UInt32 i = 0; i < pktlist->numPackets; i++) {
      Byte midiCommand = packet->data[0] >> 4;
      // is it a note-on?
      if (midiCommand == 0x09) {
         Byte note = packet->data[1] & 0x7F;
         Byte velocity = packet->data[2] & 0x7F;
         // do stuff now...
      }
      packet = MIDIPacketNext(packet);
   }
}
So what do we do with the data we parse from MIDI packets? There’s nothing in Core MIDI that actually generates sounds. On OSX, we can use instrument units (kAudioUnitType_MusicDevice), which are audio units that generate synthesized sounds in response to MIDI commands. You put the units in an AUGraph and customize them as you see fit (maybe pairing them with effect units downstream), then send commands to the instrument units via the Music Device API, which provides functions like MusicDeviceMIDICommand, and takes the unit, and the status, data1 and data2 bytes from the MIDI packet, along with a timing parameter. Music Device isn’t actually in Xcode’s documentation, but there are adequate documentation comments in MusicDevice.h. On OSX, the PlaySoftMIDI example shows how to play notes in code, so it’d be straight-forward to combine this with CoreMIDI and play through from MIDI device to MIDI instrument: get the NOTE-ON events and send them to the instrument unit of your choice.

On iOS, we don’t currently have instrument units, so we need to do something else with the incoming MIDI events. What I decided to do for my example was to just call System Sounds with various iLife sound effects (which should be at the same location on everyone’s Macs, so the paths in the project are absolute). The example uses 4 of these, starting at middle C (MIDI note 60) and going up by half-steps.
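That note-to-sound mapping is just an offset check; a sketch (function name mine):

```c
/* Map an incoming note to one of four sound-effect slots, assigned to
 * consecutive half-steps starting at middle C (MIDI note 60).
 * Returns the slot index, or -1 for notes outside the mapped range. */
int sound_index_for_note(unsigned char note) {
    int idx = (int)note - 60;
    return (idx >= 0 && idx < 4) ? idx : -1;
}
```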

To run the example, you’ll actually have to run it twice: first to put the app on your iPad, then stop, plug in your keyboard, and run again. It might just be easier to watch this demo:


Anyways, that’s a brief intro to CoreMIDI on iOS. The book will probably skew a little more OSX, simply because there’s more stuff to play with, but we’ll make sure both are covered. I’m also going to be getting into this stuff at CocoaHeads Ann Arbor on Thursday night.