Archives for: September 2009

Lazy-ass XML parsing

Late in the writing of iPhone SDK Development, I added a section to the networking chapter on web services, probably the top reader request. The section focused on using NSXMLParser to parse a web service response received over the network, in this case the Twitter public timeline.

NSXMLParser is an event-driven parser: it calls back to a delegate as it encounters the beginning or end of each element, text, comment, etc. In the final book, we use a very simplistic delegate to pick off just the elements we care about, ignoring the rest. We went with this approach because an earlier beta of the book adopted the “parse the whole tree” approach suggested by Apple’s Introduction to Event-Driven XML Programming Guide for Cocoa, and the feedback from both editor and readers was that it was too hard and too much work for the sample problem.

And it was, despite one truly nifty technique that Apple provides you: define a custom element class, and as you parse, you pass around the parser’s delegate to each element as it’s being filled in. For example, when you encounter a child element, you init a MyElement object, and then make that new element the new delegate. Similarly, when elements end, you return the delegate to the parent element.
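Here is a minimal sketch of that delegate hand-off, written from the description above rather than copied from Apple's guide. The MyElement class layout, its property names, and the initializer are assumptions made for illustration, and the code uses manual retain/release like the rest of this post.

```objc
#import <Foundation/Foundation.h>

// Hypothetical element class, named after the MyElement mentioned above.
@interface MyElement : NSObject <NSXMLParserDelegate>
@property (nonatomic, copy) NSString *name;
@property (nonatomic, assign) MyElement *parent;   // not retained; parents own children
@property (nonatomic, retain) NSMutableArray *children;
@property (nonatomic, retain) NSMutableString *text;
- (id)initWithName:(NSString *)aName parent:(MyElement *)aParent;
@end

@implementation MyElement
@synthesize name, parent, children, text;

- (id)initWithName:(NSString *)aName parent:(MyElement *)aParent {
	if ((self = [super init])) {
		name = [aName copy];
		parent = aParent;
		children = [[NSMutableArray alloc] init];
		text = [[NSMutableString alloc] init];
	}
	return self;
}

- (void)dealloc {
	[name release];
	[children release];
	[text release];
	[super dealloc];
}

// A child element begins: create it and hand it the delegate role.
- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName
		namespaceURI:(NSString *)namespaceURI
		qualifiedName:(NSString *)qName
		attributes:(NSDictionary *)attributeDict {
	MyElement *child = [[MyElement alloc] initWithName:elementName parent:self];
	[children addObject:child];
	[parser setDelegate:child];
	[child release];   // the children array keeps it alive
}

// This element ends: hand the delegate role back to the parent.
- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName
		namespaceURI:(NSString *)namespaceURI
		qualifiedName:(NSString *)qName {
	[parser setDelegate:parent];
}

// Character data may arrive in several pieces, so accumulate it.
- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
	[text appendString:string];
}
@end
```

To use it, create a placeholder root element (named something like #document), set it as the parser's delegate, and call parse. Afterward the document element is the root's first child.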

So this is nice, but it’s still kind of heavy. At the moment, I’m parsing XML from a MapQuest result (via their XML protocols), and wanted to try something a little lighter. Moreover, I wanted to be able to get at the parsed data with KVC, so I could just provide a key-path of the form root.child.grandchild. As an experiment, I tried parsing everything into a deeply-nested NSDictionary, which easily supports KVC.

After an hour or two, the idea basically works, though I’ll be the first to tell you this is sloppy code (I’m sure I’m leaking some element-name strings, but neither I nor the Clang Static Analyzer has found them), it loses the order of siblings (which I don’t care about), and it doesn’t yet handle multiple child elements with the same name (which would get into the indexed accessor pattern). Also, the character data is kludged into a pseudo-child called value, whereas using a custom element class would allow you to more carefully distinguish an element’s text, child elements, and attributes.
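For the repeated-sibling limitation, one possible direction (not something the code in this post does) is to promote a dictionary slot to an array the second time the same element name appears. The helper name DYAddChild below is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: store child under name, promoting the slot to an
// array the second time the same element name shows up.
static void DYAddChild(NSMutableDictionary *parent, NSString *name, id child) {
	id existing = [parent objectForKey:name];
	if (existing == nil) {
		// First child with this name: store it directly.
		[parent setObject:child forKey:name];
	} else if ([existing isKindOfClass:[NSArray class]]) {
		// Already promoted: append to the sibling array.
		[existing addObject:child];
	} else {
		// Second child with this name: wrap both in a mutable array.
		NSMutableArray *siblings =
			[NSMutableArray arrayWithObjects:existing, child, nil];
		[parent setObject:siblings forKey:name];
	}
}
```

Key paths still work after promotion, because NSArray answers valueForKey: by collecting the result from each member, so a path through a repeated element yields an array of values. The catch is that a dotted path no longer identifies a single dictionary, so looking up the current parent by key path would need rethinking.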

The basic idea is to keep three pieces of state: a master dictionary for the parsed document (parsedResponseDictionary), the key path of the element currently being parsed (parseElementPath), and a mutable string for the current element's character data (currentCharacters), which can arrive over the course of multiple callbacks.

Here are the essential delegate methods:


- (void)parserDidStartDocument:(NSXMLParser *)parser {
	NSLog (@"didStartDocument");
	[parsedResponseDictionary release];
	parsedResponseDictionary = [[NSMutableDictionary alloc] init];
	parseElementPath = @"";
}


- (void)parser:(NSXMLParser *)parser didStartElement:(NSString *)elementName 
		namespaceURI:(NSString *)namespaceURI 
		qualifiedName:(NSString *)qName
		attributes:(NSDictionary *)attributeDict {
	NSLog (@"didStartElement:%@", elementName);
	NSMutableDictionary *newElement = [[NSMutableDictionary alloc] init];
	NSMutableDictionary *parent;
	if ([parseElementPath length] == 0) {
		NSLog (@"parent is root");
		parent = parsedResponseDictionary;
	} else {
		NSLog (@"need parent %@", parseElementPath);
		parent = [parsedResponseDictionary valueForKeyPath:parseElementPath];
		// note valueForKeyPath: instead of valueForKey:, since the path is dotted
	}
	[parent setValue:newElement forKey:elementName];
	[newElement release];
	// Build the new path; both branches return a string we own.
	NSString *newParseElementPath = nil;
	if ([parseElementPath length] > 0) {
		newParseElementPath = [[NSString alloc] initWithFormat: @"%@.%@",
			  parseElementPath, elementName];
	} else {
		newParseElementPath = [elementName copy];
	}
	[parseElementPath release]; // drop the old path before replacing it
	parseElementPath = newParseElementPath;
	NSLog (@"new path is %@", parseElementPath);
}

- (void)parser:(NSXMLParser *)parser didEndElement:(NSString *)elementName 
		namespaceURI:(NSString *)namespaceURI 
		qualifiedName:(NSString *)qName {
	NSLog (@"didEndElement:%@", elementName);
	if (currentCharacters) {
		NSMutableDictionary *elementDict =
			[parsedResponseDictionary valueForKeyPath:parseElementPath];
		[elementDict setValue: currentCharacters forKey: @"value"];
		[currentCharacters release]; // the dictionary retains it now
		currentCharacters = nil;
	}
	// Trim the last path component to get back to the parent's path.
	NSRange dotRange = [parseElementPath
		rangeOfString:@"." options:NSBackwardsSearch];
	NSString *parentParseElementPath = nil;
	if (dotRange.location != NSNotFound) {
		parentParseElementPath =
			[parseElementPath substringToIndex:dotRange.location];
	} else {
		parentParseElementPath = @"";
	}
	// The substring is autoreleased, so retain it before storing it in the ivar.
	[parentParseElementPath retain];
	[parseElementPath release];
	parseElementPath = parentParseElementPath;
	NSLog (@"new path is %@", parseElementPath);
}

- (void)parser:(NSXMLParser *)parser foundCharacters:(NSString *)string {
	NSLog (@"foundCharacters");
	if (!currentCharacters) {
		currentCharacters = [[NSMutableString alloc] 
			initWithCapacity:[string length]];
	}
	[currentCharacters appendString:string];
}
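The delegate methods only fire once something drives the parser. A minimal sketch of that step follows; the function name DYParseXMLData is hypothetical, and the delegate is assumed to be whatever object implements the callbacks above.

```objc
#import <Foundation/Foundation.h>

// Hypothetical driver: parse data synchronously, sending callbacks to delegate.
// Returns YES if the document parsed cleanly, NO if the parser hit an error.
static BOOL DYParseXMLData(NSData *data, id delegate) {
	NSXMLParser *parser = [[NSXMLParser alloc] initWithData:data];
	[parser setDelegate:delegate];
	BOOL ok = [parser parse];   // blocks until the whole document is handled
	if (!ok) {
		NSLog (@"parse failed: %@", [parser parserError]);
	}
	[parser release];
	return ok;
}
```

Because parse is synchronous, the dictionary built by the delegate is complete by the time this function returns YES.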

Using the sample request from MapQuest’s API docs, the parsed NSDictionary looks like this:


2009-09-29 12:34:40.263 MapQuestThrowaway1[6077:207] parsed dict:
{
    GeocodeResponse =     {
        LocationCollection =         {
            GeoAddress =             {
                AdminArea1 =                 {
                    value = US;
                };
                AdminArea3 =                 {
                    value = PA;
                };
                AdminArea4 =                 {
                    value = Lancaster;
                };
                AdminArea5 =                 {
                    value = Mountville;
                };
                LatLng =                 {
                    Lat =                     {
                        value = "40.044618";
                    };
                    Lng =                     {
                        value = "-76.412124";
                    };
                };
                PostalCode =                 {
                    value = 17554;
                };
                ResultCode =                 {
                    value = B1AAA;
                };
                SourceId =                 {
                    value = ustg;
                };
                Street =                 {
                    value = "[3701-3703] Hempland Road";
                };
            };
        };
    };
}

More importantly for current experimentation purposes, this lets me grab values from the parsed dictionary with KVC-style access:


NSLog (@"key-val test: lat long is %@, %@",
   [parsedResponseDictionary valueForKeyPath:
	@"GeocodeResponse.LocationCollection.GeoAddress.LatLng.Lat.value"],
   [parsedResponseDictionary valueForKeyPath:
	@"GeocodeResponse.LocationCollection.GeoAddress.LatLng.Lng.value"]);

That code produces the desired result:


key-val test: lat long is 40.044618, -76.412124
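The values that come back are still strings, so a small wrapper can hide both the trailing value key and the numeric conversion. This is only a sketch, and the helper name DYDoubleAtKeyPath is hypothetical.

```objc
#import <Foundation/Foundation.h>

// Hypothetical helper: read the text stored at keyPath + ".value" as a double.
// Returns 0.0 when nothing is stored at that path.
static double DYDoubleAtKeyPath(NSDictionary *parsed, NSString *keyPath) {
	NSString *valuePath = [keyPath stringByAppendingString:@".value"];
	id text = [parsed valueForKeyPath:valuePath];
	return text ? [text doubleValue] : 0.0;
}
```

With that in place, the latitude lookup shrinks to a call with the path GeocodeResponse.LocationCollection.GeoAddress.LatLng.Lat.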

It’s not pretty, but it’s also not a lot of code, and allows me to get on with getting and processing the result data rather than dancing around with fancy XML parsing for a day or two.

scan-build vs. Xcode 3.2

I’ve tried Xcode 3.2’s Build-and-Analyze tool several times and felt that it was picking up nowhere near as many bugs as running Clang Static Analyzer directly from the command line does. Suspecting that the integration of CSA into Xcode is a work in progress, I downloaded and installed the latest command-line version (v. 0.219) and tried it out:


** BUILD SUCCEEDED **

scan-build: 23 bugs found.
scan-build: Run 'scan-view /var/folders/ba/baOL2wJxE8aPV3tF1AeZsU+++TI/
-Tmp-/scan-build-2009-09-16-1' to examine bug reports.

Of these 23 (none of which were flagged by Xcode 3.2’s Build-and-Analyze), 9 were honest-to-goodness memory leaks (alloc without release or autorelease), and the other 14 were complaints about my use of old-fashioned rand(), which CSA thumbs its nose at:


warning: Function 'rand' is obsolete because it implements
a poor random number generator.  Use 'arc4random' instead
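Acting on that warning is usually a one-line change. A minimal sketch, with a hypothetical helper name:

```objc
#import <Foundation/Foundation.h>
#include <stdlib.h>

// Hypothetical helper: a pseudo-random integer in [0, upperBound).
// upperBound must be non-zero. The modulo introduces a slight bias
// whenever upperBound does not evenly divide 2^32.
static uint32_t DYRandomBelow(uint32_t upperBound) {
	return arc4random() % upperBound;
}
```

Unlike rand(), arc4random() needs no seeding call. Later SDKs added arc4random_uniform(), which takes the upper bound directly and avoids the modulo bias.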

Anyways, word to the wise: get comfortable with the command-line version of CSA, and keep watching the project’s website. One of the developers there told me they’d eventually be providing info on how to integrate new builds of CSA into Xcode.

BTW, we had instructions on how to set up command-line CSA in most of the betas of the Prags’ iPhone SDK Development book, then pulled it when we saw the analyzer was going into Xcode 3.2. What we’ve done instead is to split that material into a PDF that will be in the downloadable sample code zip (doesn’t look like it’s there yet, though).

Speaking of the book, I’m told it is now at the printer, so those of you who’ve pre-ordered it should expect your copies soon.

Anyways, now to see if the guys at CocoaHeads Ann Arbor were right about being able to build iPhone projects with Clang/LLVM. I thought that only worked for Mac projects right now.

Threads on the Head

Lack of posts lately… heads down on an iPod game. It’s built up of mini-games, about half of which are done. Today, I’m facing the problem of having to create a mini-game that uses some of the metadata in the iPod library that can’t be directly queried. So, I have to go over every song in the library and perform my own analysis.

Obviously, this would be death to do at startup or in the middle of the game. Walking my 700-song library takes 6-7 seconds, and users could have far more songs.

Cut to the win: NSOperation makes it easy to do stuff on threads, without having to, you know, write your own pthread stuff.

As a test, I wrote a subclass of NSOperation to perform a simple analysis on the library: count the number of songs that have “the” in the title. Here’s the -main method:


-(void) main {
   NSDate *beginDate = [NSDate date];
   NSLog (@"*** DYDeepLibraryAwarenessOperation is cogitating and ruminating");
   // test - count titles that have the word "the" in them.
   int theCount = 0;
   MPMediaQuery *allsongs = [MPMediaQuery songsQuery];
   NSLog (@"Thinking about %d songs", [allsongs.items count]);
   for (MPMediaItem *item in allsongs.items) {
      NSString *title = [item valueForProperty:MPMediaItemPropertyTitle];
      if (!title) continue; // skip untitled items rather than messaging nil
      NSRange theRange = [title
         rangeOfString: @"the" options: NSCaseInsensitiveSearch];
      if (theRange.location != NSNotFound) {
         theCount++;
      }
   }
   NSLog (@"*** %d songs in the iPod Library contain the word \"the\".", theCount);
   NSLog (@"*** DYDeepLibraryAwarenessOperation has achieved enlightenment (in %f sec).",
         fabs ([beginDate timeIntervalSinceNow]));
}

Then, as the app starts up, the operation is run as part of an NSOperationQueue:


awarenessOperation = [[DYDeepLibraryAwarenessOperation alloc] init];
operationQueue = [[NSOperationQueue alloc] init];
[operationQueue addOperation:awarenessOperation];
NSLog (@"DYDeepLibraryAwareness set up NSOperationQueue");

Here’s the output when the code is just left to run by itself (I’ve taken out the date, classname, and line number from the output for space):


15:30:47.979 DYDeepLibraryAwareness set up NSOperationQueue
15:30:47.976 *** DYDeepLibraryAwarenessOperation is cogitating and ruminating
15:30:48.238 Thinking about 740 songs
15:30:54.586 *** 168 songs in the iPod Library contain the word "the".
15:30:54.589 *** DYDeepLibraryAwarenessOperation has achieved enlightenment (in 6.613482 sec).

Perhaps more importantly, and what I can’t show in a blog, is that this other thread does not interfere with the GUI, or with queries to the iPod library from the main thread, which are done to set up and play the first mini-game. So this means that the iPod library server can handle multiple concurrent requests (yay), and that I can do the heavy lifting to set up later games while presenting and playing the simpler ones.
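Since the real operation needs the iPod library, here is a sketch of the same counting logic against a plain array of titles, which makes it possible to exercise off-device. The class name DYTitleCountOperation and its initializer are hypothetical. It also polls isCancelled on each pass, an addition not in the operation above, so the queue can stop a long walk early.

```objc
#import <Foundation/Foundation.h>

// Hypothetical operation: counts titles containing "the" (as a substring).
@interface DYTitleCountOperation : NSOperation {
	NSArray *titles;
	NSUInteger theCount;
}
- (id)initWithTitles:(NSArray *)someTitles;
- (NSUInteger)theCount;
@end

@implementation DYTitleCountOperation
- (id)initWithTitles:(NSArray *)someTitles {
	if ((self = [super init])) {
		titles = [someTitles copy];
	}
	return self;
}

- (void)dealloc {
	[titles release];
	[super dealloc];
}

// Only meaningful once the operation has finished.
- (NSUInteger)theCount {
	return theCount;
}

- (void)main {
	NSAutoreleasePool *pool = [[NSAutoreleasePool alloc] init];
	NSUInteger count = 0;
	for (NSString *title in titles) {
		if ([self isCancelled]) break;   // bail out early if asked to stop
		NSRange theRange = [title rangeOfString:@"the"
			options:NSCaseInsensitiveSearch];
		if (theRange.location != NSNotFound) {
			count++;
		}
	}
	theCount = count;
	[pool release];
}
@end
```

Add it to an NSOperationQueue as before and read theCount once the operation reports isFinished.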