<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>bytesizecreations &#187; XML</title>
	<atom:link href="http://www.bytesizecreations.com/category/xml/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.bytesizecreations.com</link>
	<description>Musings from a busy mind</description>
	<lastBuildDate>Sun, 14 Jun 2009 17:44:00 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>iPhone XML Management</title>
		<link>http://www.bytesizecreations.com/2008/11/iphone-xml-management/</link>
		<comments>http://www.bytesizecreations.com/2008/11/iphone-xml-management/#comments</comments>
		<pubDate>Thu, 27 Nov 2008 06:36:00 +0000</pubDate>
		<dc:creator>Michael Gaylord</dc:creator>
				<category><![CDATA[JTidy]]></category>
		<category><![CDATA[NSXMLParser]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[iphone]]></category>
		<category><![CDATA[java]]></category>
		<category><![CDATA[libxml2]]></category>
		<category><![CDATA[objective-c]]></category>
		<category><![CDATA[programming]]></category>
		<category><![CDATA[screen scraping]]></category>

		<guid isPermaLink="false">http://www.42restaurants.com/byte/?p=46</guid>
		<description><![CDATA[Learning to program in Objective-C for the iPhone can be really frustrating at times. Although I am getting to grips with the language and its frameworks, I am finding certain seemingly simple tasks a bit of a chore.
For instance, as one of my own projects to help me learn Cocoa Touch, I am trying [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://1.bp.blogspot.com/_eTNQ8DJPMbw/SS7_UJk4RHI/AAAAAAAAAWE/kOlsn5wdFYk/s1600-h/IMG_2686.png"><img style="float:left; margin:0 10px 10px 0;cursor:pointer; cursor:hand;width: 235px; height: 320px;" src="http://1.bp.blogspot.com/_eTNQ8DJPMbw/SS7_UJk4RHI/AAAAAAAAAWE/kOlsn5wdFYk/s320/IMG_2686.png" border="0" alt="" name="BLOGGER_PHOTO_ID_5273432935330497650" /></a>Learning to program in Objective-C for the iPhone can be really frustrating at times. Although I am getting to grips with the language and its frameworks, I am finding certain seemingly simple tasks a bit of a chore.</p>
<p>For instance, as one of my own projects to help me learn Cocoa Touch, I am trying to parse HTML from a website, in other words screen scraping. To handle any kind of XML processing on the iPhone there are, to my knowledge, only two libraries you can use. One is the dreaded <a href="http://www.xmlsoft.org/">libxml2</a> and the other is <a href="http://developer.apple.com/documentation/Cocoa/Reference/Foundation/Classes/NSXMLParser_Class/Reference/Reference.html">NSXMLParser</a>.</p>
<p>Firstly, libxml2. What a nightmare! Although it is reported to be very memory-efficient and fast, it is written in C, has the most confusing documentation on the planet and, to top it all off, its functions and types are named according to a different set of coding conventions from the ones I am used to. This made all the sample code I could find very difficult to read.</p>
<p>What I did find, however, was a couple of utilities written by <a href="http://cocoawithlove.com/2008/10/using-libxml2-for-parsing-and-xpath.html">Matt Gallagher</a> that make XPath querying with libxml2 a bit easier. But not easy enough.</p>
<p>Which brought me to NSXMLParser. If you are going to use NSXMLParser to screen scrape, you are asking for spaghetti code. NSXMLParser is a SAX-style parser. In other words, as it works through an XML document it fires an event at the start of an element, another when it finds characters inside an element, and another at the end of an element. This works very well if you know what the XML is going to look like, but it is not ideal when working with HTML, which is often not well-formed and whose structure can change without notice.</p>
<p>In the end, after a few hours of fiddling and many curse words, I got my code to work with NSXMLParser, even though it looked like a dog&#8217;s breakfast and would probably break the moment an extra tag was added to the HTML.</p>
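<p>To make the event model concrete, here is a minimal sketch of a SAX-style parse. It is written in Java with the standard <code>javax.xml.parsers</code> API rather than Objective-C, so it is an analogy for how NSXMLParser's delegate callbacks fire, not NSXMLParser code. The class name <code>SaxEventDemo</code>, the method <code>collectEvents</code> and the sample XML are made up for illustration.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records each SAX callback in the order it fires.
class SaxEventDemo {

    static List<String> collectEvents(String xml) throws Exception {
        final List<String> events = new ArrayList<>();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                events.add("start:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser may deliver one text node in several chunks;
                // this sketch ignores that and skips whitespace-only text.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }
        };

        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        for (String event : collectEvents("<menu><dish>Soup</dish></menu>")) {
            System.out.println(event);
        }
    }
}
```

<p>For the sample input the callbacks arrive as start:menu, start:dish, text:Soup, end:dish, end:menu. Any notion of "where am I in the document" has to be tracked by hand between those callbacks, which is where the spaghetti comes from once the markup gets irregular.</p>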
<p>There is one other thing you can do, however: create a proxy for all the data that you send to the phone. This has two benefits. The first is that you can use Java running on a web server to fetch, clean and extract the data from the website you are trying to scrape. I used a combination of <a href="http://jtidy.sourceforge.net/">JTidy</a> and <a href="http://www.w3.org/TR/xpath">XPath</a> to extract the data I needed from the relevant pages and convert it to objects which I can now serve to the iPhone. This gives an incredible amount of flexibility and markedly improves performance, as the phone no longer needs to load large documents into memory in the background.</p>
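<p>As a rough illustration of the server-side extraction step, the sketch below runs an XPath query over markup that is assumed to have been cleaned into well-formed XHTML already (the JTidy step is left out so that the example needs only the Java standard library). It then writes the results out as the kind of small, predictable XML a phone could parse easily. The class name <code>XPathExtractDemo</code>, both method names, the sample page and the <code>items</code>/<code>item</code> element names are all assumptions for illustration, not my actual code.</p>

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class: XPath extraction on already-cleaned markup.
class XPathExtractDemo {

    // Returns the trimmed text of every node matched by the expression.
    // Assumes the input is well-formed and uses no XML namespaces.
    static List<String> extract(String xhtml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
                expression,
                new InputSource(new StringReader(xhtml)),
                XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getTextContent().trim());
        }
        return values;
    }

    // Serialises the values as a flat XML document for the phone.
    static String toSimpleXml(List<String> values) {
        StringBuilder out = new StringBuilder("<items>");
        for (String value : values) {
            // Escape the characters that would break the markup.
            String escaped = value.replace("&", "&amp;")
                                  .replace("<", "&lt;")
                                  .replace(">", "&gt;");
            out.append("<item>").append(escaped).append("</item>");
        }
        return out.append("</items>").toString();
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><ul class='menu'>"
                + "<li>Soup</li><li>Bread</li></ul>"
                + "<p>footer</p></body></html>";
        List<String> dishes = extract(page, "//ul[@class='menu']/li");
        System.out.println(toSimpleXml(dishes));
    }
}
```

<p>The point of the design is that the fragile part, the XPath expression tied to someone else's page layout, lives on the server where it can be changed at any time, while the phone only ever sees the flat output format.</p>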
<p>The other major benefit of using a proxy server is that if you offload the processing to an external server and make your iPhone application as dumb as possible, you won&#8217;t need to update and resubmit the application to Apple every time the website changes (within reason, of course). You can even resize images on the server and cache responses.</p>
<p>My application now uses NSXMLParser only for simple XML files whose format I control. Moral of the story: don&#8217;t try to do everything on the phone.</p>
<p>
So what&#8217;s next? Well, I am going to finish my server-side code and rewrite my iPhone application so that it works with the simpler XML from the server. Once all this is done I can start working on the more fun parts of my application, such as using the iPhone&#8217;s media frameworks, namely the camera, video and audio.</p>
]]></content:encoded>
			<wfw:commentRss>http://www.bytesizecreations.com/2008/11/iphone-xml-management/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
