Easy Peasy

Unless you've been hiding in a cave for the last few years, you've heard about XML - it's the toolkit that more and more Web publishers are switching to for content markup. You may even have seen an XML document in action, complete with user-defined tags and markup, and you might have wondered how on earth one converts that tangled mess of code into human-readable content.

The answer is, not easily. While PHP has included support for the two standard methods of parsing (read: making sense of) XML - SAX and DOM - since version 4.0, the complexity and inherent geekiness of these methods often turned off all but the most dedicated XML developers. All that has changed, however, with PHP 5.0, which introduces a brand-spanking-new XML extension named SimpleXML that takes all (and I do mean all) the pain out of processing XML documents. Keep reading, and find out how.

The Bad Old Days

In order to understand why SimpleXML is so cool, a brief history lesson is in order. In the days before SimpleXML, there were two ways of processing XML documents.

The first, SAX or the Simple API for XML, involved traversing an XML document and calling specific functions as the parser encountered different types of tags. For example, you might have called one function to process a starting tag, another function to process an ending tag, and a third function to process the data between them.

The second, DOM or the Document Object Model, involved creating a tree representation of the XML document in memory, and then using tree-traversal methods to navigate it. Once a particular node of the tree was reached, the corresponding content could be retrieved and used.
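To make the SAX style concrete, here's a minimal sketch using PHP's event-based xml extension. The sample document and the handler variable names are made up for illustration; the point is only that you write one handler per kind of event and the parser calls them as it walks the document.

```php
<?php
// Hypothetical sample document; the element names are invented.
$xml = '<pet><name>Polly</name><age>3</age></pet>';

// Log of everything the parser reports, in order.
$events = [];

// Called for every opening tag.
$onStart = function ($parser, $name, $attrs) use (&$events) {
    $events[] = "start:$name";
};

// Called for every closing tag.
$onEnd = function ($parser, $name) use (&$events) {
    $events[] = "end:$name";
};

// Called for the character data between tags.
$onData = function ($parser, $data) use (&$events) {
    if (trim($data) !== '') {
        $events[] = "data:$data";
    }
};

$parser = xml_parser_create();
// Keep element names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_set_element_handler($parser, $onStart, $onEnd);
xml_set_character_data_handler($parser, $onData);
xml_parse($parser, $xml, true);

echo implode("\n", $events), "\n";
```

Even for a two-element document, that's three handlers plus the parser setup, which is the overhead the rest of this article is about avoiding.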
Neither of these two approaches was particularly user-friendly: SAX required the developer to custom-craft event handlers for each type of element encountered in an XML file, while the DOM approach used an object-oriented paradigm which tended to throw developers off, in addition to being memory-intensive and thus inefficient with large XML documents. In the larger context also, PHP 4 used a number of different backend libraries for each of its different XML extensions, leading to inconsistency in the way different XML extensions worked and thus creating interoperability concerns (as well as a fair amount of confusion for developers).

With PHP 5.0, a concerted effort was made to fix this problem, by adopting the libxml2 library (http://www.xmlsoft.org/) as the standard library for all XML extensions and by getting the various XML extensions to operate more consistently. The biggest change in the PHP 5 XML pantheon, though, is the SimpleXML extension developed by Sterling Hughes, Rob Richards and Marcus Börger, which attempts to make parsing XML documents significantly more user-friendly than it was in PHP 4.

SimpleXML works by converting an XML document into an object, and then turning the elements within that document into object properties which can be accessed using standard object notation. This makes it easy to drill down to an element at any level of the XML hierarchy to access its content. Repeated elements at the same level of the document tree are represented as arrays, while custom element collections can be created using XPath location paths (of which, more later); these collections can then be processed using PHP's standard loop constructs. Accessing element attributes is as simple as accessing the keys of an associative array - there's nothing new to learn, and no special code to write.

In order to use SimpleXML and PHP together, your PHP build must include support for SimpleXML.
This support is enabled by default in both the UNIX and Windows versions of PHP 5. Read more about this at http://www.php.net/manual/en/ref.simplexml.php. If you're a PHP 4 user, you're out of luck - SimpleXML is only available for PHP 5.

Petting Zoo

To see how SimpleXML works, consider the following XML file:
Now, you need a way to get to the content enclosed between the
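As a minimal sketch of what reading element content looks like, assume a hypothetical pet record. The element names and values below are invented for illustration, and the XML is inlined as a string so the script is self-contained.

```php
<?php
// Hypothetical pet record; element names and values are assumptions.
$xmlstr = <<<XML
<pet>
    <name>Polly Parrot</name>
    <age>3</age>
    <species>parrot</species>
    <parents>
        <mother>Pia Parrot</mother>
        <father>Peter Parrot</father>
    </parents>
</pet>
XML;

// Turn the XML into an object; child elements become properties.
$pet = simplexml_load_string($xmlstr);

// Cast to string to get the text content of an element.
$name   = (string) $pet->name;
$age    = (int) $pet->age;

// Nested elements are reached by chaining properties.
$mother = (string) $pet->parents->mother;
$father = (string) $pet->parents->father;

echo "$name is a $age-year-old {$pet->species} ";
echo "and the child of $mother and $father\n";
```

simplexml_load_file() takes a file path and returns the same kind of object, so the property access is identical when the XML lives on disk.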
The action begins with the

Just as you can read, so also can you write. SimpleXML makes it easy to alter the contents of a particular XML element - simply assign a new value to the corresponding object property. Here's an example:
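A minimal sketch of that idea, again with a made-up pet record inlined as a string. The asXML() method used at the end is one way to get the modified document back out as text.

```php
<?php
// Hypothetical record; names and values are invented for illustration.
$pet = simplexml_load_string(
    '<pet><name>Polly Parrot</name><age>3</age></pet>'
);

// Assigning to a property replaces the element's character data.
$pet->name = 'Sammy Snail';
$pet->age  = 4;

// Serialize the modified document back to an XML string.
$updated = $pet->asXML();
echo $updated;
```

Passing a file name to asXML() writes the result to that file instead of returning it.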
Here, the original XML file is first read in, and then the character data enclosed within
each element is altered by assigning new values to the corresponding object property. The
Sin City

Repeated elements at the same level of the XML hierarchy are represented as array elements, and can be accessed using numeric indices. To see how this works, consider the following XML file:
Here's the PHP script that reads it and retrieves the data from it:
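Here's a minimal sketch of index-based access, assuming a hypothetical list of repeated elements (the element names are invented, and the XML is inlined rather than read from a file).

```php
<?php
// Hypothetical list; the <sins>/<sin> element names are assumptions.
$xmlstr = <<<XML
<sins>
    <sin>pride</sin>
    <sin>envy</sin>
    <sin>anger</sin>
    <sin>greed</sin>
</sins>
XML;

$xml = simplexml_load_string($xmlstr);

// Repeated siblings are addressed with zero-based numeric indices.
$first = (string) $xml->sin[0];
$third = (string) $xml->sin[2];

// count() reports how many siblings share that name.
$total = count($xml->sin);

echo "First: $first, third: $third, total: $total\n";
```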
If you'd prefer, you can even iterate over the collection with a
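One way to walk such a collection is PHP's foreach construct. This sketch reuses the same kind of hypothetical list, with invented element names.

```php
<?php
// Hypothetical list; element names are assumptions for illustration.
$xml = simplexml_load_string(
    '<sins><sin>pride</sin><sin>envy</sin><sin>anger</sin></sins>'
);

// Each pass through the loop hands back one <sin> element.
$names = [];
foreach ($xml->sin as $sin) {
    $names[] = (string) $sin;
}

echo implode(', ', $names), "\n";
```

The loop saves you from having to know how many elements there are ahead of time.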
The Shape Of Things To Come

SimpleXML handles element attributes as transparently as it does elements and their content. Attribute-value pairs are represented as members of a PHP associative array, and can be accessed like regular array elements. To see how this works, take a look at this script:
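A minimal sketch of attribute access, assuming a made-up list of shapes whose dimensions are stored as attributes. The attribute names here are hypothetical.

```php
<?php
// Hypothetical shape list; element and attribute names are assumptions.
$xmlstr = '<shapes>'
        . '<shape type="circle" radius="2" />'
        . '<shape type="rectangle" length="5" width="2" />'
        . '</shapes>';

$shapes = simplexml_load_string($xmlstr);

$areas = [];
foreach ($shapes->shape as $shape) {
    // Attributes are read with array-style keys on the element.
    $type = (string) $shape['type'];

    if ($type === 'circle') {
        $areas[$type] = M_PI * pow((float) $shape['radius'], 2);
    } elseif ($type === 'rectangle') {
        $areas[$type] = (float) $shape['length'] * (float) $shape['width'];
    }
}

foreach ($areas as $type => $area) {
    printf("%s: %.2f\n", $type, $area);
}
```

Attribute values come back as objects rather than plain strings, which is why the sketch casts them before comparing or doing arithmetic.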
Unlike previous examples, which used an external XML file, this one creates the XML dynamically and loads it into SimpleXML with the

X Marks The Spot

SimpleXML also supports custom element collections, through XPath location paths. For those of you new to XML, XPath is a standard addressing mechanism for an XML document, allowing developers to access collections of elements, attributes or text nodes within a document. Read more about XPath at http://www.w3.org/TR/xpath.html and http://www.melonfire.com/community/columns/trog/article.php?id=83. To see how this works, consider the following XML document:
Now, let's suppose you want to print all the
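As a minimal sketch of an XPath collection, assume a hypothetical ingredient list (the element names are invented) and suppose you want every description in it, wherever it sits in the tree.

```php
<?php
// Hypothetical ingredient list; element names are assumptions.
$xmlstr = <<<XML
<ingredients>
    <item>
        <desc>Boneless chicken breasts</desc>
        <quantity>2</quantity>
    </item>
    <item>
        <desc>Chopped onions</desc>
        <quantity>2</quantity>
    </item>
    <item>
        <desc>Ginger</desc>
        <quantity>1</quantity>
    </item>
</ingredients>
XML;

$xml = simplexml_load_string($xmlstr);

// xpath() returns an array of elements matching the location path.
$descs = [];
foreach ($xml->xpath('//desc') as $desc) {
    $descs[] = (string) $desc;
}

echo implode("\n", $descs), "\n";
```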
Using XPath, you can get even fancier than this - for example, by creating a collection
of only those
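A filtered collection might look like the sketch below, which uses an XPath predicate to keep only the items whose quantity is greater than one. The document and element names are hypothetical.

```php
<?php
// Hypothetical ingredient list; element names are assumptions.
$xmlstr = <<<XML
<ingredients>
    <item><desc>Boneless chicken breasts</desc><quantity>2</quantity></item>
    <item><desc>Chopped onions</desc><quantity>2</quantity></item>
    <item><desc>Ginger</desc><quantity>1</quantity></item>
</ingredients>
XML;

$xml = simplexml_load_string($xmlstr);

// The predicate in square brackets filters <item> elements
// before their <desc> children are selected.
$matches = [];
foreach ($xml->xpath('//item[quantity > 1]/desc') as $desc) {
    $matches[] = (string) $desc;
}

echo implode("\n", $matches), "\n";
```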
Without XPath, accomplishing this would be far more complicated than the five lines of code above...try it for yourself and see!

An Evening At The Moulin Rouge

Now that you've seen what XPath can do, let's wrap this up with an example of how you might actually use it. Let's suppose you have a bunch of movie reviews marked up in XML, like this:
Now, you want to display this review on your Web site. So, you need a PHP script to extract the data from this file and place it in the appropriate locations in an HTML template. With everything you've learned so far, this is a snap...as the code below illustrates:
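Here's a minimal sketch of such a script. The review record, its element and attribute names, the rating and the body text are all placeholders invented for illustration, and the XML is inlined so the script runs on its own.

```php
<?php
// Hypothetical review record; the structure and the rating are assumptions.
$xmlstr = <<<XML
<review id="57">
    <title>Moulin Rouge</title>
    <year>2001</year>
    <cast>
        <person>Nicole Kidman</person>
        <person>Ewan McGregor</person>
    </cast>
    <director>Baz Luhrmann</director>
    <rating max="5">4</rating>
    <body>Placeholder review text goes here.</body>
</review>
XML;

$review = simplexml_load_string($xmlstr);

// Escape each value before placing it in the HTML template.
$title    = htmlspecialchars((string) $review->title);
$year     = htmlspecialchars((string) $review->year);
$director = htmlspecialchars((string) $review->director);
$body     = htmlspecialchars((string) $review->body);

// Repeated <person> elements are collected with a loop.
$cast = [];
foreach ($review->cast->person as $person) {
    $cast[] = htmlspecialchars((string) $person);
}

// The rating is element content; its scale is an attribute.
$rating = (int) $review->rating;
$max    = (int) $review->rating['max'];

$html = "<h1>$title ($year)</h1>\n"
      . "<p>Director: $director</p>\n"
      . "<p>Cast: " . implode(', ', $cast) . "</p>\n"
      . "<p>Rating: $rating/$max</p>\n"
      . "<p>$body</p>\n";

echo $html;
```

Element properties, a loop over repeated elements and an attribute lookup are all this takes.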
Pretty simple, huh?

That's about all for the moment. In Part Twelve of PHP 101, I'll be telling you all about the new exception handling model in PHP 5, showing you how you can use it to catch your scripts before they crash and burn. See you there!