<?xml version="1.0" encoding="utf-8"?>
<feed xml:lang="en-us" xmlns="http://www.w3.org/2005/Atom"><title>Simon Willison's Weblog: libxml2</title><link href="http://simonwillison.net/" rel="alternate"/><link href="http://simonwillison.net/tags/libxml2.atom" rel="self"/><id>http://simonwillison.net/</id><updated>2008-12-15T00:05:21+00:00</updated><author><name>Simon Willison</name></author><entry><title>How to install lxml python module on mac os 10.5 (leopard)</title><link href="https://simonwillison.net/2008/Dec/15/lxml/#atom-tag" rel="alternate"/><published>2008-12-15T00:05:21+00:00</published><updated>2008-12-15T00:05:21+00:00</updated><id>https://simonwillison.net/2008/Dec/15/lxml/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://lsimons.wordpress.com/2008/08/31/how-to-install-lxml-python-module-on-mac-os-105-leopard/"&gt;How to install lxml python module on mac os 10.5 (leopard)&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
Instructions that work! Finally, I can find out what all the fuss is about.


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/leopard"&gt;leopard&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lxml"&gt;lxml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/macos"&gt;macos&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;&lt;/p&gt;



</summary><category term="leopard"/><category term="libxml2"/><category term="lxml"/><category term="macos"/><category term="python"/><category term="xml"/></entry><entry><title>lxml.cssselect</title><link href="https://simonwillison.net/2007/Sep/24/lxmlcssselect/#atom-tag" rel="alternate"/><published>2007-09-24T23:57:17+00:00</published><updated>2007-09-24T23:57:17+00:00</updated><id>https://simonwillison.net/2007/Sep/24/lxmlcssselect/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://codespeak.net/lxml/dev/cssselect.html"&gt;lxml.cssselect&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
lxml includes an implementation of CSS 3 selectors, which compiles them to XPath expressions. Should be a useful tool for parsing Microformats from Python.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://blog.ianbicking.org/2007/09/24/lxmlhtml/"&gt;Ian Bicking&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/css"&gt;css&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/css3"&gt;css3&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/lxml"&gt;lxml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/microformats"&gt;microformats&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/selectors"&gt;selectors&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xpath"&gt;xpath&lt;/a&gt;&lt;/p&gt;



</summary><category term="css"/><category term="css3"/><category term="libxml2"/><category term="lxml"/><category term="microformats"/><category term="python"/><category term="selectors"/><category term="xpath"/></entry><entry><title>lxml</title><link href="https://simonwillison.net/2005/Apr/13/lxml/#atom-tag" rel="alternate"/><published>2005-04-13T17:04:25+00:00</published><updated>2005-04-13T17:04:25+00:00</updated><id>https://simonwillison.net/2005/Apr/13/lxml/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://codespeak.net/lxml/"&gt;lxml&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
A Pythonic wrapper for libxml2.

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://faassen.n--tree.net/blog/view/weblog/2005/04/08/0"&gt;Martijn Faassen&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;&lt;/p&gt;



</summary><category term="libxml2"/><category term="python"/><category term="xml"/></entry><entry><title>PHP 5 Release Candidate 1</title><link href="https://simonwillison.net/2004/Mar/19/PHP5RC1/#atom-tag" rel="alternate"/><published>2004-03-19T01:27:51+00:00</published><updated>2004-03-19T01:27:51+00:00</updated><id>https://simonwillison.net/2004/Mar/19/PHP5RC1/#atom-tag</id><summary type="html">
    &lt;p&gt;I haven't blogged much about &lt;acronym title="PHP: Hypertext Preprocessor"&gt;PHP&lt;/acronym&gt; in a while because I've been up to my nose in mod_python and loving every minute of it. This news is just too important to miss: &lt;acronym title="PHP: Hypertext Preprocessor"&gt;PHP&lt;/acronym&gt; 5 Release Candidate 1 &lt;a href="http://www.php.net/downloads.php#v5" title="PHP 5 Release Candidate 1"&gt;has been released&lt;/a&gt;, bringing the first production-ready release tantalisingly close. While I doubt &lt;acronym title="PHP: Hypertext Preprocessor"&gt;PHP&lt;/acronym&gt; 5 will tempt me back, it's definitely an exciting upgrade - my biggest complaint with &lt;acronym title="PHP: Hypertext Preprocessor"&gt;PHP&lt;/acronym&gt; 4 is the brain-dead object model which defaults to copying whole objects rather than passing references, and this is one of the many things addressed by &lt;acronym title="PHP: Hypertext Preprocessor"&gt;PHP&lt;/acronym&gt; 5. The new libxml2-powered &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; features sound really powerful, and SQLite as an on-board database should be ideal for knocking out small stand-alone applications without needing to set up a MySQL database for them.&lt;/p&gt;

&lt;p&gt;I may well throw a copy on my Mac over the weekend and try out the changes since version 4.3.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/php"&gt;php&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/sqlite"&gt;sqlite&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="libxml2"/><category term="php"/><category term="python"/><category term="sqlite"/><category term="xml"/></entry><entry><title>XML.com: Lightweight XML Search Servers [Jan. 21, 2004]</title><link href="https://simonwillison.net/2004/Jan/26/xmlcom/#atom-tag" rel="alternate"/><published>2004-01-26T14:51:54+00:00</published><updated>2004-01-26T14:51:54+00:00</updated><id>https://simonwillison.net/2004/Jan/26/xmlcom/#atom-tag</id><summary type="html">
    
&lt;p&gt;&lt;strong&gt;&lt;a href="http://www.xml.com/pub/a/2004/01/21/udell.html"&gt;XML.com: Lightweight XML Search Servers [Jan. 21, 2004]&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
More fun with Python and libxml2

    &lt;p&gt;&lt;small&gt;Via &lt;a href="http://www.pythonware.com/daily/"&gt;Daily Python-URL&lt;/a&gt;&lt;/small&gt;&lt;/p&gt;


    &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;&lt;/p&gt;



</summary><category term="libxml2"/><category term="python"/><category term="xml"/></entry><entry><title>Using XPath to mine XHTML</title><link href="https://simonwillison.net/2003/Oct/21/xpathRocks/#atom-tag" rel="alternate"/><published>2003-10-21T05:31:23+00:00</published><updated>2003-10-21T05:31:23+00:00</updated><id>https://simonwillison.net/2003/Oct/21/xpathRocks/#atom-tag</id><summary type="html">
    &lt;p&gt;This morning, I finally decided to &lt;a href="http://users.skynet.be/sbi/libxml-python/" title="Libxml and Libxslt Python Bindings for Windows"&gt;install libxml2&lt;/a&gt; and see what &lt;a href="http://www.xmldatabases.org/WK/blog/607?t=item" title="Givin libxml2 some love"&gt;all the fuss&lt;/a&gt; was about, in particular with respect to XPath. What followed is best described as an enlightening experience.&lt;/p&gt;

&lt;p&gt;XPath is a beautifully elegant way of addressing "nodes" within an &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; document. XPath expressions look a little like file paths, for example:&lt;/p&gt;

&lt;dl&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;/first/second&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match any &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements that occur inside a &lt;code class="xml"&gt;&amp;lt;first&amp;gt;&lt;/code&gt; element that is the root element of the document&lt;/dd&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;//second&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match all &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements irrespective of their place in the document&lt;/dd&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;//second[@hi]&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match all &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements with a 'hi' attribute&lt;/dd&gt;
 &lt;dt&gt;&lt;code class="xpath"&gt;//second[@hi="there"]&lt;/code&gt;&lt;/dt&gt;
 &lt;dd&gt;Match all &lt;code class="xml"&gt;&amp;lt;second&amp;gt;&lt;/code&gt; elements with a 'hi' attribute that equals "there"&lt;/dd&gt;
&lt;/dl&gt;
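
&lt;p&gt;As a rough sketch of how those four expressions behave, the snippet below uses &lt;code class="python"&gt;xml.etree.ElementTree&lt;/code&gt; from the Python standard library rather than libxml2 (an assumption made here so that it runs without extra installs). ElementTree supports only a subset of XPath and evaluates paths relative to an element, so &lt;code class="xpath"&gt;./second&lt;/code&gt; stands in for &lt;code class="xpath"&gt;/first/second&lt;/code&gt;. The sample tree is made up for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;
import xml.etree.ElementTree as ET

# Build a small sample tree in memory (element names are illustrative only).
first = ET.Element('first')
ET.SubElement(first, 'second')
ET.SubElement(first, 'second', {'hi': 'there'})
ET.SubElement(first, 'second', {'hi': 'world'})
wrapper = ET.SubElement(first, 'wrapper')
ET.SubElement(wrapper, 'second', {'hi': 'there'})

# Paths are relative to the element they are called on, so './second'
# here plays the role of '/first/second' in full XPath.
direct_children = first.findall('./second')
anywhere = first.findall('.//second')
with_hi = first.findall('.//second[@hi]')
hi_there = first.findall(".//second[@hi='there']")

counts = [len(direct_children), len(anywhere), len(with_hi), len(hi_there)]
print(counts)  # prints [3, 4, 3, 2]
&lt;/code&gt;&lt;/pre&gt;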

&lt;p&gt;A full &lt;a href="http://www.zvon.org/xxl/XPathTutorial/General/examples.html"&gt;XPath tutorial&lt;/a&gt; is available.&lt;/p&gt;

&lt;p&gt;The Python libxml2 bindings make running XPath expressions incredibly simple. Here's some code that extracts the titles of all of the entries on my Kansas blog from the site's &lt;acronym title="Really Simple Syndication"&gt;RSS&lt;/acronym&gt; feed:&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;&amp;gt;&amp;gt;&amp;gt; import libxml2
&amp;gt;&amp;gt;&amp;gt; import urllib
&amp;gt;&amp;gt;&amp;gt; rss = libxml2.parseDoc(
      urllib.urlopen('http://www.a-year-in-kansas.com/syndicate/').read())
&amp;gt;&amp;gt;&amp;gt; rss.xpathEval('//item/title')
[&amp;lt;xmlNode (title) object at 0xb4b260&amp;gt;, &amp;lt;xmlNode (title) object at 0xa99968&amp;gt;, 
&amp;lt;xmlNode (title) object at 0x10dce68&amp;gt;]
&amp;gt;&amp;gt;&amp;gt; [node.content for node in rss.xpathEval('//item/title')]
['Music and Brunch', 'House hunting', 'Arrival']
&amp;gt;&amp;gt;&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Why is this so exciting? I've been &lt;a href="/2002/Jun/16/myFirstXhtmlMindBomb/" title="My first XHTML mind bomb"&gt;saying&lt;/a&gt; &lt;a href="/2002/Aug/11/benefitsOfXhtml/" title="Benefits of XHTML"&gt;for&lt;/a&gt; &lt;a href="/2003/Jan/06/xhtmlIsJustFine/" title="XHTML is just fine"&gt;over&lt;/a&gt; &lt;a href="/2003/Jan/08/xhtmlIsStillGreatForContent/" title="XHTML is still great for content"&gt;a&lt;/a&gt; &lt;a href="/2003/Aug/03/futureProotContent/" title="XHTML for future-proof content"&gt;year&lt;/a&gt; that &lt;acronym title="eXtensible HyperText Markup Language"&gt;XHTML&lt;/acronym&gt; is an ideal format for storing pieces of content in a database or content management system. Serving content to browsers as &lt;acronym title="HyperText Markup Language"&gt;HTML&lt;/acronym&gt; 4 makes perfect sense, but storing your actual content as &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; gives you the ability to process that content in the future using &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; tools.&lt;/p&gt;

&lt;p&gt;So far, the best example of a powerful tool for manipulating this stored &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; has been &lt;acronym title="eXtensible Stylesheet Language Transformations"&gt;XSLT&lt;/acronym&gt;. &lt;acronym title="eXtensible Stylesheet Language Transformations"&gt;XSLT&lt;/acronym&gt; has its fans, but is also often criticised as being unintuitive and having a steep learning curve. XPath is a far better example of a powerful, easy-to-use tool that can be brought to bear on &lt;acronym title="eXtensible HyperText Markup Language"&gt;XHTML&lt;/acronym&gt; content.&lt;/p&gt;

&lt;p&gt;Enough talk, here's an example of what I mean. The following code snippet creates a Python dictionary of all of the acronyms currently visible on the front page of my blog, mapping their shortened version to the expanded text (extracted from the title attribute):&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;
&amp;gt;&amp;gt;&amp;gt; blog = libxml2.parseDoc(
    urllib.urlopen('http://simon.incutio.com/').read())
&amp;gt;&amp;gt;&amp;gt; ctxt = blog.xpathNewContext()
&amp;gt;&amp;gt;&amp;gt; ctxt.xpathRegisterNs('xhtml', 'http://www.w3.org/1999/xhtml')
0
&amp;gt;&amp;gt;&amp;gt; acronyms = dict([(a.content, a.prop('title')) 
    for a in ctxt.xpathEval('//xhtml:acronym')])
&amp;gt;&amp;gt;&amp;gt; for acronym, fulltext in acronyms.items():
	print acronym, ':', fulltext


DHTML : Dynamic HyperText Markup Language
URL : Universal Republic of Love
HTML : HyperText Markup Language
SIG : Special Interest Group
PHP : PHP: Hypertext Preprocessor
CSS : Cascading Style Sheets
&amp;gt;&amp;gt;&amp;gt; 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above code is slightly more complicated than the first example, as using XPath with a document that uses &lt;acronym title="eXtensible Markup Language"&gt;XML&lt;/acronym&gt; namespaces requires some extra work to register the namespace with the XPath parser. Still, it's a pretty short piece of code considering what it does.&lt;/p&gt;
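
&lt;p&gt;The same registration step can be sketched with the standard library's &lt;code class="python"&gt;xml.etree.ElementTree&lt;/code&gt;, which is an assumption here rather than the libxml2 API used above: a prefix-to-URI dictionary passed to &lt;code class="python"&gt;findall&lt;/code&gt; plays the role of &lt;code class="python"&gt;xpathRegisterNs&lt;/code&gt;. The tree and its acronyms are invented for illustration.&lt;/p&gt;

&lt;pre&gt;&lt;code class="python"&gt;
import xml.etree.ElementTree as ET

XHTML = 'http://www.w3.org/1999/xhtml'

# Build a tiny namespaced tree in memory (made-up content for illustration).
body = ET.Element('{%s}body' % XHTML)
para = ET.SubElement(body, '{%s}p' % XHTML)
for short, full in [('CSS', 'Cascading Style Sheets'),
                    ('SIG', 'Special Interest Group')]:
    acronym = ET.SubElement(para, '{%s}acronym' % XHTML, {'title': full})
    acronym.text = short

# Without a prefix mapping, the unqualified name matches nothing,
# because every element lives in the XHTML namespace.
unqualified = body.findall('.//acronym')

# Map the 'xhtml' prefix to the namespace URI, then query with the prefix.
namespaces = {'xhtml': XHTML}
acronyms = {a.text: a.get('title')
            for a in body.findall('.//xhtml:acronym', namespaces)}

print(len(unqualified), acronyms)
&lt;/code&gt;&lt;/pre&gt;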

&lt;p&gt;For an example of how powerful XPath can be on a much larger scale, take a look at Sam Ruby's &lt;a href="http://www.intertwingly.net/blog/1601.html"&gt;XPath enabled blog search feature&lt;/a&gt;.&lt;/p&gt;
    
        &lt;p&gt;Tags: &lt;a href="https://simonwillison.net/tags/libxml2"&gt;libxml2&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/python"&gt;python&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xhtml"&gt;xhtml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xml"&gt;xml&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xpath"&gt;xpath&lt;/a&gt;, &lt;a href="https://simonwillison.net/tags/xslt"&gt;xslt&lt;/a&gt;&lt;/p&gt;
    

</summary><category term="libxml2"/><category term="python"/><category term="xhtml"/><category term="xml"/><category term="xpath"/><category term="xslt"/></entry></feed>