I've been working with etree for the last couple of weeks, first using XPath to extract data from an HR database (data a customer expects my company to look up manually for 6000 people), then to create OpenDocument files. It's really easy, even if you've never done any XML programming before, and the mailing list is very helpful too.
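For anyone who hasn't tried XPath with lxml, here's a minimal sketch of the kind of extraction described above. The `<people>`/`<person>` layout is made up for illustration; the real HR export's schema isn't shown here.

```python
from lxml import etree

# Hypothetical sample of exported HR records (not the real schema).
HR_XML = b"""<people>
  <person id="1"><name>Alice</name><department>Finance</department></person>
  <person id="2"><name>Bob</name><department>IT</department></person>
</people>"""

root = etree.fromstring(HR_XML)

# xpath() always returns a list; a text() step yields the matched text nodes as strings.
names = root.xpath("//person/name/text()")

# Select each <person> element, then read its id attribute and <department> child.
departments = {
    person.get("id"): person.findtext("department")
    for person in root.xpath("//person")
}

print(departments)  # {'1': 'Finance', '2': 'IT'}
```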
There are Perl packages to do the same - XML::TreeBuilder and HTML::TreeBuilder come to mind. I'm sure for most cases they would both work just fine and it's really just a matter of preference.
I agree - I wasn't trying to use that as an illustration of Python's superiority, just to show how the data should be properly parsed rather than treated as unstructured text.
That said, I do prefer the 'one, right way to do it' of Python - it makes it a lot easier to pick up someone else's code.
In Python, load the lxml module and parse the content into an etree, a hierarchical data structure of XML elements.
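A minimal sketch of that step, assuming the input is already available as bytes (the tiny document below is just a placeholder):

```python
import io

from lxml import etree

# Placeholder input; any well-formed XML document is handled the same way.
xml = b"<root><item>a</item><item>b</item></root>"

# fromstring() parses an in-memory document and returns the root element.
root = etree.fromstring(xml)

# etree.parse() takes a filename or file-like object and returns an ElementTree;
# getroot() gives back the same kind of root element.
doc_root = etree.parse(io.BytesIO(xml)).getroot()

# The result is a hierarchy: iterating an element yields its child elements.
for child in root:
    print(child.tag, child.text)  # prints "item a", then "item b"
```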
Comments and processing instructions aren't thrown away: lxml.etree reads them in and represents them as Comment and ProcessingInstruction nodes.
You can then iterate the tree and collect the comments.
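Something like the following should do it. The sample document and the `my-pi` instruction are made up for illustration; the key part is passing `etree.Comment` (or `etree.ProcessingInstruction`) as the tag filter to `iter()`.

```python
from lxml import etree

# Hypothetical document containing comments and a processing instruction.
xml = b"""<?xml version="1.0"?>
<doc>
  <!-- first comment -->
  <?my-pi some data?>
  <p>text<!-- second comment --></p>
</doc>"""

# The default parser keeps comments and processing instructions in the tree.
root = etree.fromstring(xml)

# With etree.Comment as the tag filter, iter() yields only comment nodes,
# in document order; .text holds the comment body.
comments = [c.text.strip() for c in root.iter(etree.Comment)]

# Processing instructions work the same way; each has a target and text.
pis = [(pi.target, pi.text) for pi in root.iter(etree.ProcessingInstruction)]

print(comments)  # ['first comment', 'second comment']
print(pis)       # [('my-pi', 'some data')]
```

Note that `root.iter()` only walks the root element and its descendants, so a comment placed before or after the root element at document level would need to be picked up separately.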
http://codespeak.net/lxml/tutorial.html