As you mention, RegExps are the wrong way to handle markup. In Python, load lxml...

dangoldin · on July 9, 2008

There are Perl packages to do the same - XML::TreeBuilder and HTML::TreeBuilder come to mind. I'm sure for most cases they would both work just fine and it's really just a matter of preference.

nailer · on July 9, 2008

I agree - I wasn't trying to use that as an illustration of the superiority of Python, just to show how the data should be properly parsed rather than treated like unstructured text.

That said, I do prefer the 'one, right way to do it' of Python - it makes it a lot easier to pick up on someone else's code.
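
For what it's worth, here is a rough sketch of what I mean by parsing the markup instead of regexing it. It uses the standard library's xml.etree.ElementTree rather than lxml so it runs without installing anything; lxml.etree follows largely the same API, but treat that swap as my assumption. The sample markup and the extract_links name are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample markup, invented for this example.
SAMPLE = """<links>
  <a href="http://example.com/one">First</a>
  <a href="http://example.com/two" class="ext">Second</a>
</links>"""


def extract_links(markup):
    """Return (href, text) pairs for every <a> element in the markup."""
    # Parse the string into an element tree instead of pattern-matching it.
    root = ET.fromstring(markup)
    # iter("a") walks the whole tree, so nesting depth and attribute order
    # do not matter the way they would with a regular expression.
    return [(a.get("href"), a.text) for a in root.iter("a")]


links = extract_links(SAMPLE)
```

The extra class attribute on the second link is the kind of thing that tends to trip up a hand-written regex, while the parser just ignores it.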