
People always draw the wrong conclusions in articles like these. He shows a piece of code that uses a regular expression over implicitly-defined variables.

I agree that's confusing; regex syntax is very terse, and there's nothing in there to help you learn it. (In regular code, variable and function names can help you infer from context what the other constructs do.) Secondly, it's hard to reason about code that does things that aren't written. "while(<>)" assigning to "$_" for "/.../" to use is confusing. $2 and $3 coming out of nowhere are confusing.

The problem, though, is not with Perl; it's with the author's lack of knowledge of how it works. Regexes are nice once you learn them. Operating on $_ is nice once you expect that to happen. $1 and friends are quite handy.

Finally, most Perl apps don't look like this. Sure, there may be something like this hidden over in an obscure subroutine, but most Perl looks like every other language.

Here's a random sample of Perl from my git repository:

http://git.jrock.us/?p=Template-Refine.git;a=blob;f=lib/Temp...

Basically, it looks like it would when written in any other language. It's not unreadable or crazy, it's just a program.

Finally, here's a more readable version of his example:

    use strict;
    use warnings;
    use 5.010; # enables "say" and named captures

    while ( my $line = <> ){ # or "readline" if you hate <>
        while( $line =~ /<!--(?<comment_body>[^-]+)/g ){
            say $+{comment_body};
        }
    }

One more thing; from the article:

> Even so, I suspect that most readers that know both languages, even Perl fans, will agree with me.

Not true. I don't know Python, so of course I'm not going to be able to use it to write maintainable code. I do know Perl, though, and I bet I can write much better code than the author of this article.

I should note that his program doesn't actually work. It won't work across lines, and doesn't allow "-" in the comment. A full-file parser would do a correct job, and one is available from the CPAN, so it would involve less code anyway.
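To make those two failure modes concrete, here is a small Python sketch. It assumes the article's regex behaves like the `[^-]+` pattern in my rewrite above and that the input is processed one line at a time; the function name `comments_line_by_line` and the sample strings are made up for illustration.

```python
import re

# Assumed pattern: same character class as the Perl rewrite above.
PATTERN = re.compile(r"<!--([^-]+)")

def comments_line_by_line(text):
    """Apply the pattern to each line separately, as a line-at-a-time loop does."""
    found = []
    for line in text.splitlines():
        found.extend(match.group(1) for match in PATTERN.finditer(line))
    return found

# A hyphen inside the comment cuts the capture short.
hyphenated = comments_line_by_line("<!-- built-in types -->")

# A comment that spans two lines loses everything after the first line.
multiline = comments_line_by_line("<!-- first line\nsecond line -->")

print(hyphenated)
print(multiline)
```

In both cases the match succeeds but the captured body is truncated, so the bug is easy to miss.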



As you mention, RegExps are the wrong way to handle markup.

In Python, load the lxml module and parse the content into an etree, a hierarchical data structure for XML elements.

lxml.etree will read comments and processing instructions in and treat them as Comment or ProcessingInstruction nodes.

You can then iterate the tree and collect the comments.
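A rough sketch of that parse-then-iterate idea follows. To keep it runnable without installing anything, it uses the standard library's `xml.etree.ElementTree` rather than lxml (the two expose a similar etree-style API, but this is a swapped-in module, not lxml itself). It assumes Python 3.8+ for `insert_comments=True` and well-formed XML input; `collect_comments` and the sample document are hypothetical names for illustration.

```python
import xml.etree.ElementTree as ET

def collect_comments(xml_text):
    """Parse XML into a tree and return the text of every comment node."""
    # By default ElementTree drops comments; this builder keeps them in the tree.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    # Comment nodes are elements whose tag is the ET.Comment factory.
    return [node.text.strip() for node in root.iter() if node.tag is ET.Comment]

# Hypothetical sample: one plain comment, one spanning lines with a hyphen.
sample = """<page>
  <!-- first -->
  <item>text<!-- spans
  two lines - with a hyphen --></item>
</page>"""

comments = collect_comments(sample)
for body in comments:
    print(repr(body))
```

Because the parser understands comment delimiters, multi-line comments and single hyphens inside a comment need no special handling.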

http://codespeak.net/lxml/tutorial.html

I've been working with etree for the last couple of weeks, first using XPath expressions to extract data from an HR database that a customer expects my company to look up manually for 6000 people, then to create OpenDocument files. It's really easy, even if you've never done any XML programming before, and the mailing list is very helpful too.


There are Perl packages to do the same - XML::TreeBuilder and HTML::TreeBuilder come to mind. I'm sure for most cases they would both work just fine and it's really just a matter of preference.


I agree - I wasn't trying to use that as an illustration of the superiority of Python, but just to show how the data should be properly parsed, not treated like unstructured data.

That said, I do prefer the 'one, right way to do it' of Python - it makes it a lot easier to pick up on someone else's code.


> He shows a piece of code that uses a regular expression over implicitly-defined variables.

His snippet is perfectly idiomatic Perl code and I find it easily legible. His point is that it's pretty hard to parse that without having already spent some hours with the perldocs. Fair enough.


It might be 'perfectly idiomatic' but jrockway's code is much more typical of what a Perl programmer who wished to communicate clear ideas would write.



