Re: Is this fun?

A. Pagaltzis
July 15, 2003 07:00
* Jon Bjornstad [2003-07-15 00:00]:
> I don't often get to use the '.' operator or the 'x' operator
> and I thought this was pretty cool.

Pretty standard fare for some people, though I guess the exact
idea of fun is different for everyone. :)

> There are a few flaws with the above approach.

More than those you mention, because it doesn't parse HTML; it just
looks for certain substrings. It will blow up on input such as

<img alt="<a r g h>" ...>
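
To make the failure concrete, here is a minimal sketch of a pattern-only
check. The sub name naive_open_count and the src attribute are made up for
illustration; they are not taken from the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: counts "<tag" purely by pattern, without parsing.
sub naive_open_count {
    my ($frag, $tag) = @_;
    my $n = 0;
    $n++ while $frag =~ m!<$tag\b!gi;
    return $n;
}

# The only element here is <img>; the "<a" lives inside an attribute value.
my $frag = '<img alt="<a r g h>" src="x.png">';
print naive_open_count($frag, 'a'), "\n";    # prints 1
```

The fragment contains no anchor element at all, yet the pattern reports one
opening <a, so an approach built on it would append a stray </a>.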


> Comments?  Other approaches?

If this is supposed to handle arbitrary HTML, not a narrow set of
input you already know it handles correctly, use a (tolerant)
parser. HTML::TokeParser(?:::Simple)? does nicely for jobs like this.
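
A rough sketch of what that could look like with HTML::TokeParser, which is
not a core module (it ships with the HTML-Parser distribution). The sub name
unclosed_tags and its return convention are my own invention, so treat this
as an assumption-laden outline, not a drop-in replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the watched tags that still need closing,
# one list entry per unmatched opening tag.
sub unclosed_tags {
    my ($html, @tags) = @_;
    require HTML::TokeParser;    # loaded lazily; not part of core Perl
    my $parser = HTML::TokeParser->new(\$html)
        or die "cannot create parser";
    my %watched = map { $_ => 1 } @tags;
    my %open;
    while (my $token = $parser->get_token) {
        my ($type, $tag) = @$token;    # "S" = start tag, "E" = end tag
        next unless $type eq 'S' || $type eq 'E';
        next unless $watched{$tag};
        $open{$tag} += $type eq 'S' ? 1 : -1;
    }
    return map { my $n = $open{$_} || 0; $n > 0 ? ($_) x $n : () } @tags;
}
```

Called as unclosed_tags($frag, qw(ol ul b i u a)), it should ignore a "<a"
that only appears inside a quoted attribute value, because the tokenizer
reports the whole <img ...> as a single start tag. Note that this only
counts tags; it does not work out the correct nesting order for the
closing tags.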

> Is the s/// operator the only way (of course not!) to get the
> number of occurrences of a regexp in a string?

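m//g in scalar context is better: each match resumes where the
previous one ended, so a while loop counts the occurrences without
modifying the string.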
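
As a standalone illustration, here are two common ways to count matches
without s///. The sub names are made up for this sketch.

```perl
use strict;
use warnings;

# Scalar context: every successful m//g continues after the previous match.
sub count_scalar {
    my ($str, $re) = @_;
    my $n = 0;
    $n++ while $str =~ /$re/g;
    return $n;
}

# List context: m//g returns all matches at once; assigning that list to ()
# in scalar context yields the number of elements. If the pattern contains
# capture groups, this counts captures rather than matches.
sub count_list {
    my ($str, $re) = @_;
    my $n = () = $str =~ /$re/g;
    return $n;
}

print count_scalar('abracadabra', qr/a/), " ",
      count_list('abracadabra', qr/a/), "\n";    # prints "5 5"
```

Applied to the tags in question: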

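    for my $tag (qw[ol ul b i u a]) {
        my ($opening, $closing) = (0) x 2;
        # m//g in scalar context advances one match per iteration
        $opening++ while $frag =~ m!<$tag\b!gi;
        $closing++ while $frag =~ m!</$tag\b!gi;
        # append one closing tag per unmatched opening tag
        $frag .= "</$tag>" x ($opening - $closing);
    }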

Or maybe more elaborately in a single loop:

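    for my $tag (qw[ol ul b i u a]) {
        my ($opening, $closing) = (0) x 2;
        until ($frag =~ /\G\z/gc) {
            ++$opening if $frag =~ m!\G<$tag\b!gci;
            ++$closing if $frag =~ m!\G</$tag\b!gci;
            # skip one character, then everything up to the next "<";
            # /s lets "." match a newline, so the scan always advances
            $frag =~ m!\G.[^<]*!gcs;
        }
        $frag .= "</$tag>" x ($opening - $closing);
        pos($frag) = 0;    # restart the scan for the next tag
    }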

At this point we're already closing in on the realm of parsers:
m//gc is how you build lexers in Perl.
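
For the curious, a minimal sketch of that lexer idiom. The token names and
the toy grammar (integers, identifiers, a few operators) are invented for
the example and have nothing to do with HTML.

```perl
use strict;
use warnings;

# Hypothetical toy lexer: \G anchors each attempt at the current position,
# and /c keeps that position when an attempt fails, so the alternatives can
# be tried one after another.
sub lex {
    my ($src) = @_;
    my @tokens;
    pos($src) = 0;
    until ($src =~ /\G\z/gc) {
        if    ($src =~ /\G\s+/gc)            { next }
        elsif ($src =~ /\G(\d+)/gc)          { push @tokens, [NUM   => $1] }
        elsif ($src =~ /\G([A-Za-z_]\w*)/gc) { push @tokens, [IDENT => $1] }
        elsif ($src =~ /\G([-+*\/=()])/gc)   { push @tokens, [OP    => $1] }
        else  { die sprintf "unexpected character at offset %d\n", pos($src) }
    }
    return @tokens;
}

print join(' ', map { "$_->[0]:$_->[1]" } lex('x = 42 + y1')), "\n";
# prints "IDENT:x OP:= NUM:42 OP:+ IDENT:y1"
```

Each branch either consumes input or dies, so the loop cannot spin in place
the way an unanchored scan might.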

