
Re: Is this fun?

From: A. Pagaltzis
Date: July 15, 2003 07:00
Subject: Re: Is this fun?
Message ID: 20030715135951.GA2349@klangraum

* Jon Bjornstad <sahadev@cruzio.com> [2003-07-15 00:00]:
> I don't often get to use the '.' operator or the 'x' operator
> and I thought this was pretty cool.

Pretty standard fare for some people, though I guess the exact
idea of fun is different for everyone. :)

> There are a few flaws with the above approach.

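More than those you mention, because it doesn't parse HTML; it
just looks for certain substrings. It will blow up on input like

<img alt="<a r g h>" ...>

for example, because the "<a" inside the attribute value looks
like an opening tag to a plain string match.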

> Comments?  Other approaches?

If this is supposed to handle arbitrary HTML, not a narrow set of
input you already know it handles correctly, use a (tolerant)
parser. HTML::TokeParser(?:::Simple)? does nicely for jobs like
this.
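
A minimal sketch of the parser-based approach, assuming the CPAN
module HTML::TokeParser is installed. The helper name
close_open_tags and the sample input are made up for illustration,
and the tag list is borrowed from the loops further down. It tracks
open tags on a stack instead of counting strings:

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: append closing tags for elements left open in $frag.
sub close_open_tags {
    my ($frag) = @_;
    my %wanted = map { $_ => 1 } qw(ol ul b i u a);
    my @open;    # stack of currently open tags we care about

    my $parser = HTML::TokeParser->new(\$frag)
        or die "cannot create parser";

    while (my $token = $parser->get_token) {
        my ($type, $tag) = @$token;
        next unless $type eq 'S' || $type eq 'E';    # start/end tags only
        next unless $wanted{$tag};                   # tag names arrive lowercased

        if ($type eq 'S') {
            push @open, $tag;
        }
        else {
            # Remove the most recent matching open tag, if there is one.
            for my $i (reverse 0 .. $#open) {
                if ($open[$i] eq $tag) { splice @open, $i, 1; last }
            }
        }
    }

    # Close whatever is still open, innermost first.
    return $frag . join '', map { "</$_>" } reverse @open;
}

print close_open_tags('<ul><li><b>bold <img alt="<a r g h>" src="x.png">'), "\n";
```

Because the parser sees the whole img element as one start tag, the
"<a" inside the alt attribute is never mistaken for an anchor.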

> Is the s/// operator the only way (of course not!) to get the
> number of occurrences of a regexp in a string?

m//g in scalar context is better.


    for my $tag (qw[ol ul b i u a]) {
        my ($opening, $closing) = (0)x2;
        $opening++ while $frag =~ m!<$tag\b!gi;
        $closing++ while $frag =~ m!</$tag\b!gi;
        $frag .= "</$tag>" x ($opening - $closing);
    }

Or maybe more elaborately in a single loop:

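    for my $tag (qw[ol ul b i u a]) {
        my ($opening, $closing) = (0)x2;
        # walk the string once; /c keeps pos() when a match fails
        until($frag =~ /\G\z/gc) {
            ++$opening if $frag =~ m!\G<$tag\b!gci;
            ++$closing if $frag =~ m!\G</$tag\b!gci;
            # skip one char (/s so a newline can't stall the loop),
            # then everything up to the next '<'
            $frag =~ m!\G.[^<]*!gcs;
        }
        $frag .= "</$tag>" x ($opening - $closing);
        pos($frag) = 0;
    }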

At this point we're already closing in on the realm of parsers.
m//gc is how you build lexers in Perl.
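
To illustrate the m//gc idiom, here is a rough sketch of a tiny
tokenizer that uses core Perl only. The name lex_fragment and the
token shapes are invented for this example, and the tag patterns are
deliberately naive: they still trip over a '>' inside a quoted
attribute value, so this is not a substitute for a real parser.

```perl
use strict;
use warnings;

# Hypothetical tokenizer: split an HTML-ish fragment into
# ['open', name], ['close', name] and ['text', string] tokens.
sub lex_fragment {
    my ($frag) = @_;
    my @tokens;
    pos($frag) = 0;

    # \G anchors each attempt at the current position; /c keeps that
    # position when an attempt fails, so the next pattern can try there.
    until ($frag =~ /\G\z/gc) {
        if ($frag =~ m!\G</([a-z][a-z0-9]*)[^>]*>!gci) {
            push @tokens, ['close', lc $1];
        }
        elsif ($frag =~ m!\G<([a-z][a-z0-9]*)[^>]*>!gci) {
            push @tokens, ['open', lc $1];
        }
        elsif ($frag =~ m!\G([^<]+)!gc) {
            push @tokens, ['text', $1];
        }
        else {
            # A '<' that starts no recognizable tag: consume it as text
            # so the loop always makes progress.
            $frag =~ m!\G(.)!gcs;
            push @tokens, ['text', $1];
        }
    }
    return @tokens;
}

# prints "open:b text:hi close:b"
print join(' ', map { "$_->[0]:$_->[1]" } lex_fragment('<b>hi</b>')), "\n";
```

Each branch either consumes input or falls through to the next one,
and the last branch always consumes one character, which is what
keeps a lexer of this kind from looping forever.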

-- 
Regards,
Aristotle
