Re: Is this fun?
From: A. Pagaltzis
Date: July 15, 2003 07:00
Subject: Re: Is this fun?
Message ID: 20030715135951.GA2349@klangraum
* Jon Bjornstad <sahadev@cruzio.com> [2003-07-15 00:00]:
> I don't often get to use the '.' operator or the 'x' operator
> and I thought this was pretty cool.
Pretty standard fare for some people, though I guess the exact
idea of fun is different for everyone. :)
> There are a few flaws with the above approach.
More than the ones you mention, because it doesn't parse HTML, it
just looks for string fragments. For example, it will blow up on
    <img alt="<a r g h>" ...>
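To make the failure concrete, here is a small sketch. The fragment and the variable names are made up for illustration. The string contains no anchor element at all, yet a plain pattern match still finds an "opening tag" inside the quoted attribute value.

```perl
use strict;
use warnings;

# Illustrative fragment (not from the original post): there is no <a>
# element here, only the text "<a r g h>" inside a quoted attribute value.
my $html = '<img alt="<a r g h>" src="pic.png">';

# Count "opening a tags" by pattern matching alone.
my $naive_count = 0;
$naive_count++ while $html =~ m!<a\b!gi;

print "naive count of a tags: $naive_count\n";   # prints 1, not 0
```

Any fix-up that relies on this count would append a stray </a>.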
> Comments? Other approaches?
If this is supposed to handle arbitrary HTML, not a narrow set of
input you already know it handles correctly, use a (tolerant)
parser. HTML::TokeParser(?:::Simple)? does nicely for jobs like
this.
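A minimal sketch of the parser-based route, assuming HTML::TokeParser (part of the HTML-Parser distribution on CPAN, not core Perl) is installed. The helper name close_tags is hypothetical, and it keeps the same count-and-append strategy as the rest of this thread; only the tag detection is handed to the parser.

```perl
use strict;
use warnings;
use HTML::TokeParser;   # from the HTML-Parser distribution on CPAN

# Hypothetical helper: append closing tags for the given elements that
# were opened more often than they were closed.
sub close_tags {
    my ($frag, @tags) = @_;
    my %wanted = map { $_ => 1 } @tags;
    my %balance;

    my $parser = HTML::TokeParser->new(\$frag)
        or die "could not set up parser";

    # Tokens are array refs: ["S", $tag, ...] for a start tag and
    # ["E", $tag, ...] for an end tag; tag names arrive lowercased.
    while (my $token = $parser->get_token) {
        my ($type, $tag) = @$token;
        if    ($type eq 'S' && $wanted{$tag}) { $balance{$tag}++ }
        elsif ($type eq 'E' && $wanted{$tag}) { $balance{$tag}-- }
    }

    for my $tag (@tags) {
        my $missing = $balance{$tag} || 0;
        $frag .= "</$tag>" x $missing if $missing > 0;
    }
    return $frag;
}

print close_tags('<b>bold <i>both</i>', qw(b i)), "\n";
```

Because the parser tokenizes attribute values as part of their tag, something like alt="<a r g h>" should no longer be mistaken for an opening tag. Closing tags are still appended in list order rather than nesting order, so this is no better than the counting approach in that respect.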
> Is the s/// operator the only way (of course not!) to get the
> number of occurrences of a regexp in a string?
m//g in scalar context is better.
    for my $tag (qw[ol ul b i u a]) {
        my ($opening, $closing) = (0)x2;
        $opening++ while $frag =~ m!<$tag\b!gi;
        $closing++ while $frag =~ m!</$tag\b!gi;
        $frag .= "</$tag>" x ($opening - $closing);
    }
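As a side note on the counting question, here is a small sketch of two ways to count matches without s///. The sample string is made up, and the list-context form is another common idiom, not something proposed earlier in this thread.

```perl
use strict;
use warnings;

my $text = 'one <b>two</b> <b>three';   # made-up sample input

# Scalar context: each m//g match resumes where the previous one stopped,
# so the loop body runs once per occurrence.
my $scalar_count = 0;
$scalar_count++ while $text =~ /<b\b/g;

# List context: m//g returns all matches at once; assigning them through
# an empty list yields the number of matches.
my $list_count = () = $text =~ /<b\b/g;

print "$scalar_count $list_count\n";   # prints "2 2"
```

Neither form modifies the string, which s/// would do unless you substitute each match with itself.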
Or maybe more elaborately in a single loop:
    for my $tag (qw[ol ul b i u a]) {
        my ($opening, $closing) = (0)x2;
        until($frag =~ /\G\z/gc) {
            ++$opening if $frag =~ m!\G<$tag\b!gci;
            ++$closing if $frag =~ m!\G</$tag\b!gci;
            # /s lets . consume a newline too, so the scan always advances
            $frag =~ m!\G.[^<]*!gcs;
        }
        $frag .= "</$tag>" x ($opening - $closing);
        pos($frag) = 0;
    }
At this point we're already closing in on the realm of parsers.
m//gc is how you build lexers in Perl.
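To illustrate that last remark, here is a toy lexer sketch built on \G and m//gc. The function name lex_fragment and the token labels are hypothetical, and the tag patterns are deliberately simplistic: they stop at the first ">", so they are still fooled by ">" inside quoted attribute values.

```perl
use strict;
use warnings;

# Hypothetical toy lexer: splits a fragment into OPEN, CLOSE and TEXT tokens.
# \G anchors each attempt at the current position; /c keeps that position
# when an attempt fails, so the next alternative can try the same spot.
sub lex_fragment {
    my ($input) = @_;
    my @tokens;
    pos($input) = 0;
    until ($input =~ /\G\z/gc) {
        if    ($input =~ m!\G</(\w+)[^>]*>!gc) { push @tokens, [ 'CLOSE', lc $1 ] }
        elsif ($input =~ m!\G<(\w+)[^>]*>!gc)  { push @tokens, [ 'OPEN',  lc $1 ] }
        elsif ($input =~ m!\G([^<]+)!gc)       { push @tokens, [ 'TEXT',  $1 ] }
        elsif ($input =~ m!\G(.)!gcs)          { push @tokens, [ 'TEXT',  $1 ] }  # stray "<"
    }
    return @tokens;
}

print join(' ', map { $_->[0] . '(' . $_->[1] . ')' } lex_fragment('<b>hi</b>')), "\n";
# prints "OPEN(b) TEXT(hi) CLOSE(b)"
```

The last alternative guarantees progress: whatever sits at the current position, one of the branches consumes at least one character.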
--
Regards,
Aristotle