develooper Front page | perl.perl5.porters | Postings from April 2011

Re: [perl #76546] regex engine slowdown bug

Thread Previous | Thread Next
From:
Dave Mitchell
Date:
April 3, 2011 09:06
Subject:
Re: [perl #76546] regex engine slowdown bug
Message ID:
20110403160636.GA11090@iabyn.com
On Thu, Mar 31, 2011 at 10:15:55AM +0200, demerphq wrote:
> On 24 July 2010 17:46, Dave Mitchell <davem@iabyn.com> wrote:
> > The original of this bug report didn't make it to the p5p mailing
> > list; presumably because the original test script included 0.5Mb of sample
> > HTML. The following demonstrates the same slowdown while generating its
> > own sample HTMl data:
> >
> >    #!/usr/bin/perl
> >    use Time::HiRes qw( time );
> >
> >    my $html = qq{<div class="boo">\n} x 30_000;
> >
> >    sub try {
> >        my ($re) = @_;
> >        my $t = time;
> >        $html =~ /$re/;
> >        warn sprintf "%7.4f sec, regexp is %s\n", time-$t, $re;
> >    }
> >
> >    try(qr{<div class="marketsort[^>]*(?-i:>)\s*}ms);
> >    try(qr{<div class="marketsort[^>]*(?:>)\s*}ims);
> >    try(qr{<div class="marketsort[^>]*(?-i:>)}ims);
> >    try(qr{<div\sclass="marketsort[^>]*(?-i:>)\s*}ims);
> >    try(qr{<div class="marketsort[^>]*(?-i:>)\s*}ims);
> >    warn "FINISHED\n";
> >
> > which on blead gives:
> >
> >     0.0004 sec, regexp is (?ms-xi:<div class="marketsort[^>]*(?-i:>)\s*)
> >     0.0041 sec, regexp is (?msi-x:<div class="marketsort[^>]*(?:>)\s*)
> >     0.0048 sec, regexp is (?msi-x:<div class="marketsort[^>]*(?-i:>))
> >     0.0163 sec, regexp is (?msi-x:<div\sclass="marketsort[^>]*(?-i:>)\s*)
> >    61.5914 sec, regexp is (?msi-x:<div class="marketsort[^>]*(?-i:>)\s*)
> >    FINISHED
> >
> > The time of the last pattern is quadratic on RHS of 'x' in the $html
> > assignment.
> >
> > I haven't looked into any further than that.
> 
> We never fixed the superlinear cache bug that was broken some time
> around when the engine was derecursivized.
> 
> Id bet a dollar this is related.

You owe me one dollar ;-)

The superlinear cache is only used with CURLYX/WHILEM op pairs,
which aren't generated with these particular patterns. Perhaps
a case could be made for extending it for other ops as well?
(Off the top of my head I don't how viable that is.)

The particular bug with the superlinear cache was that, until the
derecursivisation [why is my spell-checker complaining???], the cache
cached both success and failure positions; afterwards it only cached
failure positions. At the time I couldn't think of a case that actually
exercised the positive cache; eventually someone found one, but it was
very obscure and was deemed not worth worrying about (it involved
negative lookahead assertions IIRC).

-- 
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
   -- Larry Wall

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About