On Thu, Mar 31, 2011 at 10:15:55AM +0200, demerphq wrote:
> On 24 July 2010 17:46, Dave Mitchell <davem@iabyn.com> wrote:
> > The original of this bug report didn't make it to the p5p mailing
> > list; presumably because the original test script included 0.5Mb of sample
> > HTML. The following demonstrates the same slowdown while generating its
> > own sample HTMl data:
> >
> > #!/usr/bin/perl
> > use Time::HiRes qw( time );
> >
> > my $html = qq{<div class="boo">\n} x 30_000;
> >
> > sub try {
> > my ($re) = @_;
> > my $t = time;
> > $html =~ /$re/;
> > warn sprintf "%7.4f sec, regexp is %s\n", time-$t, $re;
> > }
> >
> > try(qr{<div class="marketsort[^>]*(?-i:>)\s*}ms);
> > try(qr{<div class="marketsort[^>]*(?:>)\s*}ims);
> > try(qr{<div class="marketsort[^>]*(?-i:>)}ims);
> > try(qr{<div\sclass="marketsort[^>]*(?-i:>)\s*}ims);
> > try(qr{<div class="marketsort[^>]*(?-i:>)\s*}ims);
> > warn "FINISHED\n";
> >
> > which on blead gives:
> >
> > 0.0004 sec, regexp is (?ms-xi:<div class="marketsort[^>]*(?-i:>)\s*)
> > 0.0041 sec, regexp is (?msi-x:<div class="marketsort[^>]*(?:>)\s*)
> > 0.0048 sec, regexp is (?msi-x:<div class="marketsort[^>]*(?-i:>))
> > 0.0163 sec, regexp is (?msi-x:<div\sclass="marketsort[^>]*(?-i:>)\s*)
> > 61.5914 sec, regexp is (?msi-x:<div class="marketsort[^>]*(?-i:>)\s*)
> > FINISHED
> >
> > The time of the last pattern is quadratic on RHS of 'x' in the $html
> > assignment.
> >
> > I haven't looked into any further than that.
>
> We never fixed the superlinear cache bug that was broken some time
> around when the engine was derecursivized.
>
> Id bet a dollar this is related.
You owe me one dollar ;-)
The superlinear cache is only used with CURLYX/WHILEM op pairs,
which aren't generated with these particular patterns. Perhaps
a case could be made for extending it for other ops as well?
(Off the top of my head I don't how viable that is.)
The particular bug with the superlinear cache was that, until the
derecursivisation [why is my spell-checker complaining???], the cache
cached both success and failure positions; afterwards it only cached
failure positions. At the time I couldn't think of a case that actually
exercised the positive cache; eventually someone found one, but it was
very obscure and was deemed not worth worrying about (it involved
negative lookahead assertions IIRC).
--
Any [programming] language that doesn't occasionally surprise the
novice will pay for it by continually surprising the expert.
-- Larry Wall
Thread Previous
|
Thread Next