develooper Front page | perl.perl5.porters | Postings from June 2000

Re: [ID 20000626.002] Slow s/// in 5.6 compared to 5.00403?

Thread Previous | Thread Next
From:
M.J.T. Guy
Date:
June 28, 2000 10:06
Subject:
Re: [ID 20000626.002] Slow s/// in 5.6 compared to 5.00403?
Message ID:
E137LIK-0000FD-00@libra.cus.cam.ac.uk
Per Lindquist <per.lindquist@dynamics.saab.se> wrote
> Hi, I'm not sure if this should be called a bug but it's a problem
> anyway.

A 20x slowdown is definitely a bug.   In the past, proposed Perl
upgrades have been rejected because they caused a 3% overhead.

> The 's///' statement seems to take a lot more time in version 5.6.0 than
> in version 5.00403, at least in some cases. The execution time has
> increased more than 20 times (an in-house script takes 20 minutes
> instead of one, which is quite annoying).

I also observe this, on several platforms.    Here's a cut down version
of your test case which shows the effect:

    use re 'debug';
    $_ = "P FIRST = 0\nP " . join(" = 0\nP ", 1..5) . " = 0\nP LAST = 0\n";
    /(.*)LAST(.*=).*/;

It is (reasonably) fast on perl5.005_03, but slow on perl5.6.0 and
indeed with the latest development perl-current.

Note that your regular expression is rather inefficient (but that doesn't
excuse any bug).    The sequence of events will be:

Starting at the beginning of the string, it will match 'P FIRST = 0'
against '.*', then'll be stopped by the "\n".   It'll then fail to
match 'LAST', so will backtrack to match 'P FIRST =' against '.*', fail,
 ... until it's matched '' against '.*'.   Now it's failed to match at
the start of the string, so it'll move forward to match ' FIRST = 0'
against '.*', fail and backtrack as previously ..., trying 'FIRST = 0'
 ... .    So it's doing a quadratic amount of work before even
getting to the first newline.

Now it'll repeat this searching for each 'P MIDDLE = 0' in turn, till
it eventually finds the 'P LAST = 0' and succeeds.    Whew!


You can reduce this quadratic threshing to linear by restricting the
'.*' matches to immediately after newlines, by adding a ^ anchor and
adding the /m flag:

 $file =~ s/^(.*)LAST(.*=).*/$1LAST$2foo;/m
            ^                             ^

This bypasses the bug, and is a good thing to do in any case.   Also
replace the '.*' by somethinmg more restrictive if you know what you are
expecting to match.   (In the test case, that would be 'P ', but
presumably your real program is doing something more complex.)


Back to the bug:   the debug output shows that in perl5.005_03, the
RE engine is indeed behaving as described above.    But the output
under perl5.6.0 shows that instead of the quadratic rummage over
'P FIRST = 0', it's doing a cubic search.    Dunno why.   The effect
goes away if the first pair of brackets are made non-capturing:

     /(?:.*)LAST(.*=).*/;

I'll leave it to someone else to dig further.


Mike Guy

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About