develooper Front page | perl.perl5.porters | Postings from December 2008

[perl #58280] Speed lost on /^(foo|bar|baz)$/ match

yves orton via RT
December 29, 2008 17:13
[perl #58280] Speed lost on /^(foo|bar|baz)$/ match
Message ID:
Given the pattern involved at first look it seems this is related to the
TRIE functionality.

With perl 5.8:

demerphq@dromedary:blead:~/perl$ perl
               Rate     regex regex_opt        eq
regex     2254593/s        --       -3%      -60%
regex_opt 2334045/s        4%        --      -58%
eq        5618599/s      149%      141%        --

With blead:

demerphq@dromedary:blead:~/perl$ ./perl -Ilib
               Rate     regex regex_opt        eq
regex      705456/s        --       -1%      -86%
regex_opt  709993/s        1%        --      -86%
eq        5202675/s      637%      633%        --

This is with the trie optimisation disabled:
demerphq@dromedary:blead:~/perl$ ./perl -Ilib
               Rate     regex regex_opt        eq
regex     1260232/s        --       -1%      -76%
regex_opt 1275183/s        1%        --      -76%
eq        5272268/s      318%      313%        --

From this I assume that the majority of the slowdown comes from the
setup time for doing a match using the new non-recursive process and not
from the TRIE.

So then what happens when we change the token being matched? After all
the benchmark is:


Which is a benchmark which is virtually designed to make the old
alternation implementation look good. So what happens when we switch it
to "read"?

With 5.8 we see the expected slowdown:

demerphq@dromedary:blead:~/perl$ perl
               Rate     regex regex_opt        eq
regex     1846732/s        --      -11%      -23%
regex_opt 2071288/s       12%        --      -14%
eq        2406042/s       30%       16%        --

With blead we see the expected unchanged performance of a trie:

demerphq@dromedary:blead:~/perl$ ./perl -Ilib
               Rate     regex regex_opt        eq
regex      723692/s        --       -1%      -62%
regex_opt  730718/s        1%        --      -62%
eq        1904342/s      163%      161%        --

So now, if we add a bunch more terms to the test:

#!/usr/bin/perl -w

use strict;
use warnings;

use Benchmark qw(cmpthese);

my $token = "read";

cmpthese(shift || -3, {
        regex => sub { $token =~ m/\A (?: open | close | foo | bar | baz
| bop | dizzy | blitzen | rocker | mod | punk | read ) \z/xms; },
        regex_opt => sub { $token =~
m/^(?:open|close|foo|bar|baz|bop|dizzy|blitzen|rocker|mod|punk|read)$/; },
        'eq' => sub {
                        $token eq 'open' ||
                        $token eq 'close' ||
                        $token eq 'foo' ||
                        $token eq 'bar' ||
                        $token eq 'baz' ||
                        $token eq 'bop' ||
                        $token eq 'dizzy' ||
                        $token eq 'blitzen' ||
                        $token eq 'rocker' ||
                        $token eq 'mod' ||
                        $token eq 'punk' ||
                        $token eq 'read';

We see perl 5.8's performance continue to fall off:

demerphq@dromedary:blead:~/perl$ perl
               Rate        eq     regex regex_opt
eq         874033/s        --      -35%      -35%
regex     1348988/s       54%        --       -0%
regex_opt 1353136/s       55%        0%        --

And we see perl 5.10's performance again stay more or less static:

demerphq@dromedary:blead:~/perl$ ./perl -Ilib
              Rate        eq regex_opt     regex
eq        601754/s        --      -14%      -14%
regex_opt 696555/s       16%        --       -1%
regex     703210/s       17%        1%        --

Also, the performance difference of the 'eq' cases suggests that perl as
a whole is a bit slower in perl5.10, this is nothing new afaik, perl has
been getting somewhat slower with each release. 

Anyway, my conclusion is that this isnt really a bug. Its a place where
the changes in the regex engine result in a loss of speed, but it is
cherry picked to do so. Sure we could probably do some work to make this
class of pattern not use the trie logic because the number of options
are so low, and because the pattern is fully anchored, but that isn't
going to save much. The end result will afaict still be half as fast
simply because setting up the non-recursive engine is more expensive
than setting up the recursive one. But of course this comes at a trade
off, the new engine wont SEGV, and the new engine won't see a linear
falloff in performance due to large alternations. So we trade speed for
stability and predictable performance. Its hard to say that is a bug.

I view this more as a notice that we could spend some time making regex
startup less expensive, if we can do so without losing features. The
TRIE aspect IMO in this case is a lesser concern. Although i admit it
could be optimised. The compressed scheme we use is massive overkill for
the type of pattern we have in this bug. It is designed with huge
transition tables in mind, not the more common smaller ones that would
come from the pattern in this bug.

Yves Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About