develooper Front page | perl.perl5.porters | Postings from April 2014

Re: [perl #58280] Speed lost on /^(foo|bar|baz)$/ match

Thread Previous
From:
demerphq
Date:
April 1, 2014 16:13
Subject:
Re: [perl #58280] Speed lost on /^(foo|bar|baz)$/ match
Message ID:
CANgJU+WB+iJ7za=uqZiJ89J0EXe2gi6n9+CjdFKjiYX4u5htMQ@mail.gmail.com
On 1 April 2014 17:40, Dave Mitchell <davem@iabyn.com> wrote:
> On Sat, Mar 29, 2014 at 08:29:46PM -0700, Karl Williamson via RT wrote:
>> Here are the 5.19.10+ blead numbers on a Linux box:
>>
>>                 Rate     regex regex_opt        eq
>> regex      7650757/s        --       -1%      -84%
>> regex_opt  7717895/s        1%        --      -84%
>> eq        48383994/s      532%      527%        --
>>
>>
>> And if we change $token to 'read', we get
>>
>>                 Rate     regex regex_opt        eq
>> regex      7748530/s        --       -2%      -62%
>> regex_opt  7899967/s        2%        --      -61%
>> eq        20146179/s      160%      155%        --
>
> I think the important thing is how much slower the trie optimisation is for
> simple alternations that wouldn't benefit. Comparing a range of perl
> versions on the same hardware, I get for the original bug report code
> (where $token matches the first alternation):
>
>     5.008009 regex 10526316/s
>     5.010001 regex  3676471/s
>     5.012005 regex  3367003/s
>     5.014004 regex  5665722/s
>     5.016003 regex  6756757/s
>     5.018002 regex  5602241/s
>     5.019010 regex  6211180/s
>
> And here's the same with TRIEs disabled (${^RE_TRIE_MAXBUF} = -1):
>
>     5.008009 regex 10309278/s
>     5.010001 regex  7017544/s
>     5.012005 regex  6993007/s
>     5.014004 regex  6920415/s
>     5.016003 regex  8130081/s
>     5.018002 regex  6451613/s
>     5.019010 regex  6644518/s
>
>
> and with TRIEs enabled and $token as 'read', i.e. $token matches the third
> alternation:
>
>     5.008009 regex  8438819/s
>     5.010001 regex  3809524/s
>     5.012005 regex  3344482/s
>     5.014004 regex  5917160/s
>     5.016003 regex  6711409/s
>     5.018002 regex  5524862/s
>     5.019010 regex  6097561/s
>
> and with TRIEs enabled and with Yves modification, i.e. $token matches the
> 12th alternation:
>
>     5.008009 regex  4807692/s
>     5.010001 regex  3724395/s
>     5.012005 regex  3367003/s
>     5.014004 regex  5952381/s
>     5.016003 regex  6688963/s
>     5.018002 regex  5555556/s
>     5.019010 regex  6042296/s
>
>
> What I conclude from the above is that:

Pretty much the same thing I concluded too. :-)

> * as with all such micro benchmarks, a certain amount of the variation is
>   just due to the luck of the compiler, e.g. instruction cache alignment;
> * simple alternations slow down in proportion to the number of
>   alternatives that are tried (as you would expect);
> * TRIEs match in roughly constant time regardless of which alternation
> * matches (as you would expect);
> * even non-TRIE alternations are slower in later perls, most likely due to
>   all the extra overhead we've accumulated, such as utf8 correctness,
>   not blowing the stack on recursion etc.
> * TRIEs have got faster in 5.14;
> * But TRIEs still have a bigger overhead than plain alts, so it may be
>   worth looking at
>    * profiling to see if the overhead can be reduced;

It definitely can be. Unusually we trade speed for space in the trie
algorithm. Something I now regret. Instead of supporting super large
alphabets and running slow we should have just ignored large alphabets
and run fast.

If I get time I will look at changing the trie/aho code to use the
fast representation. I dont have many tuits however.

>    * seeing whether there's a cut-off point where it's not worth
>      building a TRIE, e.g. if there are less than N alternatives; and
>      there might be other factors such as lengths of the alt strings,
>      whether its utf8 etc.

I also agree here.

Yves

Thread Previous


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About