On 1 April 2014 17:40, Dave Mitchell <davem@iabyn.com> wrote: > On Sat, Mar 29, 2014 at 08:29:46PM -0700, Karl Williamson via RT wrote: >> Here are the 5.19.10+ blead numbers on a Linux box: >> >> Rate regex regex_opt eq >> regex 7650757/s -- -1% -84% >> regex_opt 7717895/s 1% -- -84% >> eq 48383994/s 532% 527% -- >> >> >> And if we change $token to 'read', we get >> >> Rate regex regex_opt eq >> regex 7748530/s -- -2% -62% >> regex_opt 7899967/s 2% -- -61% >> eq 20146179/s 160% 155% -- > > I think the important thing is how much slower the trie optimisation is for > simple alternations that wouldn't benefit. Comparing a range of perl > versions on the same hardware, I get for the original bug report code > (where $token matches the first alternation): > > 5.008009 regex 10526316/s > 5.010001 regex 3676471/s > 5.012005 regex 3367003/s > 5.014004 regex 5665722/s > 5.016003 regex 6756757/s > 5.018002 regex 5602241/s > 5.019010 regex 6211180/s > > And here's the same with TRIEs disabled (${^RE_TRIE_MAXBUF} = -1): > > 5.008009 regex 10309278/s > 5.010001 regex 7017544/s > 5.012005 regex 6993007/s > 5.014004 regex 6920415/s > 5.016003 regex 8130081/s > 5.018002 regex 6451613/s > 5.019010 regex 6644518/s > > > and with TRIEs enabled and $token as 'read', i.e. $token matches the third > alternation: > > 5.008009 regex 8438819/s > 5.010001 regex 3809524/s > 5.012005 regex 3344482/s > 5.014004 regex 5917160/s > 5.016003 regex 6711409/s > 5.018002 regex 5524862/s > 5.019010 regex 6097561/s > > and with TRIEs enabled and with Yves modification, i.e. $token matches the > 12th alternation: > > 5.008009 regex 4807692/s > 5.010001 regex 3724395/s > 5.012005 regex 3367003/s > 5.014004 regex 5952381/s > 5.016003 regex 6688963/s > 5.018002 regex 5555556/s > 5.019010 regex 6042296/s > > > What I conclude from the above is that: Pretty much the same thing I concluded too. :-) > * as with all such micro benchmarks, a certain amount of the variation is > just due to the luck of the compiler, e.g. instruction cache alignment; > * simple alternations slow down in proportion to the number of > alternatives that are tried (as you would expect); > * TRIEs match in roughly constant time regardless of which alternation > * matches (as you would expect); > * even non-TRIE alternations are slower in later perls, most likely due to > all the extra overhead we've accumulated, such as utf8 correctness, > not blowing the stack on recursion etc. > * TRIEs have got faster in 5.14; > * But TRIEs still have a bigger overhead than plain alts, so it may be > worth looking at > * profiling to see if the overhead can be reduced; It definitely can be. Unusually we trade speed for space in the trie algorithm. Something I now regret. Instead of supporting super large alphabets and running slow we should have just ignored large alphabets and run fast. If I get time I will look at changing the trie/aho code to use the fast representation. I dont have many tuits however. > * seeing whether there's a cut-off point where it's not worth > building a TRIE, e.g. if there are less than N alternatives; and > there might be other factors such as lengths of the alt strings, > whether its utf8 etc. I also agree here. YvesThread Previous