Gerard Goossen wrote 2007-02-08 16:49 (+0100):
> use strict;

Whoops! You got me there!

> If you fix your script you will see that the unicode matching is a lot slower.

Then my point still stands.

> But it is a lot slower not because the matching is in unicode. But because Perl 5
> has to do a lot to make sure all strings are unicode, for example E probably
> has to be upgraded to latin1.

There is no upgrading to latin1. AFAIK, Perl never downgrades
automatically. Can anyone confirm or refute this?

> If you turn off mixing latin1 and unicode matching, things get a _lot_
> simpler and you can do better optimizations.

Which was part of my proposal: upgrade both the string and the pattern to
UTF8 (if necessary), and then do naive byte matching. This should be
explicitly enabled, because it causes havoc if you're not aware of the
internals.

Optimizations like this are very nice to have, but should only be used in
extreme cases. Any use of such an optimization (unless it can safely be
done automatically) is probably premature.

> my branch:

Unfortunately, you do something similar by default. If I understand
correctly, your branch does UTF8, not Unicode. This is a bit like PHP's
mb_ functions. Real Perl does Unicode, while internally encoding it as
UTF8.

> When referring to my branch, I will do so explicitly (by saying something
> like my branch, my patch).

Thanks for clarifying that. I was confused by your reference to \x[].
-- 
kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy  <sales@convolution.nl>

I do not trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.
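
P.S. To make the "naive byte matching" idea concrete, here is a minimal sketch for the literal-substring case only (no regex engine involved). The helper name byte_index is hypothetical and made up for this example; utf8::encode and index are the ordinary core functions. It is meant to show why such a mode should be opt-in, not how any branch actually implements it.

```perl
use strict;
use warnings;

# Hypothetical helper: find $needle in $haystack by comparing UTF-8 octets.
sub byte_index {
    my ($haystack, $needle) = @_;      # copies; the caller's strings are untouched
    utf8::encode($haystack);           # characters -> UTF-8 octets, in place
    utf8::encode($needle);
    return index($haystack, $needle);  # offset is counted in bytes
}

my $text   = "caf\xE9 \x{263A}";       # c, a, f, e-acute, space, smiley: 6 characters
my $needle = "\x{263A}";

my $char_pos = index($text, $needle);       # counted in characters
my $byte_pos = byte_index($text, $needle);  # counted in octets

print "character offset: $char_pos\n";
print "byte offset:      $byte_pos\n";
```

The two offsets differ (5 versus 6) because the e-acute occupies two octets once encoded. Code that feeds the byte offset back into substr on the original character string would land one position too far, which is the kind of havoc I mean.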