Re: unicode regex performance (was Re: Future Perl development)

From: Dave Mitchell
Date: February 8, 2007 08:12
Subject: Re: unicode regex performance (was Re: Future Perl development)
Message ID: 20070208161248.GB30131@iabyn.com

Just as a data point, the current regex engine does a block memory
comparison for an exact string if:
    * the string and pattern have the same UTF8-ness, and
    * the match is case-sensitive
but does character by character matching otherwise;

i.e.

fast:

    "X" =~ /XYZ/;
    "\x{100}" =~ /\x{100}\x{101}\x{102}/;

slow:

    "\x{100}" =~ /XYZ/;
    "X" =~ /\x{100}\x{101}\x{102}/;
    "anything" =~ /anything/i;
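
The rule above can be sketched as a small predicate. This only models the two stated conditions and does not inspect the regex engine; `predicted_path` is a hypothetical helper, and the pattern is represented by its literal string on the assumption that the literal's UTF8 flag mirrors that of the compiled pattern.

```perl
use strict;
use warnings;

# Hypothetical helper: applies the two conditions from the message to a
# subject string and a literal pattern string. It is a model of the rule,
# not a probe of what the engine actually does.
sub predicted_path {
    my ($subject, $literal, $case_insensitive) = @_;

    # utf8::is_utf8 reports the internal UTF8 flag of a scalar.
    my $subject_utf8 = utf8::is_utf8($subject) ? 1 : 0;
    my $literal_utf8 = utf8::is_utf8($literal) ? 1 : 0;

    return ($subject_utf8 == $literal_utf8 && !$case_insensitive)
        ? 'fast'    # block memory comparison
        : 'slow';   # character by character matching
}

print predicted_path("X",        "XYZ",                    0), "\n";  # fast
print predicted_path("\x{100}",  "\x{100}\x{101}\x{102}",  0), "\n";  # fast
print predicted_path("\x{100}",  "XYZ",                    0), "\n";  # slow
print predicted_path("X",        "\x{100}\x{101}\x{102}",  0), "\n";  # slow
print predicted_path("anything", "anything",               1), "\n";  # slow
```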

(Arguably a pattern should store both plain and utf8 versions of each
exact string for quicker matching.)
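
As a rough illustration of that idea at the Perl level (not how the engine stores patterns), the sketch below keeps two copies of one literal and picks the copy whose UTF8 flag matches the subject string. `both_versions` and `version_for` are hypothetical names; `utf8::upgrade`, `utf8::downgrade` and `utf8::is_utf8` are the core functions.

```perl
use strict;
use warnings;

# Hypothetical sketch: keep a plain and a utf8 copy of the same literal.
sub both_versions {
    my ($literal) = @_;

    my $utf8 = $literal;
    utf8::upgrade($utf8);    # same characters, UTF8 flag on

    my $plain = $literal;
    # A literal with code points above 0xFF has no plain form; the second
    # argument makes downgrade return false instead of dying in that case.
    my $ok = utf8::downgrade($plain, 1);

    return { utf8 => $utf8, plain => $ok ? $plain : undef };
}

# Pick the copy whose UTF8-ness matches the subject string.
sub version_for {
    my ($versions, $subject) = @_;
    return utf8::is_utf8($subject) ? $versions->{utf8} : $versions->{plain};
}

my $versions = both_versions("XYZ");
my $for_wide  = version_for($versions, "\x{100}");   # utf8 copy
my $for_plain = version_for($versions, "X");         # plain copy
```

Both copies compare equal with `eq`; only the internal representation differs, which is what would let a block comparison be used for either kind of subject string.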


-- 
A walk of a thousand miles begins with a single step...
then continues for another 1,999,999 or so.
