
Re: fold case matching

From: karl williamson
Date: June 3, 2010 12:51
Subject: Re: fold case matching
Message ID: 4C0807AC.4070802@khwilliamson.com

Dave Mitchell wrote:
> Just out of curiosity, which perl (if any) is doing the Right Thing
> as regards the following code, which matches a char that case folds to two
> chars:
> 
>     # lc("\x{149}") => "\x{2bc}N"
> 
>     print "ok PLAIN 1\n" if "\x{149}" =~ /\x{2bc}/i;
>     print "ok PLAIN 2\n" if "\x{149}" =~ /N/i;
>     print "ok PLAIN 3\n" if "\x{149}" =~ /\x{2bc}N/i;
> 
>     print "ok ALT   1\n" if "\x{149}" =~ /\x{2bc}|ZZZZ/i;
>     print "ok ALT   2\n" if "\x{149}" =~ /N|ZZZZ/i;
>     print "ok ALT   3\n" if "\x{149}" =~ /\x{2bc}N|ZZZZ/i;
> 
> 
> 5.8.0,
> 5.13.0,
> blead:
> 
>     ok PLAIN 3
>     ok ALT   1
>     ok ALT   3
> 
> 5.10.0,
> 5.10.1,
> 5.12.0:
> 
>     ok PLAIN 1
>     ok PLAIN 3
>     ok ALT   1
>     ok ALT   3
> 
> (This is in the context of me trying to understand and fix the trie code
> for [perl #74484] Regex causing exponential runtime+mem usage.)
> 
> 
> 

I looked at this a little more, enough to realize that I don't want
to learn this area of the code unless it becomes necessary later.
Anyway, earlier I wrote that the two case 3 tests (PLAIN 3 and ALT 3)
were the only ones that should match.  In blead, the remaining problem
is in the trie code, which is why ALT 1 still matches.

The problem lies in REXEC_TRIE_READ_CHAR or its callers.  They aren't 
calling ibcmp_utf8, and so don't get the benefit of its patch that fixed 
the ALT 1 case in 5.12.  Now I don't know if they should be calling 
ibcmp_utf8, but the bottom line is that they should somehow guarantee 
that a partial character isn't matched.
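
To make "a partial character isn't matched" concrete, here is a rough
pure-Perl sketch.  It is not the regex engine's code and has nothing to
do with how REXEC_TRIE_READ_CHAR or ibcmp_utf8 are written; the names
%MULTI_FOLD, fold_char and whole_char_fold_match are made up for
illustration, and the fold table is a one-entry stand-in for the real
Unicode data.  The idea is only that a folded pattern may line up with
the folded target at whole-character boundaries of the original target,
never in the middle of one character's multi-char fold.

```perl
use strict;
use warnings;

# Hypothetical one-entry stand-in for the Unicode full case folding
# table: U+0149 folds to the two characters U+02BC, "n".
my %MULTI_FOLD = ( "\x{149}" => "\x{2bc}n" );

# Fold one character: use the table if present, otherwise plain lc().
sub fold_char {
    my ($ch) = @_;
    return exists $MULTI_FOLD{$ch} ? $MULTI_FOLD{$ch} : lc $ch;
}

# True only if the folded $needle equals the concatenated folds of a
# run of *whole* characters of $target.
sub whole_char_fold_match {
    my ($target, $needle) = @_;
    my $want   = join '', map { fold_char($_) } split //, $needle;
    my @folded = map { fold_char($_) } split //, $target;

    for my $start (0 .. $#folded) {
        my $seen = '';
        for my $end ($start .. $#folded) {
            $seen .= $folded[$end];
            return 1 if $seen eq $want;
            # Once we have as many chars as the needle, adding more
            # whole target characters cannot produce an exact match.
            last if length($seen) >= length($want);
        }
    }
    return 0;
}

# The three PLAIN cases from the quoted test, as this sketch sees them.
printf "case 1: %s\n", whole_char_fold_match("\x{149}", "\x{2bc}")  ? "match" : "no match";
printf "case 2: %s\n", whole_char_fold_match("\x{149}", "N")        ? "match" : "no match";
printf "case 3: %s\n", whole_char_fold_match("\x{149}", "\x{2bc}N") ? "match" : "no match";
```

Under those assumptions only case 3 matches, because it is the only
pattern that consumes the whole fold of \x{149}.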

Maybe it's best to wait for Yves' work on fold matching, but then I seem 
to say a lot that it's the answer to all our problems.  After he 
finishes it, he'll be ready to tackle world hunger, and other trifling 
issues :)
