Front page | perl.perl5.porters |
Postings from June 2010
Re: fold case matching
From: karl williamson
Date: June 3, 2010 12:51
Subject: Re: fold case matching
Message ID: 4C0807AC.4070802@khwilliamson.com
Dave Mitchell wrote:
> Just out of curiosity, which perl (if any) is doing the Right Thing
> as regards the following code, which matches a char that case folds to two
> chars:
>
> # lc("\x{149}") => "\x{2bc}N"
>
> print "ok PLAIN 1\n" if "\x{149}" =~ /\x{2bc}/i;
> print "ok PLAIN 2\n" if "\x{149}" =~ /N/i;
> print "ok PLAIN 3\n" if "\x{149}" =~ /\x{2bc}N/i;
>
> print "ok ALT 1\n" if "\x{149}" =~ /\x{2bc}|ZZZZ/i;
> print "ok ALT 2\n" if "\x{149}" =~ /N|ZZZZ/i;
> print "ok ALT 3\n" if "\x{149}" =~ /\x{2bc}N|ZZZZ/i;
>
>
> 5.8.0,
> 5.13.0,
> blead:
>
> ok PLAIN 3
> ok ALT 1
> ok ALT 3
>
> 5.10.0,
> 5.10.1,
> 5.12.0:
>
> ok PLAIN 1
> ok PLAIN 3
> ok ALT 1
> ok ALT 3
>
> (This is in the context of me trying to understand and fix the trie code
> for [perl #74484] Regex causing exponential runtime+mem usage.)
>
>
>
I looked at this a little bit more, enough to realize that I don't want
to learn this area of the code unless it becomes necessary at some later
point. Anyway, earlier I wrote that the two case 3 tests (PLAIN 3 and
ALT 3) were the only ones where it should match. In blead, the problem
that remains is in the trie code, so ALT 1 still gets matched.
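
To see why only the case 3 patterns cover the whole character, it can help to look at the full case fold of U+0149 directly. The snippet below is a minimal sketch that assumes a perl with the `fc` builtin (5.16 or later, so newer than any version discussed in this thread); the variable names are purely illustrative.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() needs perl 5.16 or later

# Full case fold of U+0149 (LATIN SMALL LETTER N PRECEDED BY APOSTROPHE).
my $folded = fc("\x{149}");

# Render each folded code point as U+XXXX so the output stays ASCII.
my $shown = join ' ', map { sprintf('U+%04X', ord($_)) } split //, $folded;

print "fold of U+0149: $shown\n";
```

The fold is two code points long, so a pattern that supplies only one of them (PLAIN 1, PLAIN 2, ALT 1, ALT 2) would be matching half of a single target character.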
The problem lies in REXEC_TRIE_READ_CHAR or its callers. They aren't
calling ibcmp_utf8, and so don't get the benefit of its patch that fixed
the ALT 1 case in 5.12. Now I don't know if they should be calling
ibcmp_utf8, but the bottom line is that they should somehow guarantee
that a partial character isn't matched.
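
The "no partial character" guarantee can be sketched in plain Perl. This is not how regexec.c or the trie code works; `folded_prefix_match` is a hypothetical helper, it assumes perl 5.16+ for `fc`, and it only tries alternatives at the start of the target, which is enough for the single-character target in Dave's tests. The idea is to record where each original character ends in the folded string and to accept an alternative only if it stops on one of those boundaries.

```perl
use strict;
use warnings;
use feature 'fc';    # fc() needs perl 5.16 or later

# Hypothetical helper, not part of perl: returns 1 if any alternative
# matches at the start of $target under full case folding while
# consuming whole target characters only, otherwise 0.
sub folded_prefix_match {
    my ($target, @alternatives) = @_;

    # Fold the target one character at a time and remember the folded
    # offset at which each original character ends.
    my $folded = '';
    my %char_end;
    for my $ch (split //, $target) {
        $folded .= fc($ch);
        $char_end{ length($folded) } = 1;
    }

    for my $alt (@alternatives) {
        my $want = fc($alt);
        next unless length($want);    # ignore empty alternatives
        next unless substr($folded, 0, length($want)) eq $want;

        # Accept only if the match stops on a character boundary.
        return 1 if $char_end{ length($want) };
    }
    return 0;
}

# The three ALT cases from the quoted test, against the target U+0149.
my @cases = (
    [ 'ALT 1', "\x{2bc}"  ],
    [ 'ALT 2', "N"        ],
    [ 'ALT 3', "\x{2bc}N" ],
);
for my $case (@cases) {
    my ($name, $alt) = @$case;
    my $ok = folded_prefix_match("\x{149}", $alt, 'ZZZZ');
    printf "%s: %s\n", $name, $ok ? 'match' : 'no match';
}
```

Under this sketch only ALT 3 reports a match: ALT 1 agrees with the first folded code point but stops in the middle of the target character, so it is rejected.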
Maybe it's best to wait for Yves' work on fold matching, but then I seem
to say a lot that it's the answer to all our problems. After he
finishes it, he'll be ready to tackle world hunger, and other trifling
issues :)