Re: fold case matching
From:
karl williamson
Date:
June 3, 2010 14:25
Subject:
Re: fold case matching
Message ID:
4C081DCD.8050203@khwilliamson.com
karl williamson wrote:
> Dave Mitchell wrote:
>> Just out of curiosity, which perl (if any) is doing the Right Thing
>> as regards the following code, which matches a char that case folds to
>> two chars:
>>
>> # lc("\x{149}") => "\x{2bc}N"
>>
>> print "ok PLAIN 1\n" if "\x{149}" =~ /\x{2bc}/i;
>> print "ok PLAIN 2\n" if "\x{149}" =~ /N/i;
>> print "ok PLAIN 3\n" if "\x{149}" =~ /\x{2bc}N/i;
>>
>> print "ok ALT 1\n" if "\x{149}" =~ /\x{2bc}|ZZZZ/i;
>> print "ok ALT 2\n" if "\x{149}" =~ /N|ZZZZ/i;
>> print "ok ALT 3\n" if "\x{149}" =~ /\x{2bc}N|ZZZZ/i;
>>
>>
>> 5.8.0,
>> 5.13.0,
>> blead:
>>
>> ok PLAIN 3
>> ok ALT 1
>> ok ALT 3
>>
>> 5.10.0,
>> 5.10.1,
>> 5.12.0:
>>
>> ok PLAIN 1
>> ok PLAIN 3
>> ok ALT 1
>> ok ALT 3
>>
>> (This is in the context of me trying to understand and fix the trie code
>> for [perl #74484] Regex causing exponential runtime+mem usage.)
>>
>>
>>
>
> I looked at this a little bit more, enough to realize that I don't want
> to learn this area of the code unless necessary at some later point.
> Anyway, earlier I wrote that the two "3" cases (PLAIN 3 and ALT 3) were
> the only ones where it should match. In blead, the problem that remains
> is in tries, so that ALT 1 still gets matched.
>
> The problem lies in REXEC_TRIE_READ_CHAR or its callers. They aren't
> calling ibcmp_utf8, and so don't get the benefit of its patch that fixed
> the ALT 1 case in 5.12. Now I don't know if they should be calling
> ibcmp_utf8, but the bottom line is that they should somehow guarantee
> that a partial character isn't matched.
>
> Maybe it's best to wait for Yves' work on fold matching, but then I seem
> to say a lot that it's the answer to all our problems. After he
> finishes it, he'll be ready to tackle world hunger, and other trifling
> issues :)
>
I meant to say, rather, other comparatively easy issues.
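
To make the "partial character" requirement concrete, here is a minimal sketch in C. It is not the code in regexec.c and does not use ibcmp_utf8 or REXEC_TRIE_READ_CHAR; the names toy_fold and fold_match_at are hypothetical, the fold table covers only what the example needs, and it works on arrays of code points rather than UTF-8. It assumes the pattern has already been folded. The idea is only that a match is rejected if the pattern runs out in the middle of one source character's multi-character fold.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy fold table: just enough for this example.
 * U+0149 folds to the two code points U+02BC U+006E; ASCII A-Z fold
 * to a-z; everything else folds to itself.
 * Writes the fold into out[] and returns its length (1 or 2). */
static size_t toy_fold(uint32_t cp, uint32_t out[2])
{
    if (cp == 0x149) {
        out[0] = 0x2BC;
        out[1] = 'n';
        return 2;
    }
    if (cp >= 'A' && cp <= 'Z') {
        out[0] = cp + ('a' - 'A');
        return 1;
    }
    out[0] = cp;
    return 1;
}

/* Try to match the already-folded, non-empty pattern pat at the start
 * of src. Returns the number of whole source characters consumed, or 0
 * if there is no match. A match that would end inside the fold of a
 * single source character is refused. */
static size_t fold_match_at(const uint32_t *src, size_t src_len,
                            const uint32_t *pat, size_t pat_len)
{
    size_t p = 0;   /* position in the folded pattern */
    size_t s;       /* position in the source string */

    assert(pat_len > 0);

    for (s = 0; s < src_len && p < pat_len; s++) {
        uint32_t folded[2];
        size_t n = toy_fold(src[s], folded);

        for (size_t i = 0; i < n; i++) {
            if (p == pat_len)
                return 0;   /* pattern ended mid-character: partial match */
            if (folded[i] != pat[p])
                return 0;   /* ordinary mismatch */
            p++;
        }
    }
    /* Succeed only if the whole pattern was consumed. */
    return p == pat_len ? s : 0;
}
```

With a source of the single code point 0x149, calling fold_match_at with the pattern { 0x2BC } or { 'n' } returns 0, while the pattern { 0x2BC, 'n' } returns 1. That corresponds to only the "3" cases matching in the test script quoted above.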