
Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Karl Williamson
Date: April 30, 2011 14:22
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 4DBC7D08.7060808@khwilliamson.com
On 04/30/2011 01:43 PM, Karl Williamson wrote:
> On 04/30/2011 01:12 PM, Tom Christiansen wrote:
>> Isn't 0xDF and "SS" *the* big problem? I don't think the others are
>> troublesome, are they? What about not generating multichar folds in
>> charclasses that contain nothing over 255? Or would that be resurrecting
>> the Unicode Bug?
>>
>> --tom
>>
>
> I think you're right that all or nearly all existing code that's going
> to get broken will be over ß and ss.
>
> The Unicode Bug is about utf8 vs non-utf8 encoding having different
> semantics, so no, this wouldn't be resurrecting it. But it is kind of
> like the Unicode bug, where addition of a new character to the class
> would suddenly change the behavior of the class for non-obvious and not
> really related reasons.
>
> I would prefer a more uniform approach of what I've said before, or we
> just exclude this one code point always for 5.14. But I think your
> approach is much better than releasing 5.14 as-is.
>
> (And BTW, in 5.16 I think it would be something like "use re folding X"
> where X is one of "simple" "full" "nfd", nfkd, etc.)
>



Thinking about this some more: given the bug Nicholas found, which
affects all multi-character folds in character classes and not just
\xdf, I think it would be best not to offer any of them in 5.14.
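
For anyone following along who hasn't met the term: a multi-character fold is a case fold that maps one character to more than one. Below is a minimal sketch in Python (not Perl, and not the regex engine's code), assuming Python's str.casefold is an acceptable stand-in for Unicode full case folding. The helper name fold_length_changes is hypothetical, made up for this illustration.

```python
# Illustration only: Python's str.casefold() applies Unicode full case
# folding, which may turn one character into several.
sharp_s = "\u00df"  # LATIN SMALL LETTER SHARP S, the \xdf discussed above

full_fold = sharp_s.casefold()  # full folding: one character becomes two
simple = sharp_s.lower()        # plain lowercasing leaves it as one character


def fold_length_changes(ch):
    """Hypothetical helper: True if full folding expands ch to several characters."""
    return len(ch.casefold()) > 1


# U+FB01 is LATIN SMALL LIGATURE FI; "k" is an ordinary one-to-one case.
candidates = ("\u00df", "\ufb01", "k")
multi = [ch for ch in candidates if fold_length_changes(ch)]
```

A bracketed character class matches one character at a time, so a fold that produces two characters has no obvious single-position match. That mismatch is the kind of case this thread is about, and \xdf is only the most commonly hit instance.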
