Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Karl Williamson
Date: April 30, 2011 12:45
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 4DBC6678.8090603@khwilliamson.com

On 04/30/2011 01:12 PM, Tom Christiansen wrote:
> Isn't 0xDF and "SS" *the* big problem?  I don't think the others are
> troublesome, are they?  What about not generating multichar folds in
> charclasses that contain nothing over 255?  Or would that be resurrecting
> the Unicode Bug?
>
> --tom
>

I think you're right that all, or nearly all, of the existing code that
is going to break will break over ß and ss.
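
To make the ß / "ss" case concrete, here is a minimal sketch. It is an
illustration under stated assumptions, not code from this thread: it
assumes Perl 5.16 or later (for fc()), and the helper name full_fold is
hypothetical. The character-class results are printed rather than
stated, because they depend on which Perl version runs the script.

```perl
use strict;
use warnings;
# unicode_strings gives code points 128-255 Unicode semantics even in
# non-utf8 strings; fc() requires Perl 5.16 or later.
use feature qw(fc unicode_strings);

my $sharp_s = "\x{DF}";    # U+00DF LATIN SMALL LETTER SHARP S

# Hypothetical helper: full case folding, which may change the length.
sub full_fold { return fc $_[0] }

# lc() leaves U+00DF alone; the full fold expands it to two characters.
printf "lc length: %d\n", length lc $sharp_s;
printf "fc length: %d\n", length full_fold($sharp_s);

# Outside a bracketed class, /i uses the multi-character fold.
print "ss vs /\\xDF/i:    ", ("ss" =~ /\A\x{DF}\z/i ? "match" : "no match"), "\n";

# Inside bracketed classes the result is version-dependent, so no
# expected output is claimed here.
print "ss vs /[\\xDF]/i:  ", ("ss" =~ /\A[\x{DF}]\z/i  ? "match" : "no match"), "\n";
print "ss vs /[^\\xDF]/i: ", ("ss" =~ /\A[^\x{DF}]+\z/i ? "match" : "no match"), "\n";
```

Running this on the Perl you care about shows directly whether a class
containing only 0xDF picks up the two-character fold.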

The Unicode Bug is about utf8 vs non-utf8 encoding having different
semantics, so no, this wouldn't be resurrecting it.  But it is kind of
like the Unicode Bug, in that adding a new character to the class
would suddenly change the behavior of the class for non-obvious and not
really related reasons.

I would prefer the more uniform approach I've described before, or that
we just always exclude this one code point for 5.14.  But I think your
approach is much better than releasing 5.14 as-is.

(And BTW, in 5.16 I think it would be something like "use re folding X"
where X is one of "simple", "full", "nfd", "nfkd", etc.)
