
Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Karl Williamson
Date: April 29, 2011 10:44
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 4DBAF8AE.7060407@khwilliamson.com

On 04/29/2011 06:13 AM, Nicholas Clark wrote:
> and that what happens is that [^\xDF] is processed all in one, not as a
> sequence:
>
> a range
>    an inverted range
>      in a case insensitive match
>
>
> so it's not implemented as a human *might* think, in terms of
>
> * process the ranges inside the [^...] construction to make a list of code
>    points (in my case that's one code point, U+00DF)
> * [^...] means invert the list (in my case, that's several million code points)
> * now match the inverted list against the input string
>    * oh yes, do that insensitively
>

The crux is your "oh yes, do that insensitively".  The word "that" means
the previous step has to be modified.  For cases like this, the current
implementation builds the union of the characters not to match and their
folds, plus a flag that says to complement the result at execution time,
so the list is essentially all the non-matches.  A single 's' is not in
the list of non-matches, but 'ss' is.  George is right: which wins?
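
To make the ambiguity concrete, here is a toy model of the scheme described above, written in Python rather than taken from the regex engine. The function names, the set representation, and the use of Python's str.casefold() as a stand-in for Unicode full case folding are all assumptions made for illustration; this is not how regcomp.c is actually written.

```python
# Toy model only: hypothetical names, not the Perl regex engine's code.

def build_nonmatch_list(excluded_chars):
    """Return the union of the excluded characters and their full case folds."""
    nonmatch = set()
    for ch in excluded_chars:
        nonmatch.add(ch)
        # Full case folding can yield more than one character:
        # U+00DF folds to the two-character string 'ss'.
        nonmatch.add(ch.casefold())
    return nonmatch


def class_matches(candidate, nonmatch, complement=True):
    """Look the candidate up in the list, then apply the complement flag."""
    in_list = candidate in nonmatch or candidate.casefold() in nonmatch
    return in_list != complement


# Model of /[^\xDF]/i: one excluded code point plus the complement flag.
NONMATCH = build_nonmatch_list({"\u00df"})

print(class_matches("s", NONMATCH))   # True:  's' is not in the non-match list
print(class_matches("ss", NONMATCH))  # False: 'ss' is in the non-match list
```

In this model a lone 's' matches the negated class while 'ss' taken as a unit does not, so the result for an input of "ss" depends on whether the matcher looks at one character or two at that position.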
