
Re: Unicode regex negated case-insensitivity in 5.14.0-RC1

From: Tom Christiansen
Date: May 2, 2011 04:23
Subject: Re: Unicode regex negated case-insensitivity in 5.14.0-RC1
Message ID: 23354.1304335427@chthon

Karl Williamson <public@khwilliamson.com> wrote:

>> We cannot know how, when, or even whether The Unicode Consortium
>> is going to change their minds about UTS#18, so that cannot be
>> a factor in any short-term measure

> I disagree here.  It has been my impression that Unicode will not admit 
> error if they can possibly avoid it.  That they are even contemplating 
> this is significant, and so is valid for us to take into consideration.

I meant that, since we cannot know *in the short term* just what
they're going to do, nor when they're going to do it, we cannot
possibly act now on a decision that we will not know until then.

Basically, there's no guarantee of what they'll say nor when
they'll say it.

>> Perhaps it would be possible or desirable to emit some sort of warning, and
>> if so, when. Maybe that could accompany some of the hairier choices above.
>> That would directly address the problem of things silently behaving
>> differently, weirdly, or unexpectedly.  That might make more of the
>> possible short-term measures above more acceptable, even option 1.

> I think you must mean, not option 1 which is to make absolutely no 
> changes, but a new option 6) which adds a warning instead of the other 
> things that have been discussed.

Well, yes.  I think it's possible that any of options #1, #2, #3 *might*
become more palatable if some sort of warning could be concocted to alert
the user to these strange circumstances.

It may be that in the medium term casing will become more like
normalization, in that it will need to be explicitly and manually
arranged for by the user of the regex.  Just as we must now do

    NFD($s)  =~ /pattern/
    NFC($s)  =~ /pattern/
    NFKD($s) =~ /pattern/
    NFKC($s) =~ /pattern/

ourselves, we may end up having to do

    lc($s) =~ /pattern/
    uc($s) =~ /pattern/

instead of 

    $s =~ /pattern/i

This would be unfortunate in various ways, but I begin to wonder 
whether it may be unavoidable.
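
To make the comparison concrete, here is a minimal sketch of both idioms,
assuming the core Unicode::Normalize module.  The helper names matches_nfd
and matches_lc are made up for illustration; they are not part of any module.
The lc() example uses a string containing a code point above U+00FF so that
lc() applies Unicode rules regardless of the unicode_strings feature.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFD);

# Hypothetical helper: decompose the subject, then match.
sub matches_nfd {
    my ($s, $re) = @_;
    return NFD($s) =~ $re ? 1 : 0;
}

# Hypothetical helper: lowercase the subject, then match without /i.
sub matches_lc {
    my ($s, $re) = @_;
    return lc($s) =~ $re ? 1 : 0;
}

my $precomposed = "caf\x{E9}";      # e-acute as the single code point U+00E9
my $decomposed  = "cafe\x{301}";    # e followed by U+0301 COMBINING ACUTE ACCENT
my $nfd_pattern = qr/\Acafe\x{301}\z/;

# Without normalization, the precomposed string does not match a
# pattern written in decomposed form.
my $raw_hit = $precomposed =~ $nfd_pattern ? 1 : 0;

# After NFD, both spellings match the same pattern.
my $nfd_hit = matches_nfd($precomposed, $nfd_pattern);

# Explicit case mapping in place of /i: the pattern is written in
# lowercase and the subject is lowercased first.
my $lc_pattern = qr/\A\x{3B4}elta\z/;             # U+03B4 GREEK SMALL LETTER DELTA
my $lc_hit     = matches_lc("\x{394}ELTA", $lc_pattern);
```

The cost is the same in both cases: the pattern author has to know which
form the subject was converted to and write the pattern in that form.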

And we still don't have a good

    UCA1($s) =~ /pattern/
    UCA2($s) =~ /pattern/
    UCA3($s) =~ /pattern/
    UCA4($s) =~ /pattern/
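
The nearest thing I know of in core is Unicode::Collate, which can compare
whole strings at a chosen collation level.  This is only a sketch of that
idea, and it is string equality, not a regex match, so it does not fill the
gap described above.  The variable names are illustrative.

```perl
use strict;
use warnings;
use Unicode::Collate;

# One collator per UCA strength: 1 = base letters only,
# 2 = adds accents, 3 = adds case.
my %at_level = map { $_ => Unicode::Collate->new(level => $_) } 1 .. 3;

my $plain    = "cafe";
my $accented = "cafe\x{301}";   # e followed by U+0301 COMBINING ACUTE ACCENT
my $upper    = "CAFE";

# Accents are a secondary difference: ignored at level 1, seen at level 2.
my $accent_ignored = $at_level{1}->eq($plain, $accented) ? 1 : 0;
my $accent_seen    = $at_level{2}->eq($plain, $accented) ? 0 : 1;

# Case is a tertiary difference: ignored at level 2, seen at level 3.
my $case_ignored = $at_level{2}->eq($plain, $upper) ? 1 : 0;
my $case_seen    = $at_level{3}->eq($plain, $upper) ? 0 : 1;
```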

--tom
