Re: Proposed update for 5.14 for perlunicode.pod

From: Karl Williamson
Date: April 12, 2011 21:06
Subject: Re: Proposed update for 5.14 for perlunicode.pod
Message ID: 4DA520D9.3070409@khwilliamson.com

On 04/12/2011 09:39 PM, Karl Williamson wrote:
> On 04/12/2011 08:47 PM, Tom Christiansen wrote:
>> I erroneously wrote:
>>
>>> Does that mean that Perl will do the right thing if I simply say
>>
>>> use locale;
>>
>>> I don't think it will.
>>
>> I was wrong, but there is still something confusing me.
>>
>> This shows that use locale has a built-in setlocale:
>>
>> % echo $PERL_UNICODE $LANG
>> S en_US.UTF-8
>>
>> % blead -CS -Mlocale -le 'print "\u\xE9"'
>> É
>> % blead -CS -M-locale -le 'print "\u\xE9"'
>> é
>> % blead -CS -le 'print "\u\xE9"'
>> é
>> % blead -CS -lE 'print "\u\xE9"'
>> É
>>
>
> I've always believed the documentation, which says that use locale does
> not do that, and on my machine the first command prints the lowercase é,
> but that could be a problem on my Linux box. I think you have seen the
> same thing: it works on Darwin and not on Linux. Is that right?
>
>> But this shows that /u regexes don't work like I would
>> think they would:
>>
>> % blead -le 'print "\xE9" =~ s/(.)/\u$1/r'
>> é
>> % blead -Mlocale -le 'print "\xE9" =~ s/(.)/\u$1/r'
>> É
>>
>> But:
>>
>> % blead -le 'print "\xE9" =~ s/(\w)/\u$1/lr'
>> é
>> % blead -le 'print "\xE9" =~ s/(.)/\u$1/ru'
>> é
>>
>> Drat. It isn't using Unicode case mapping when you use /u.
>> Is that expected? So /u *isn't* like an automatic
>> use feature unicode_strings, any more than /l is like an
>> automatic use locale?
>>
>> I wonder why I keep thinking they are. :(
>
> The legalistic answer is that the regex modifier affects only pattern
> matching. It does not apply to the substitution part. But the truth of
> the matter is that I never thought about it during the implementation.
> You could file a bug report. I don't know enough about the areas of Perl
> involved to know how easy it would be to implement. Here's a case where
> use locale and unicode_strings work differently from /l and /u, because
> the pragmas apply to more than pattern matching. I think the docs should
> be changed to mention this issue, unless we block 5.14 and fix this.
>

But thinking about this a little more, I started to wonder about
possible ambiguities.  If /u applies to the whole regex, there is no
ambiguity, but consider:

s/(...(?u:abc)...)/\U$1/l

Here we are uppercasing $1 which partially matched under /l and 
partially under /u.  What should happen here?
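
For reference, the difference described in the quoted reply (the modifier
governs only pattern matching, while the pragma also governs case-changing
in the replacement) can be seen in a minimal sketch like the one below.
It is not taken from the thread: the variable names are illustrative, and
it assumes a perl of 5.14 or later (for /r and /u) with neither use locale
nor use feature 'unicode_strings' in effect in the outer scope.

```perl
use strict;
use warnings;
# Deliberately no "use 5.014;" here: that would switch on the
# unicode_strings feature for the whole file and hide the difference.

my $s = "\xE9";    # e-acute as one native byte; the string is not UTF-8 flagged

# /u makes \w match \xE9, but \u in the replacement is still evaluated
# under the rules of the enclosing lexical scope.
my $with_u = $s =~ s/(\w)/\u$1/ru;

my $with_feature;
{
    # The pragma is lexically scoped and covers the replacement as well.
    use feature 'unicode_strings';
    $with_feature = $s =~ s/(.)/\u$1/r;
}

printf "/u modifier only:     %vX\n", $with_u;          # E9, unchanged
printf "unicode_strings on:   %vX\n", $with_feature;    # C9, uppercased
```

The %vX format prints the ordinal of each character in hex, which avoids
any dependence on the terminal's encoding or on -C settings.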
