On 04/12/2011 09:39 PM, Karl Williamson wrote:
> On 04/12/2011 08:47 PM, Tom Christiansen wrote:
>> I erroneously wrote:
>>
>>> Does that mean that Perl will do the right thing if I simply say
>>
>>> use locale;
>>
>>> I don't think it will.
>>
>> I was wrong, but there is still something confusing me.
>>
>> This shows that use locale has a built-in setlocale:
>>
>> % echo $PERL_UNICODE $LANG
>> S en_US.UTF-8
>>
>> % blead -CS -Mlocale -le 'print "\u\xE9"'
>> É
>> % blead -CS -M-locale -le 'print "\u\xE9"'
>> é
>> % blead -CS -le 'print "\u\xE9"'
>> é
>> % blead -CS -lE 'print "\u\xE9"'
>> É
>>
>
> I've always believed the documentation that says it doesn't do that, and
> on my machine the first line prints the lower case, but that could be a
> problem on my Linux box. I think you have the same problem, that Darwin
> works and Linux doesn't. Is that right?
>
>> But this shows that /u regexes don't work like I would
>> think they would:
>>
>> % blead -le 'print "\xE9" =~ s/(.)/\u$1/r'
>> é
>> % blead -Mlocale -le 'print "\xE9" =~ s/(.)/\u$1/r'
>> É
>>
>> But:
>>
>> % blead -le 'print "\xE9" =~ s/(\w)/\u$1/lr'
>> é
>> % blead -le 'print "\xE9" =~ s/(.)/\u$1/ru'
>> é
>>
>> Drat. It isn't using Unicode case mapping when you use /u.
>> Is that expected? So /u *isn't* like an automatic
>> use feature unicode_strings any moreso than /l is (not) a
>> an automatic use locale?
>>
>> I wonder why I keep thinking they are. :(
>
> The legalistic answer is that the regex modifier affects only pattern
> matching. It does not apply to the substitution part. But the truth of
> the matter is that I never thought about it during the implementation.
> You could file a bug report. I don't know enough about the areas of Perl
> involved to know how easy it would be to implement. Here's a case where
> use locale, and unicode_strings work differently than /l or /u, because
> they apply to more than pattern matching. I think the docs should be
> changed to mention this issue, unless we block 5.14 and fix this.
>

But in thinking about this just a little more, I started to wonder about
possible ambiguities. If the /u applies to the whole regex, there isn't
an ambiguity, but suppose,

    s/(...(?u:abc)...)/\U$1/l

Here we are uppercasing $1 which partially matched under /l and
partially under /u. What should happen here?
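
For anyone who wants to reproduce the difference discussed above outside of one-liners, here is a minimal sketch. It assumes a perl new enough to have the /r and /u modifiers (5.14 or later), no locale pragma in effect, and no `use v5.12` or later at the top of the file (which would turn on unicode_strings for the whole file). The variable names are mine, purely for illustration. The /u case is printed but deliberately not annotated with an expected result, since that is the behaviour under question in this thread.

```perl
use strict;
use warnings;

# "\xE9" is e-acute held as a native 8-bit string (UTF-8 flag off).
my $e_acute = "\xE9";

# No locale, no unicode_strings: \u in the replacement falls back to
# ASCII-only case rules for this string, so the character is unchanged.
my $plain = $e_acute =~ s/(.)/\u$1/r;

# Same substitution with /u on the pattern. The thread reports that this
# did not change the case on blead; the result is only printed here.
my $with_u = $e_acute =~ s/(.)/\u$1/ru;

# unicode_strings is lexically scoped and applies to the replacement
# part as well as to the pattern.
my $with_feature = do {
    use feature 'unicode_strings';
    $e_acute =~ s/(.)/\u$1/r;
};

printf "plain:           %02X\n", ord $plain;         # E9
printf "with /u:         %02X\n", ord $with_u;
printf "unicode_strings: %02X\n", ord $with_feature;  # C9
```

Printing code points in hex sidesteps any output-encoding questions (-CS, $LANG and so on), so the only variable left is which case-mapping rules were applied to the replacement.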