On 04/12/2011 08:47 PM, Tom Christiansen wrote:
> I erroneously wrote:
>
> >> Does that mean that Perl will do the right thing if I simply say
> >>     use locale;
> >> I don't think it will.
>
> I was wrong, but there is still something confusing me.
>
> This shows that use locale has a built-in setlocale:
>
>     % echo $PERL_UNICODE $LANG
>     S en_US.UTF-8
>
>     % blead -CS -Mlocale -le 'print "\u\xE9"'
>     É
>     % blead -CS -M-locale -le 'print "\u\xE9"'
>     é
>     % blead -CS -le 'print "\u\xE9"'
>     é
>     % blead -CS -lE 'print "\u\xE9"'
>     É
>

I've always believed the documentation that says it doesn't do that, and on my machine the first command prints the lower case, but that could be a problem on my Linux box. I think you have the same problem, that Darwin works and Linux doesn't. Is that right?

> But this shows that /u regexes don't work like I would
> think they would:
>
>     % blead -le 'print "\xE9" =~ s/(.)/\u$1/r'
>     é
>     % blead -Mlocale -le 'print "\xE9" =~ s/(.)/\u$1/r'
>     É
>
> But:
>
>     % blead -le 'print "\xE9" =~ s/(\w)/\u$1/lr'
>     é
>     % blead -le 'print "\xE9" =~ s/(.)/\u$1/ru'
>     é
>
> Drat. It isn't using Unicode case mapping when you use /u.
> Is that expected? So /u *isn't* like an automatic
> use feature unicode_strings any moreso than /l is (not)
> an automatic use locale?
>
> I wonder why I keep thinking they are. :(

The legalistic answer is that the regex modifier affects only pattern matching. It does not apply to the replacement part of the substitution. But the truth of the matter is that I never thought about it during the implementation. You could file a bug report. I don't know enough about the areas of Perl involved to know how easy it would be to implement.

This is a case where use locale and unicode_strings work differently than /l or /u, because they apply to more than pattern matching. I think the docs should be changed to mention this issue, unless we block 5.14 and fix this.