perl.perl5.porters | Postings from October 2018

Re: [perl #133588] Symbol for 'micro' is erroneously uppercased to Greek 'MU'.

From: Tomasz Konojacki
Date: October 15, 2018 16:09
Subject: Re: [perl #133588] Symbol for 'micro' is erroneously uppercased to Greek 'MU'.
Message ID: 20181015180928.8C7E.5C4F47F8@xenu.pl

On Sun, 14 Oct 2018 03:48:54 -0700
"G.W. Haywood (via RT)" <perlbug-followup@perl.org> wrote:

> The uc() function converts the UTF-8 symbol for 'micro' into an upper
> case Greek 'mu', which is incorrect.  The 'micro' symbol has no upper
> case equivalent and should remain unchanged by uc().
> 
> perl -e 'use feature unicode_strings ; binmode STDOUT, ":encoding(UTF-8)" ; my $txt = "\xce\xbc\xc2\xb5" ; print utf8::decode($txt), "\n" ; print $txt. "=>", uc($txt), "\n"'
> 
> A latin-1 'micro' symbol is also converted by uc() to the UTF-8
> upper-case Greek 'mu' which can result in a string with mixed
> encoding.  Not pretty.
> 
> perl -e 'use feature unicode_strings ; binmode STDOUT, ":encoding(UTF-8)" ; my $txt = shift ; print uc($txt), "\n"' Telecomunica├žoes
> TELECOMUNICAA┬žA?ES
> 
> Behaviour is the same in Perl 5.20 and Perl 5.24 in both cases.

Both Wikipedia[1] and Unicode case tables[2] say that the current
behaviour is correct.

[1] - https://en.wikipedia.org/wiki/Mu_(letter)
[2] - https://www.unicode.org/Public/11.0.0/ucd/UnicodeData.txt (the
third column from the right is Simple_Uppercase_Mapping)
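
For illustration only, here is a minimal sketch of what the mapping in [2]
means for uc(). It is not part of the original report: the describe() helper
and the variable names are made up for this example, and it assumes a perl
new enough to support the unicode_strings feature. It compares uc() on the
character U+00B5 with uc() on the undecoded UTF-8 bytes of that same
character, which appears to be how the mixed-encoding output in the second
example of the report arises.

```perl
use strict;
use warnings;
use feature 'unicode_strings';
use charnames ();            # for charnames::viacode
use Encode qw(decode);

# Hypothetical helper: list each character of a string as "U+XXXX NAME".
sub describe {
    my ($str) = @_;
    return join ', ',
        map { sprintf 'U+%04X %s', ord($_), charnames::viacode(ord($_)) // '?' }
        split //, $str;
}

# U+00B5 MICRO SIGN. UnicodeData.txt gives 039C as its
# Simple_Uppercase_Mapping, so uc() is expected to return U+039C.
my $micro = "\x{B5}";
my $upper = uc $micro;

# The UTF-8 encoding of U+00B5, left undecoded (as an argument from
# @ARGV would be). Perl sees two characters here: U+00C2 and U+00B5.
my $bytes       = "\xC2\xB5";
my $upper_bytes = uc $bytes;

# Decoding first turns the two bytes into the single character U+00B5.
my $decoded       = decode('UTF-8', $bytes);
my $upper_decoded = uc $decoded;

binmode STDOUT, ':encoding(UTF-8)';
print 'character: ', describe($micro),   ' => ', describe($upper),         "\n";
print 'raw bytes: ', describe($bytes),   ' => ', describe($upper_bytes),   "\n";
print 'decoded:   ', describe($decoded), ' => ', describe($upper_decoded), "\n";
```

In the raw-bytes case the second byte is uppercased on its own while the
first is left alone, so the result is no longer valid UTF-8 once it is
encoded again. Decoding the input before calling uc() avoids that, although
the micro sign will still become a capital mu, as the case tables specify.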
