RFC: Handling utf8 locales

From: Karl Williamson
Date: June 22, 2011 14:28
Subject: RFC: Handling utf8 locales
Message ID: 4E025E58.5010103@khwilliamson.com

Perl has never handled multi-byte locales, including utf8 ones.  But 
more and more locales seem to come as utf8 variants, such as ZA.utf8. 
I did some research on the standards behind them, and it appears that, 
because of various objections (including that they weren't general 
enough), nothing was ever fully approved.

It's a lot of work to handle multi-byte locales in general, but Perl 
already knows how to handle Unicode utf8.  This leads to my proposal: If, 
under "use locale", a locale name ends in '.utf8', then Perl would treat 
it, for ctype purposes only, as regular Unicode.  We would not actually 
inspect the locale's rules, but would use the Unicode ones, as if the 
locale were a properly specified and implemented Unicode locale.
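
To make the name test concrete, here is a minimal sketch in C, assuming a 
plain case-sensitive suffix match.  The function locale_name_is_utf8() is 
hypothetical and is not taken from Perl's source; it only illustrates the 
"ends in '.utf8'" check described above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of Perl's source): returns true when the
 * locale name ends in the literal suffix ".utf8". */
bool locale_name_is_utf8(const char *name)
{
    static const char suffix[] = ".utf8";
    const size_t suffix_len = sizeof suffix - 1;   /* exclude the NUL */
    size_t name_len;

    if (name == NULL)
        return false;

    name_len = strlen(name);
    if (name_len < suffix_len)
        return false;

    /* Compare only the tail of the name against the suffix. */
    return strcmp(name + name_len - suffix_len, suffix) == 0;
}
```

A caller could pass it the name returned by setlocale(LC_CTYPE, NULL).  
Spellings such as '.UTF-8' are deliberately not matched here, since the 
proposal mentions only '.utf8'; whether to accept them is an open question.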

For purposes of other things that Perl does with locale, such as the 
decimal point separator, Perl would use the locale rules, just as currently.
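
As an illustration of what "use the locale rules" means for the decimal 
point, the following sketch reads it through the standard C localeconv() 
call.  The wrapper name current_decimal_point() is hypothetical; this is 
not a description of how Perl itself obtains the value.

```c
#include <locale.h>

/* Hypothetical wrapper: returns the decimal point string of the locale
 * currently in effect for LC_NUMERIC.  The returned pointer refers to
 * storage owned by the C library and may be overwritten by later calls
 * to localeconv() or setlocale(). */
const char *current_decimal_point(void)
{
    const struct lconv *lc = localeconv();
    return lc->decimal_point;
}
```

Under the proposal this value would keep coming from the locale itself, 
even when the locale's name ends in '.utf8'.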

The advantages to a user are that Perl would start to accept this 
single, common, multi-byte locale and they would get the correct results.

Since we don't currently handle utf8 locales, I don't know of any 
downsides for the user.
