> -----Original Message-----
> From: karl williamson [mailto:public@khwilliamson.com]
> Sent: Sunday, March 07, 2010 7:10 PM
> To: Burak Gürsoy
> Cc: perl5-porters@perl.org
> Subject: Re: Is lc(\x{130}) -> i\x{307} a bug?
>
> Perl is working correctly according to the Unicode standard.

Ok.

> The inclusion of U+0307 is the correct Unicode mapping for languages
> other than Turkish and Azerbaijani. The mapping should be just to 'i'

It's Turkish in my case :)

> for those two languages, but it does not preserve canonical equivalence
> without further processing. Perl currently doesn't do this, nor does
> Perl currently support locale handling of code points beyond 0xFF.
>
> Perl is unlikely to add such support, as Unicode itself has moved away
> from defining locale dependent mappings. They still define this one and
> a few others that were included very early on in Unicode, but aren't
> adding new ones. Instead they have a CLDR project for locale data. I
> know next to nothing about that.

I'll check that. And an OffTopic question if you don't mind: any ideas on
the decision in perl6 on this matter?

> These mappings of U+0130 have been very problematic and have caused
> significant consternation over the years, but they (and we) are stuck
> with it now.
>
> I thought there might be a CPAN module that changed the behavior to suit
> these two languages, but I just searched there and didn't see anything.

Well... defining a ToLower() seems to be the remedy for this issue:

sub ToLower {
    return <<"RANGE";
0049\t\t0131
0130\t\t0069
RANGE
}

Too trivial to wrap inside a module I guess :) A module related to this
must also handle the sorting, etc.

However, since ToLower/ToUpper is by-passed for non-unicode-looking
strings, the range trick will not work for things like uc('i')/lc('I')
(for Turkish). Only way to make a reliable Turkish locale dependent thingy
seems to be a combination of ToLower/ToUpper and pre-process the string
before passing to uc/lc with s/// (or tr///).
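Just to illustrate the pre-processing part, here is a minimal sketch. The names tr_lc and tr_uc are hypothetical (they are not from any existing module), the input is assumed to be an already decoded character string, and sorting is not handled at all. The four Turkish special cases are swapped with tr/// first, and everything else is left to the built-in lc/uc:

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only: Turkish-aware lc/uc.
# The special letters are translated with tr/// before the built-in
# lc/uc sees the string, so plain ASCII 'I' and 'i' are covered too.

sub tr_lc {
    my ($s) = @_;
    # 'I' -> U+0131 (dotless i), U+0130 (dotted capital I) -> 'i'
    $s =~ tr/I\x{130}/\x{131}i/;
    return lc $s;
}

sub tr_uc {
    my ($s) = @_;
    # 'i' -> U+0130 (dotted capital I), U+0131 (dotless i) -> 'I'
    $s =~ tr/i\x{131}/\x{130}I/;
    return uc $s;
}

# Usage: compare against the expected code points directly
print "lc ok\n" if tr_lc("ISTANBUL") eq "\x{131}stanbul";
print "uc ok\n" if tr_uc("istanbul") eq "\x{130}STANBUL";
```

Because the special letters are already mapped before lc/uc runs, this does not depend on whether the string "looks" unicode or not.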
And this is a hack unfortunately.

Thanks,
Burak