On Mon, Nov 01, 2010 at 01:16:37AM +0100, Josh Hurst wrote:
> Is there anywhere a document which describes how well perl5 works in a
> GB18030 locale on Linux and Solaris? I need to know for example if the
> Unicode properties in perl5 regex work in the GB18030 locale and if
> there are bugs which can cause Chinese characters to become corrupted.

The only reference to the string GB18030 anywhere in the perl distribution
is in cpan/Encode/lib/Encode/Supported.pod, as a pod link:

    L<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>

That link still works, but the PDF is dated 2001.

The core documentation says:

    Use of locales with Unicode data may lead to odd results. Currently,
    Perl attempts to attach 8-bit locale info to characters in the range
    0..255, but this technique is demonstrably incorrect for locales that
    use characters above that range when mapped into Unicode. Perl's
    Unicode support will also tend to run slower. Use of locales with
    Unicode is discouraged.

http://perldoc.perl.org/perlunicode.html#Interaction-with-Locales

If you want to match your data using Unicode properties, use Encode to
convert it on the way in to Unicode (UTF-8 internally), and on the way out
back to GB18030.

I'm afraid that if you need both Unicode semantics, and specific things to be
tweaked further by the locale setting, you're out of luck.

(Hopefully people will correct me if I got anything wrong)

Nicholas Clark
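
A minimal sketch of the "decode on the way in, encode on the way out" approach described above, offered as an illustration rather than a tested recipe. It assumes GB18030 support is supplied by the CPAN module Encode::HanExtra (core Encode on its own covers related encodings such as GBK and EUC-CN); the helper name mark_han and the choice to bracket Han runs are hypothetical, picked only to show a Unicode property match on decoded text.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Assumption: GB18030 comes from the CPAN module Encode::HanExtra.
# Load it if it is installed; without it, only core encodings such as
# 'gbk' or 'euc-cn' are available and 'gb18030' will be rejected.
eval { require Encode::HanExtra; 1 };

# Hypothetical helper: wrap each run of Han characters in [ ].
# $enc is an Encode encoding name, e.g. 'gb18030' or 'gbk'.
sub mark_han {
    my ($bytes, $enc) = @_;

    # On the way in: octets -> Perl character string.
    # FB_CROAK makes malformed input die instead of being silently replaced.
    my $text = decode($enc, $bytes, Encode::FB_CROAK);

    # Unicode properties apply here because $text holds characters, not bytes.
    $text =~ s/(\p{Han}+)/[$1]/g;

    # On the way out: characters -> octets in the same encoding.
    return encode($enc, $text, Encode::FB_CROAK);
}
```

Called as mark_han($octets, 'gb18030'), this never consults the locale: the regex sees decoded characters only, which sidesteps the locale/Unicode interaction that perlunicode warns about. Passing 'euc-cn' or 'gbk' instead exercises the same round trip without the extra module.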