perl.perl5.porters | Postings from November 2010

Re: Status of GB18030 support?

From: Nicholas Clark
Date: November 1, 2010 04:03
Subject: Re: Status of GB18030 support?
Message ID: 20101101110326.GS24189@plum.flirble.org
On Mon, Nov 01, 2010 at 01:16:37AM +0100, Josh Hurst wrote:
> Is there a document anywhere that describes how well perl5 works in a
> GB18030 locale on Linux and Solaris? For example, I need to know whether the
> Unicode properties in perl5 regexes work in the GB18030 locale, and whether
> there are bugs that can cause Chinese characters to become corrupted.

The only reference to the string GB18030 anywhere in the perl distribution is
in cpan/Encode/lib/Encode/Supported.pod, as a pod link:

L<ftp://ftp.oreilly.com/pub/examples/nutshell/cjkv/pdf/GB18030_Summary.pdf>

That link still works, but the PDF is dated 2001.


The core documentation says:

    Use of locales with Unicode data may lead to odd results.  Currently,
    Perl attempts to attach 8-bit locale info to characters in the range
    0..255, but this technique is demonstrably incorrect for locales that
    use characters above that range when mapped into Unicode.  Perl's
    Unicode support will also tend to run slower.  Use of locales with
    Unicode is discouraged.

http://perldoc.perl.org/perlunicode.html#Interaction-with-Locales


If you want to match your data using Unicode properties, use Encode to
decode it from GB18030 into Perl's internal Unicode representation (UTF-8)
on the way in, and to encode it back to GB18030 on the way out.
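A minimal sketch of that decode, match, encode round trip, under stated assumptions: it assumes some Encode module that registers a 'gb18030' encoding is installed (Encode::HanExtra from CPAN is one; the core Encode::CN modules may not provide it), and han_runs() is a hypothetical helper name used only for illustration. The regex runs on decoded characters, so \p{Han} sees Unicode code points rather than raw GB18030 bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(find_encoding FB_CROAK LEAVE_SRC);

# Assumption: a 'gb18030' encoding is available to Encode. Core Encode::CN
# may not provide one; Encode::HanExtra (CPAN) is one module that does.
# Loading it is optional here, so a missing module is silently ignored.
eval { require Encode::HanExtra; 1 };

# Hypothetical helper: decode legacy-encoded bytes into Perl characters,
# collect runs of Han characters via the \p{Han} Unicode property, and
# encode each run back into the same legacy encoding.
sub han_runs {
    my ($enc_name, $bytes) = @_;
    my $enc = find_encoding($enc_name)
        or die "Encoding '$enc_name' is not available\n";
    # On the way in: bytes -> characters; die on malformed input and
    # leave the caller's source string untouched.
    my $text = $enc->decode($bytes, FB_CROAK | LEAVE_SRC);
    # Unicode-property matching works on the decoded characters.
    my @runs = $text =~ /(\p{Han}+)/g;
    # On the way out: characters -> bytes in the original encoding.
    return map { $enc->encode($_, FB_CROAK | LEAVE_SRC) } @runs;
}

# "abc ", then U+4E2D U+6587 as GB18030/GBK bytes, then " 123".
my $input = "abc \xD6\xD0\xCE\xC4 123";

if (find_encoding('gb18030')) {
    my @found = han_runs('gb18030', $input);
    printf "%d run(s): %s\n", scalar @found,
        join ' ', map { unpack 'H*', $_ } @found;
}
else {
    print "No gb18030 encoding installed; try Encode::HanExtra\n";
}
```

For data read from or written to file handles, the same conversion can instead be pushed into a PerlIO layer, e.g. open my $fh, '<:encoding(gb18030)', $file, again assuming the gb18030 encoding is installed.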

I'm afraid that if you need both Unicode semantics and further tweaks from
the locale setting, you're out of luck.

(Hopefully people will correct me if I got anything wrong)

Nicholas Clark



nntp.perl.org: Perl Programming lists via nntp and http.