
Re: on the almost impossibility to write correct XS modules

Marc Lehmann
May 19, 2008 13:25
On Sat, May 17, 2008 at 07:38:16PM +0100, Ben Morrow wrote:
> Perl explicitly documents that 8-bit data is treated as ISO8859-1,
> except on EBCDIC platforms. 

If only it were so (from perluniintro):

   Internally, Perl currently uses either whatever the native eight-bit
   character set of the platform (for example Latin-1) is, defaulting to
   UTF-8, to encode Unicode strings. Specifically, if all code points in the
   string are 0xFF or less, Perl uses the native eight-bit character set.
   Otherwise, it uses UTF-8.

Of course, this isn't even implementable (nor is it even remotely true),
but this is one of the many issues: whoever wrote this part of the manpage
was either confused or used extremely bad wording (some of it is simply
wrong, other things are maybe badly presented, and still others are illogical).
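
To illustrate one way the quoted rule fails: a string whose code points are
all 0xFF or less can still be held in either form. The following is only a
minimal C sketch under stated assumptions. `struct str_repr` and both
functions are hypothetical names made up for this example, not perl types or
API; in real XS code the flag would come from `SvUTF8(sv)`. The decoder only
handles one- and two-byte UTF-8 sequences, which is enough to show the point.

```c
#include <stddef.h>

/* Hypothetical stand-in for a scalar's string buffer; NOT a perl type. */
struct str_repr {
    const unsigned char *bytes;
    size_t len;
    int is_utf8; /* 0: one byte per code point (0x00-0xFF); 1: UTF-8 encoded */
};

/* Decode the code point at *pos and advance *pos.
 * Only 1- and 2-byte UTF-8 sequences (code points below 0x800) are handled.
 * Returns -1 on malformed or unsupported input. */
long next_code_point(const struct str_repr *s, size_t *pos)
{
    unsigned char b = s->bytes[(*pos)++];

    if (!s->is_utf8 || b < 0x80)
        return b;                       /* the byte is the code point */

    if ((b & 0xE0) == 0xC0 && *pos < s->len) {
        unsigned char c = s->bytes[(*pos)++];
        if ((c & 0xC0) == 0x80)         /* valid continuation byte */
            return ((long)(b & 0x1F) << 6) | (c & 0x3F);
    }
    return -1;
}

/* Return 1 if both buffers hold the same sequence of code points, else 0. */
int same_characters(const struct str_repr *a, const struct str_repr *b)
{
    size_t i = 0, j = 0;

    while (i < a->len && j < b->len) {
        long ca = next_code_point(a, &i);
        long cb = next_code_point(b, &j);
        if (ca < 0 || cb < 0 || ca != cb)
            return 0;
    }
    return i == a->len && j == b->len;  /* both fully consumed */
}
```

With the single byte 0xE9 (flag off) and the two bytes 0xC3 0xA9 (flag on),
`same_characters` reports a match, because both decode to U+00E9. Reading the
second buffer with the flag off gives two different characters instead, which
is the mistake an XS module makes when it looks at the bytes without
consulting the flag.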

For example, "attached to operations" can easily be misunderstood: the
"bad model" attached wide-characterness to operations, while the
current model attaches "unicodeness" to operations (or at least most of
perl uses that interpretation, e.g. open, concatenation etc.), so the wording
"attaching it to operations" is not very helpful.

The people who maintain perl need to agree on a unicode model at some
point, and then implement it with force. The current state of affairs is
extremely damaging, as it's impossible to get bugfixes in because everybody
disagrees on whether it is a bug or not.

                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_    
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\