Re: on the almost impossibility to write correct XS modules

Marc Lehmann
May 19, 2008 14:22
On Mon, May 19, 2008 at 01:34:13PM -0700, Glenn Linderman wrote:
> The gist of the problem here is that
> 1) The "automatic" conversion of 8-bit to UTF-8 "assumed" Latin1 because 
> it was (a) easy numerically (b) worked well on platforms that use Latin1 
> as their native encoding.

Which platform is that? I really don't know *any* such platform.

Note also that the automatic conversion in perl doesn't assume any
encoding *at all*, so this is simply not true.

> 2) Windows assumes ANSI code page for 8-bit data, but Perl on Windows, 
> for quite a few releases now, has not... instead, it "assumes" Latin1 
> when "automatically" converting 8-bit to UTF-8.

This is not what happens. Perl simply does not assume any encoding. If
you have an 8-bit filename encoded in latin1 then perl doesn't treat it
any different than an 8-bit filename encoded in koi8-r (another "ANSI"

Upgrading and downgrading doesn't change that, or at least shouldn't
change that. Where it does, it affects unix as much as any other platform.
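
A minimal sketch of that point, assuming nothing beyond the core
utf8:: functions: the octets in $name are hypothetical sample data
standing in for an 8-bit filename, and the variable names are made up
for illustration. Upgrading switches the internal storage of the scalar,
but the string compares equal before and after, and no encoding is
consulted along the way.

```perl
use strict;
use warnings;

# Three octets as they might come back from readdir(). They could be
# latin1, koi8-r or anything else; perl is not told and does not ask.
my $name = "\xC1\xC2\xC3";

# Upgrade a copy: the internal storage switches to UTF-8.
my $upgraded = $name;
utf8::upgrade($upgraded);

my $flag_before = utf8::is_utf8($name)     ? 1 : 0;
my $flag_after  = utf8::is_utf8($upgraded) ? 1 : 0;

# At the Perl level both scalars hold the same three characters.
my $still_equal = ($name eq $upgraded) ? 1 : 0;
my $len         = length $upgraded;

# Downgrading returns to the one-octet-per-character storage.
my $roundtrip = $upgraded;
utf8::downgrade($roundtrip);
my $flag_roundtrip = utf8::is_utf8($roundtrip) ? 1 : 0;

print "flag before=$flag_before after=$flag_after roundtrip=$flag_roundtrip\n";
print "equal=$still_equal length=$len\n";
```

Which encoding those three values stand for is still up to the code
that produced or consumes them.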

> Retrofitting Perl on Windows to assume 8-bit data is ANSI will break all 
> code that attempts to work with the constraints of 1 and 2.

This would probably be true if 1) and 2) were real, but they are not.

> somewhat lower performance than assuming Latin1.  And it would possibly 
> have prevented, by example of a widely-used platform, the assumption 
> throughout lots of Perl code, that all 8-bit data is assumed to be 
> Latin1 implicitly.

Perl doesn't do that anywhere on any platform, to my knowledge. Give an
example of a platform that expects filenames as latin1.

(you can select this under unix, yes, but you can do so under windows as

(the rest of the mail is either true, or depends on these critical but
wrong assumptions. It is still use that decodes encoding).

                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_    
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\
