perl.perl5.porters | Postings from May 2008

RE: on the almost impossibility to write correct XS modules

Jan Dubois
May 17, 2008 10:50
On Sat, 17 May 2008, Marc Lehmann wrote:
> On Thu, May 15, 2008 at 04:31:13PM -0700, Jan Dubois <> wrote:
> > encoding whereas they really are ANSI encoded. So once the automatic
> > upgrading assumes ANSI encoding instead of Latin-1, everything should be
> > working correctly, no?
> Uhm.... that one can even suggest such brokenness :)

I see the smiley, but I'm not sure I understand the comment. Surely the
actual strings without SvUTF8 set are encoded in the system default ANSI
codepage: text returned by qx() will be ANSI encoded, filenames returned
by readdir() will be ANSI encoded and so on. This is just the nature of
the 8-bit OS API.

The brokenness right now is that when Perl automatically upgrades this
data to UTF-8, it assumes that the data is Latin-1 instead of ANSI,
garbling the data whenever it contains bytes that the current ANSI
codepage and Latin-1 map to different characters.
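
To make the mismatch concrete, here is a minimal sketch. It assumes the
system ANSI codepage is Windows-1252 (cp1252), which is only one possible
setting, and it uses a hard-coded byte in place of real OS output:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the ANSI codepage is cp1252, where byte 0x80 is the euro sign.
my $bytes = "\x80";

# What the automatic upgrade does today: treat the byte as Latin-1.
my $as_latin1 = $bytes;
utf8::upgrade($as_latin1);

# What the data actually means if it came from the 8-bit OS API.
my $as_ansi = decode('cp1252', $bytes);

printf "Latin-1 upgrade: U+%04X\n", ord $as_latin1;   # U+0080 (a C1 control)
printf "cp1252 decode:   U+%04X\n", ord $as_ansi;     # U+20AC (euro sign)
```

The two results differ only for bytes where the ANSI codepage and Latin-1
disagree; with cp1252 that is the 0x80-0x9F range.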

How would you want to "fix" this then? Translate all 8-bit data when it
is read from the OS from ANSI to Latin1? That seems a lot harder, and
will also be quite unintuitive.

> Of course basically everything will break, you mean, because the
> assumption that its not latin1 of course breaks roughly all code dealing
> with unicode in perl, which doesn't expect that perl suddenly uses ANSI
> instead of unicode codepoints (they differ!).

Only code making the explicit assumption that 8-bit strings are encoded
in Latin1 is going to break. All code relying on the implicit conversion
between 8-bit and UTF8 will actually be fixed and not broken by this
change. :)
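
As an illustration of what "relying on the implicit conversion" looks
like, here is a small sketch of current behaviour. The variable names are
made up, and the byte string stands in for data that would really come
from qx() or readdir():

```perl
use strict;
use warnings;

my $os_bytes = "\x80";       # hypothetical 8-bit data from the OS, SvUTF8 off
my $unicode  = "\x{263A}";   # character string, SvUTF8 on

# Concatenation forces an implicit upgrade of $os_bytes.
my $joined = $os_bytes . $unicode;

printf "SvUTF8 on input:  %d\n", utf8::is_utf8($os_bytes) ? 1 : 0;  # 0
printf "SvUTF8 on result: %d\n", utf8::is_utf8($joined)   ? 1 : 0;  # 1
printf "first character:  U+%04X\n", ord $joined;                   # U+0080
```

Today the first character comes out as U+0080 because the upgrade assumes
Latin-1. Under the change discussed here, and assuming a cp1252 system,
the same code would yield U+20AC without being modified.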

