On Sat, 17 May 2008, Marc Lehmann wrote:
> On Thu, May 15, 2008 at 04:31:13PM -0700, Jan Dubois <jand@activestate.com> wrote:
> > encoding whereas they really are ANSI encoded. So once the automatic
> > upgrading assumes ANSI encoding instead of Latin-1, everything should be
> > working correctly, no?
>
> Uhm.... that one can even suggest such brokenness :)

I see the smiley, but I'm not sure I understand the comment. Surely the
actual strings without SvUTF8 set are encoded in the system default ANSI
codepage: text returned by qx() will be ANSI encoded, filenames returned
by readdir() will be ANSI encoded, and so on. This is just the nature of
the 8-bit OS API.

The brokenness right now is that when Perl automatically upgrades this
data to UTF8, it assumes that the data is Latin1 instead of ANSI. That
can garble the data if it contains characters whose byte values mean
different things in the current ANSI codepage and in Latin1.

How would you want to "fix" this then? Translate all 8-bit data from
ANSI to Latin1 as it is read from the OS? That seems a lot harder, and
would also be quite unintuitive.

> Of course basically everything will break, you mean, because the
> assumption that its not latin1 of course breaks roughly all code dealing
> with unicode in perl, which doesn't expect that perl suddenly uses ANSI
> instead of unicode codepoints (they differ!).

Only code making the explicit assumption that 8-bit strings are encoded
in Latin1 is going to break. All code relying on the implicit conversion
between 8-bit and UTF8 will actually be fixed, not broken, by this
change. :)

Cheers,
-Jan
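To make the garbling concrete, here is a minimal sketch in Python rather than Perl. It does not show how perl upgrades strings internally. It assumes the system's ANSI codepage is cp1252 (typical for US and Western European Windows; other locales use other codepages), and the helper names and sample filename are hypothetical. The same bytes are "upgraded" once as Latin1 and once as ANSI, and the sketch lists which byte values the two readings disagree on.

```python
# Sketch only: assumes cp1252 is the active ANSI codepage.
ANSI_CODEPAGE = "cp1252"  # assumption, not queried from the OS

def upgrade_as_latin1(raw: bytes) -> str:
    """Hypothetical helper: treat every byte as the Latin1 code point of the same value."""
    return raw.decode("latin-1")

def upgrade_as_ansi(raw: bytes) -> str:
    """Hypothetical helper: decode the bytes using the ANSI codepage."""
    return raw.decode(ANSI_CODEPAGE)

# Bytes as an 8-bit OS API might return them, e.g. a filename
# containing a euro sign and an en dash, encoded in cp1252.
raw = "\u20acuro\u2013test.txt".encode(ANSI_CODEPAGE)

latin1_view = upgrade_as_latin1(raw)
ansi_view = upgrade_as_ansi(raw)

print(ascii(latin1_view))  # '\x80uro\x96test.txt'  (C1 control characters)
print(ascii(ansi_view))    # '\u20acuro\u2013test.txt'  (the intended text)

# Byte values whose meaning differs between the two readings. Python's
# cp1252 codec leaves a few bytes undefined; errors="replace" maps those
# to U+FFFD, which also differs from Latin1.
differing = [
    b for b in range(256)
    if bytes([b]).decode("latin-1")
    != bytes([b]).decode(ANSI_CODEPAGE, errors="replace")
]
print(f"{len(differing)} differing bytes: 0x{differing[0]:02X}..0x{differing[-1]:02X}")
```

For cp1252 the disagreement is confined to bytes 0x80 through 0x9F. Plain ASCII and the 0xA0 to 0xFF range read the same either way, which is why a Latin1-based upgrade often appears to work until a character such as the euro sign turns up.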