On Mon, May 19, 2008 at 04:50:42AM +0200, Aristotle Pagaltzis <pagaltzis@gmx.de> wrote:
> > > encoding whereas they really are ANSI encoded. So once the
> > > automatic upgrading assumes ANSI encoding instead of Latin-1,
> > > everything should be working correctly, no?
> >
> > Uhm.... that one can even suggest such brokenness :)
> >
> > Of course basically everything will break, you mean, because
> > the assumption that its not latin1 of course breaks roughly all
> > code dealing with unicode in perl, which doesn't expect that
> > perl suddenly uses ANSI instead of unicode codepoints (they
> > differ!).
>
> Backtracking a bit here, why would this break anything? For
> strings coming out of the Win32 API, immediately decode them to
> characters; for strings going in, upgrade them to characters if
> necessary, then encode them to ANSI at the last moment.

Great idea, but the basic question is "what are characters"?

Obviously, you cannot mean characters in the sense of "letters/glyphs,
character codepoints etc.", because perl doesn't store this information
(for example, when you load a jpg image into some scalar, you don't have
a string composed of "characters", but only octets).

If you mean "codepoints/numerical values" with "characters", then you
lose the information about their encoding.

However, *your* idea would mostly work *iff* you only ever used
operating system interfaces when dealing with filenames. This is,
however, not the case: consider prompting the user for a filename, using
a Gtk+ entry to acquire the filename, or using a command-line argument
as a filename.

In all those cases, perl cannot know that those strings are filenames,
and when asked to "open" them, might assume they are encoded in
"characters" (whatever they are), when in fact they are encoded in
"utf-8", "koi8-r", "euc-jp" or so.

Such a model is workable, but there would need to be a defined way to
convert external filenames (e.g. on the command line) into something
perl's open understands.

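As a rough sketch of what such a defined conversion step might look
like: decode_external_filename below is a made-up helper, not an
existing perl interface, and it assumes the caller somehow knows the
encoding of the incoming octets. It only uses Encode's decode().

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper (an assumption for illustration, not a perl API):
# turn the octets of an external filename into a character string.
# The caller must supply the encoding, because the octets do not carry it.
sub decode_external_filename {
    my ($octets, $encoding) = @_;
    # Die on malformed input instead of guessing; leave the input untouched.
    return decode($encoding, $octets, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

# The same single octet under three different assumptions about its encoding.
my $octets = "\x80";
for my $encoding (qw(iso-8859-1 cp1252 koi8-r)) {
    my $chars = decode_external_filename($octets, $encoding);
    printf "%-10s U+%04X\n", $encoding, ord $chars;
}
# iso-8859-1 gives U+0080, cp1252 gives U+20AC, koi8-r gives U+2500
```

The point of the loop is that the encoding has to come from outside the
string: the octets alone do not tell perl which of the three results is
the intended filename.
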
> That way, no one ever needs to care that filenames are in ANSI,
> because as far as Perl code is concerned it always gets them as
> character strings.

If that were possible, sure.

_However_, note that Jan didn't propose that: he said the "automatic
upgrading" should change its interpretation from the current one, where
0..255 become 0..255, to something else, where e.g. character values
suddenly change codepoints (or, equally bad, change their
interpretation).

As the name "automatic" implies, perl does this kind of upgrading
automatically, and to my knowledge it is not documented anywhere where
this happens (nor are there any guarantees that it doesn't happen). This
is because "automatic upgrading" is assumed to be something that doesn't
change the string itself.

Jan proposes to actually change the string itself (on the perl level) on
those automatic upgrades, and this is what breaks perl, because suddenly
all the internals are exposed to perl code and, worse, your string
interpretation changes at undocumented points that you have to track
yourself.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      pcg@goof.com
      -=====/_/_//_/\_,_/ /_/\_\