On Sat, May 17, 2008 at 07:38:16PM +0100, Ben Morrow <ben@morrow.me.uk> wrote:
> Perl explicitly documents that 8-bit data is treated as ISO8859-1,
> except on EBCDIC platforms.

If only it were so (from perluniintro):

   Internally, Perl currently uses either whatever the native eight-bit
   character set of the platform (for example Latin-1) is, defaulting to
   UTF-8, to encode Unicode strings. Specifically, if all code points in
   the string are 0xFF or less, Perl uses the native eight-bit character
   set. Otherwise, it uses UTF-8.

Of course, this isn't even implementable (nor is it even remotely true),
but this is one of the many issues: whoever wrote that part of the manpage
either was confused or used extremely bad wording (some of it is simply
wrong, other things are maybe badly presented, and still others are
illogical).

For example, the phrase "attached to operations" can easily be
misunderstood, as the "bad model" attached wide-characterness to
operations, while the current model attaches "unicodeness" to operations
(or at least most of perl uses that interpretation, e.g. open,
concatenation etc.), so the wording "attaching it to operations" is not
very helpful.

The people who maintain perl need to agree on a unicode model at some
point, and then implement it with force. The current state of affairs is
extremely damaging, as it's impossible to get bugfixes in because everybody
disagrees on whether it is a bug or not.

-- 
                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_              http://www.deliantra.net
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /      pcg@goof.com
      -=====/_/_//_/\_,_/ /_/\_\