On Wed, Feb 21, 2001 at 09:17:58AM +0000, Nick Ing-Simmons wrote:
> >What is A?
>
> 'A' is whatever script reading process and toke.c think it is.

So it is 0xC1...

> #!perl
> exit( ord('A') == 0xC1 ? 0 : 1 )
> __END__
>
> must exit 0 on EBCDIC.

Of course.

> But on EBCDIC
>
> print FOO "\xC1";
> $a = <FOO>;
> die unless lc($a) eq 'a';
>
> mustn't die, etc. etc.

... but I see no "but", just "of course".  Your 'a' is read from disk
as 0xE1 (or whatever, I do not remember), and the cultural info table
of EBCDIC says that the lc variant of 0xC1 should be 0xE1, etc.  So it
"just works", the same way things work in any locale.

> It would have been possible to transform 0xC1 on disc to U+0041 as
> seen by toke.c (e.g. with an implicit :encoding(cp1047) on DATA handle)
> but then the above requirements (to make old scripts work) would
> be very messy. So they don't do that, toke.c sees '\xC1', the internal
> "byte" form has numbers 0 .. 255 having their EBCDIC "cultural info"
> and so on.

Exactly.  This is *why* I made my proposal: to support 'use locale',
do not translate things to Unicode.  "Translate" the cultural info
table instead.

> Our locale story is nowhere near as good as our Unicode story.
> But that is mostly the fault of under-specified locale semantics
> at system level.

No, the faults are in different places:

  a) use locale is lexically scoped, so it is useless when modules
     are used;

  b) there was no defined semantics for the interaction of locale and
     Unicode [my proposal creates such a semantics];

> Switching on EBCDIC-ness is cleaner.

There is no difference (as far as Perl is concerned; except for
sorting) between EBCDIC-ness and locale.  If you feel otherwise,
please give an example to unconfuse me.

> use utf8;
>
> still has semantic that it says the script itself is assumed to come
> from a UTF-8 encoded source file.

use utf8 is a mastodon.  It is not needed for any other purpose, so
let it be so.

> big5 has other problems in that it is a multi-byte encoding

Does not matter: I discuss character mapping here, not encoding.

Ilya
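
A minimal sketch of the "translate the cultural info table" idea above, not how perl is actually implemented.  The names table_lc and @to_lower are made up for illustration.  The byte values are the cp1047 positions of the basic Latin letters (where lowercase 'a' is in fact 0x81).  Case mapping is a plain 256-entry lookup on the native byte values, and nothing is converted to Unicode.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption for illustration: the cp1047 (EBCDIC) byte positions of the
# basic Latin letters.  Nothing here is translated to Unicode.
my @upper = (0xC1 .. 0xC9, 0xD1 .. 0xD9, 0xE2 .. 0xE9);   # 'A' .. 'Z'
my @lower = (0x81 .. 0x89, 0x91 .. 0x99, 0xA2 .. 0xA9);   # 'a' .. 'z'

# A 256-entry "cultural info" table: identity for every byte, except
# that each uppercase position maps to its lowercase counterpart.
my @to_lower = (0 .. 255);
@to_lower[@upper] = @lower;

# table_lc is a hypothetical helper, not a Perl built-in.
sub table_lc {
    my ($bytes) = @_;
    return join '', map { chr($to_lower[ord($_)]) } split //, $bytes;
}

# The byte 0xC1 is lowercased by table lookup alone.
printf "%02X\n", ord(table_lc("\xC1"));   # prints 81
```

Swapping in a different table is all it would take to handle another single-byte charset or locale; the lookup code itself does not change.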