On Tue, Feb 20, 2001 at 09:53:09PM +0000, nick@ing-simmons.net wrote:
> The big question mark is what we (well "they" actually) do on EBCDIC
> platforms where it has been demonstrated that ord('A') == 0xC1 is
> a requirement (if only because it is used as a test for "this is an EBCDIC
> platform"). Simon and Peter have made much progress in this area
> but they have not fully explained it yet.

OK. Let me try and finally explain what I propose to do with EBCDIC.

Perl, on most non-EBCDIC platforms, happily assumes that the world is
Latin1. Or LatinX - it doesn't matter. It only becomes significant when
Unicode strings are introduced into a Perl program. When LatinX and Unicode
strings meet, Perl assumes that the non-Unicode string is Latin1 and
upgrades it to Unicode. If it isn't Latin1, then we have another problem
which can be solved another time, and probably with :encode(LatinX).

However, if we don't introduce Unicode strings into the equation, then
LatinX can continue being LatinX and people don't actually need to care
that LatinX is not the first 256 characters of the Unicode standard, that
is, Latin 1.

I want to extend this idea to EBCDIC. If you throw around a bunch of EBCDIC
strings, fine. You don't need to care about that, and Perl will continue to
operate in the way that it always has done.

If you introduce a Unicode string into the equation, then things get
tricky. Just like with LatinX, Perl will upgrade the EBCDIC string to
Unicode, passing it through a filter which turns EBCDIC code points into
Unicode code points. Then you have a bunch of Unicode strings, and you're
back to the model above. No problem.

So you have:

    LatinX codepoints + LatinX  -> LatinX
    LatinX codepoints + Unicode -> Upgrade LatinX (as Latin1) to Unicode
    EBCDIC codepoints + EBCDIC  -> EBCDIC
    EBCDIC codepoints + Unicode -> Upgrade EBCDIC (via filter) to Unicode

You can see the parallel? It's very easy. If the LatinX model works, then
the EBCDIC model works.
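
For anyone who wants to poke at the four rules, here is a toy sketch of them. It is in Python rather than Perl and is only a model of the proposal, not how the Perl core does or would do it. The `concat` function and its `native` parameter are made up for illustration, and Python's `cp037` codec stands in for "the filter" on an EBCDIC platform, which is an assumption since the proposal doesn't name a code page.

```python
# Toy model only: bytes = native 8-bit string, str = Unicode string.
# "native" names the platform character set; "cp037" is one EBCDIC
# code page, chosen here purely as an example.

def concat(a, b, native="latin-1"):
    """Join two strings following the four rules in the table."""
    if isinstance(a, bytes) and isinstance(b, bytes):
        # native + native -> native: nothing is converted.
        return a + b
    # A Unicode string is involved, so upgrade whichever operand is
    # still native by passing it through the platform's filter.
    if isinstance(a, bytes):
        a = a.decode(native)
    if isinstance(b, bytes):
        b = b.decode(native)
    return a + b


# EBCDIC + EBCDIC stays EBCDIC: 0xC1 is still the byte for 'A'.
print(concat(b"\xc1", b"\xc2", native="cp037"))   # b'\xc1\xc2'

# EBCDIC + Unicode: 0xC1 is upgraded to U+0041 before joining.
print(concat(b"\xc1", "\u263a", native="cp037") == "A\u263a")   # True
```

The point of the model is that the only platform-specific piece is the filter: swap `"latin-1"` for `"cp037"` and nothing else changes.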

The only spanner in the, um, works is v-strings. The problem with v-strings
is that they expect the Unicode code point x to be the same as chr(x),
which isn't the case for EBCDIC, because the lower 256 Unicode code points
are *not* the same as EBCDIC, whereas they are the same as Latin 1. Hence
v5.6.0 means something different on EBCDIC than it does on LatinX.

This is basically what I'm trying to fix when I get my access to an EBCDIC
machine - distinguishing between those functions which use Unicode for
numbers and those which use it for strings.

I think that's about it.

-- 
A witty saying means nothing. -Voltaire