On Wed, Feb 21, 2001 at 12:45:07AM +0000, Simon Cozens wrote:
> Perl, on most non-EBCDIC platforms, happily assumes that the world is Latin1.

Unless 'use locale'.

> I want to extend this idea to EBCDIC. If you throw around a bunch of EBCDIC
> strings, fine. You don't need to care about that, and Perl will continue to
> operate in the way that it always has done. If you introduce a Unicode string
> into the equation, then things get tricky. Just like with LatinX, Perl will
> upgrade that string to Unicode, passing it through a filter which turns EBCDIC
> code points into Unicode code points.

Again, this is a particular case of the 'use locale' situation.

> You can see the parallel? It's very easy. If the LatinX model works, then the
> EBCDIC model works.

Can you please remove LatinX from your description?  It confuses me...
Do you mean "any locale"?

> The only spanner in the, um, works is v-strings.

v-thingies are one large problem anyway.  I do not have the slightest idea
*why* such an abomination made it into Perl...

> The problem with v-strings is
> that they expect the Unicode code point x to be the same as chr(x), which
> isn't the case for EBCDIC, because the lower 255 codepoints are *not* the same
> as EBCDIC and they are for Latin 1.

Nope.  It is not that you break v-thingies.  You broke the fundamental
relationship that ord() is transparent w.r.t. byte/utf8 transmogrifications.
This is a no-no-no.

The solution is as I proposed.  I repeat it: 'use locale' (or working on an
EBCDIC machine) switches the table of cultural info associated with integers
in the range 0..255.  That's all.  [Well, if you use a big-5 locale, then you
need to switch things in the larger region...]

The only problem with this is how to reuse existing (??? do they exist
already?) i/o filters which assume translation-to-Unicode.
Two things are needed:

  a) knowledge of how to translate locale->Unicode (so recognition of which
     Unicode points move into the 0..255 range);

  b) a way to reach Unicode points which were in 0..255, but are no more.

(a) is needed anyway for non-use-locale i/o filters, and to solve (b) I
propose to "duplicate" the whole Unicode set outside of the UTF-8 range (but
inside the utf8 range), say, starting at 80000000.

Ilya
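
The table-switching idea above, together with points (a) and (b), can be sketched roughly as follows. This is only an illustration in Python, not Perl internals: the function names are hypothetical, Python's `cp037` and `koi8_r` codecs stand in for an EBCDIC platform and a non-Latin-1 locale, and the offset 80000000 is read as hexadecimal, which is an assumption.

```python
# Illustrative sketch only; all names here are hypothetical, not Perl APIs.

# Assumption: "80000000" in the message is hexadecimal.
DUPLICATE_BASE = 0x80000000


def locale_table(codec):
    """(a) For each integer 0..255, the Unicode code point it denotes in `codec`."""
    return [ord(bytes([n]).decode(codec)) for n in range(256)]


def to_unicode(n, table):
    """Unicode code point denoted by internal integer `n` under the active table."""
    if n < 256:
        return table[n]            # meaning comes from the switched table
    if n >= DUPLICATE_BASE:
        return n - DUPLICATE_BASE  # (b) duplicated copy of the Unicode set
    return n                       # everything else is plain Unicode


def from_unicode(cp, table):
    """Internal integer chosen for Unicode code point `cp` under the active table."""
    if cp in table:
        return table.index(cp)     # the locale moved this point into 0..255
    if cp < 256:
        return DUPLICATE_BASE + cp # displaced from 0..255, reach it via the copy
    return cp


ebcdic = locale_table("cp037")
koi8 = locale_table("koi8_r")

print(hex(to_unicode(0xC1, ebcdic)))   # integer 0xC1 means 'A' on EBCDIC
print(hex(to_unicode(0xC1, koi8)))     # the same integer means U+0430 in KOI8-R
print(hex(from_unicode(0xE9, koi8)))   # U+00E9 is not in KOI8-R, so it is displaced
```

In this sketch the integer stored for a character never changes; only the table consulted for its meaning does, which is the sense in which ord() stays transparent.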