
Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2)

From: Jarkko Hietaniemi
Date: February 17, 2001 10:20
Subject: Re: IV preservation (was Re: [PATCH 5.7.0] compiling on OS/2)
Message ID: 20010217122022.G28354@chaos.wustl.edu

> A. Camel-III's mention of 'use bytes' - which exposes the internal
>    representation, which has led to an expectation that the representation
>    be predictable. That is only bad when it is expensive.

If we have a strong case we can convince Larry that use bytes is bad.

If the need to produce explicitly UTF-8 in compile time is found to be
false, we can do away with qu.

> B. EBCDIC. EBCDIC machines' legacy use of chr(), ord() etc. violates the
>    sequence-of-Unicode-code-points premise.
>    So applying the Nick/Ilya model strictly will break legacy EBCDIC code.
>    So we have a Simon et al. EBCDIC model where the two representations
>    are instead the IBM-1047 code page, or UTF-8 encoded Unicode.
>    Semantics of chr/ord are unclear. My guess is that chr of 0..255 produces
>    characters according to IBM-1047, and characters above that are Unicode.

Unless I'm mistaken that is what happens now (just like what happened
pre-Unicode).  It's not only chr/ord: we also have to decide what
happens with \xHH, \x{HH}, and \Oooo.  Ditto for vNNN (though that
isn't so important for EBCDIC folks, since vNNN was introduced in
5.6, and 5.6 was broken for them in so many ways that vNNN is
irrelevant for them).  And how about pack/unpack("U"/"C", ...)?
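
For reference, the constructs listed above are all ways of naming a
character by number.  A minimal sketch, assuming an ASCII-based platform
and a reasonably recent perl, shows that they agree there.  What each of
them should yield on EBCDIC is exactly the undecided part, so nothing is
claimed about EBCDIC here.

```perl
use strict;
use warnings;

# Every entry names "the character numbered 65" in a different way.
my @forms = (
    [ 'chr(65)',       chr(65)       ],
    [ '"\x41"',        "\x41"        ],
    [ '"\x{41}"',      "\x{41}"      ],
    [ '"\101"',        "\101"        ],   # octal escape
    [ 'v65',           v65           ],   # v-string
    [ 'pack("C", 65)', pack("C", 65) ],
    [ 'pack("U", 65)', pack("U", 65) ],
);

# Report the number ord() gives back for each form.
for my $form (@forms) {
    my ($label, $char) = @$form;
    printf "%-14s ord=%d\n", $label, ord($char);
}
```

On an ASCII-based platform every line reports ord=65.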

But these are all details of the bigger picture.

Maybe I'm slow but I need to see the whole picture before I can
start working on it.  Listing nits here and nits there just makes me
thrash and do nothing.

If we choose to make chr(65) produce 'A' and ord('A') produce 65
on all platforms, including EBCDIC, one possible kludge to keep the
EBCDIC legacy apps happy would be to have 'use ebcdic', which would
effectively make chr/ord/\x/etc. bypass the mapping to Unicode and
back and use the raw EBCDIC bytes instead.
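
A rough sketch of that idea, runnable on any platform because it works on
byte values rather than real EBCDIC characters.  Everything here is
hypothetical: there is no 'use ebcdic' pragma, the names sketch_ord,
sketch_chr and $RAW_EBCDIC are made up for illustration, and the table is
a four-entry sample of IBM-1047, not the full 256-entry mapping.

```perl
use strict;
use warnings;

# Illustrative subset only: a few IBM-1047 byte values and the Unicode
# code points of the characters they encode.
my %native_to_unicode = (
    0x40 => 0x20,    # space
    0x81 => 0x61,    # 'a'
    0xC1 => 0x41,    # 'A'
    0xF0 => 0x30,    # '0'
);
my %unicode_to_native = reverse %native_to_unicode;

# Hypothetical stand-in for the effect of 'use ebcdic': when true, the
# mapping is bypassed and raw EBCDIC byte values are used.
our $RAW_EBCDIC = 0;

# Emulates ord(): given the native byte stored for a character, return
# the number the program would see.
sub sketch_ord {
    my ($native_byte) = @_;
    return $native_byte if $RAW_EBCDIC;
    return $native_to_unicode{$native_byte}
        // die sprintf "byte 0x%02X is not in the sample table\n", $native_byte;
}

# Emulates chr(): given a number, return the native byte that would be stored.
sub sketch_chr {
    my ($number) = @_;
    return $number if $RAW_EBCDIC;
    return $unicode_to_native{$number}
        // die "code point $number is not in the sample table\n";
}

printf "portable: ord('A') = %d\n", sketch_ord(0xC1);
{
    local $RAW_EBCDIC = 1;    # roughly what a 'use ebcdic' scope would do
    printf "legacy:   ord('A') = %d\n", sketch_ord(0xC1);
}
```

The first line reports 65 and the second 193 (0xC1), which is the number
legacy EBCDIC code expects from ord('A').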

>    This should be "safe" iff IBM-1047 is a one-to-one bi-directional mapping
>    to iso8859-1 (i.e. LS 256 Unicode code points).
>
> My personal axe to grind is that tk8.1+ (the unicode aware one) wants
> and expects UTF-8. So continually normalizing 128..255 back to bytes
> is a pain in the neck.
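
The "one-to-one bi-directional" condition quoted above can be checked
mechanically for any 256-entry table.  A minimal sketch follows; the
function name is_bijection_onto_latin1 is made up for illustration, and
the two tables are synthetic stand-ins, not the real IBM-1047 mapping,
which would have to be supplied separately.

```perl
use strict;
use warnings;

# Returns true if @$table (indexed by native byte 0..255) hits each of the
# code points 0..255 exactly once, i.e. is a bijection onto Latin-1.
sub is_bijection_onto_latin1 {
    my ($table) = @_;
    return 0 unless @$table == 256;
    my %seen;
    for my $code_point (@$table) {
        return 0 unless defined $code_point
            && $code_point =~ /^\d+$/
            && $code_point <= 255;
        return 0 if $seen{$code_point}++;    # a second byte maps here
    }
    return 1;
}

# Synthetic tables for demonstration only.
my @rotated   = map { ($_ + 64) % 256 } 0 .. 255;   # a permutation of 0..255
my @collapsed = map { $_ % 128 } 0 .. 255;          # two bytes share each target

print "rotated:   ", (is_bijection_onto_latin1(\@rotated)   ? "ok" : "not one-to-one"), "\n";
print "collapsed: ", (is_bijection_onto_latin1(\@collapsed) ? "ok" : "not one-to-one"), "\n";
```

The rotated table passes and the collapsed one fails, since in the latter
bytes N and N+128 land on the same code point and cannot be mapped back.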

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen
