On Tue, Feb 06, 2007 at 06:27:59PM +0100, Gerard Goossen wrote:
> > This is not a matter of context, by the way. Instead, the value "\xFF"
> > is polymorphic. It's both a unicode string representing code point
> > U+00FF, and the single byte 0xFF.
> No. \xFF creates a character represented by FF according to the native
> encoding.
> If your native encoding is EBCDIC this does NOT correspond to
> U+00FF (instead it corresponds to U+007E or U+009F, depending on the
> flavor of EBCDIC you're on).
> You also assume (like everybody else) that in the native encoding a
> character corresponds to a byte with the same numeric value.
> This assumption is what makes the transition to UTF-8 so difficult,
> because in the UTF-8 encoding, the assumption is NOT correct.
I think you are saying that UTF-EBCDIC should be the internal representation
for strings in Perl on EBCDIC platforms if any character in the string
has a value >= 0x80.
If this is what you are saying, then I can see why I, and other people,
cannot understand you. We're not on the same page. I don't believe
UTF-EBCDIC makes sense, as UTF-EBCDIC is not an encoding of UNICODE.
It is an encoding of a mix of EBCDIC and UNICODE. Although UTF-8
is only an encoding scheme, most people assume that the internal
representation for a language that claims to support UNICODE should
be UNICODE, and therefore that UTF-8 should encode UNICODE code
points, not EBCDIC/UNICODE code points.
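
To make the code point versus encoding distinction concrete, here is a
minimal sketch. It is Python rather than Perl, purely for illustration,
and it says nothing about Perl's internals. It assumes Python's bundled
"cp037" codec as a stand-in for one EBCDIC flavor; UTF-EBCDIC itself is
not shown.

```python
# Illustration only: Python's codecs stand in for "encoding schemes".
# cp037 is one EBCDIC flavor that ships with Python (an assumption made
# for this example; the thread does not name a specific code page).
ch = "\u00ff"  # the Unicode code point U+00FF

latin1 = ch.encode("latin-1")  # one byte, same numeric value as the code point
utf8 = ch.encode("utf-8")      # two bytes, neither of them 0xFF
ebcdic = ch.encode("cp037")    # one byte, but not the value 0xFF

encoded = (("latin-1", latin1), ("utf-8", utf8), ("cp037", ebcdic))
for name, raw in encoded:
    print(f"{name:8} {raw.hex()}")

# Decoding each byte string with its own codec yields the same code point.
same = all(raw.decode(name) == ch for name, raw in encoded)
```

The code point stays U+00FF throughout. Only the bytes used to carry it
change with the encoding.
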
Perhaps this would represent a performance degradation for systems
that use EBCDIC natively? Is this why you would focus on UTF-EBCDIC?
Anyways - I don't share the opinion that Perl's implementation
of UNICODE or UTF-8 is excellent. I've avoided it wherever possible.
I prefer Java's approach or GTK's approach. Java uses UTF-16 as its internal
representation, but never confuses internal representation with
external representation. If portability is a concern, this seems
an excellent approach.
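
A rough sketch of what I mean by keeping the two apart, again in Python
only for illustration. The helper name transcode() is hypothetical and
not taken from any library: bytes are decoded at the input boundary, the
program works on characters, and bytes are produced again only on output.

```python
def transcode(raw: bytes, src: str, dst: str) -> bytes:
    """Hypothetical helper: re-encode external bytes from src to dst."""
    text = raw.decode(src)    # external bytes -> internal string of code points
    text = text.strip()       # all processing happens on characters, not bytes
    return text.encode(dst)   # internal string -> external bytes


# The same character enters as one Latin-1 byte and leaves as two UTF-8 bytes.
out = transcode(b"\xff", "latin-1", "utf-8")
```

The caller always names the external encoding explicitly, so the internal
form never leaks out by accident.
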
Cheers,
mark
--
mark@mielke.cc / markm@ncf.ca / markm@nortel.com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada
One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...
http://mark.mielke.cc/