develooper Front page | perl.perl5.porters | Postings from February 2007

Re: Future Perl development

From: mark
Date: February 6, 2007 10:41
Subject: Re: Future Perl development
Message ID: 20070206184043.GA16994@mark.mielke.cc
On Tue, Feb 06, 2007 at 06:27:59PM +0100, Gerard Goossen wrote:
> > This is not a matter of context, by the way. Instead, the value "\xFF"
> > is polymorphic. It's both a unicode string representing code point
> > U+00FF, and the single byte 0xFF.
> No. \xFF creates a character represented by FF according to the native
> encoding.
> If your native encoding is EBCDIC this does NOT correspond to
> U+00FF (instead it corresponds to U+007E or U+009F, depending on the
> flavor of EBCDIC you're on).
> You assume (like everybody else) that in the native encoding a
> character corresponds to a byte with the same numeric value.
> This assumption is what makes the transition to UTF-8 so difficult,
> because in the UTF-8 encoding, the assumption is NOT correct. 

I think you are saying that UTF-EBCDIC should be the internal
representation for strings in Perl on EBCDIC platforms if any character
in the string has a value >= 0x80.

If this is what you are saying, then I can see why I and other people
cannot understand you: we're not on the same page. I don't believe
UTF-EBCDIC makes sense, as UTF-EBCDIC is not an encoding of UNICODE;
it is an encoding of a mix of EBCDIC and UNICODE. Although UTF-8 is
only an encoding scheme, most people assume that the internal
representation for a language that claims to support UNICODE should
be UNICODE, and therefore that the UTF-8 should encode UNICODE code
points, not EBCDIC/UNICODE code points.
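A minimal sketch of the model argued for here, assuming Python 3 and its "cp037" codec as a stand-in for a native EBCDIC platform: native bytes are decoded once at input, so the internal UTF-8 always encodes UNICODE code points rather than native EBCDIC values. The function name `to_internal_utf8` is hypothetical.

```python
# Sketch: decode native (EBCDIC) bytes to UNICODE code points at the input
# boundary, then encode those code points as UTF-8 for internal storage.
# "cp037" is an assumed stand-in for the platform's native EBCDIC flavor.

def to_internal_utf8(native: bytes, native_encoding: str = "cp037") -> bytes:
    """Convert natively encoded bytes into UTF-8 of UNICODE code points."""
    return native.decode(native_encoding).encode("utf-8")

native_a = "A".encode("cp037")          # what an EBCDIC system stores for "A"
internal_a = to_internal_utf8(native_a)

print(native_a.hex())                   # c1: the native EBCDIC value
print(internal_a.hex())                 # 41: UTF-8 of U+0041, not of 0xC1
print(ord(internal_a.decode("utf-8")))  # 65, the UNICODE code point
```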

Perhaps keeping UNICODE code points internally would degrade
performance on systems that use EBCDIC natively? Is that why you focus
on UTF-EBCDIC?

Anyway, I don't share the opinion that Perl's implementation of
UNICODE or UTF-8 is excellent. I've avoided it wherever possible.
I prefer Java's approach or GTK's approach. Java uses a UTF-16 internal
representation, but never confuses the internal representation with
the external representation. If portability is a concern, this seems
an excellent approach.
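The separation described for Java can be sketched in Python 3. This is an assumption of convenience: Python is used only because its `str`/`bytes` split mirrors the same idea. Text is decoded once when it enters the program and encoded once when it leaves, and nothing in between depends on an encoding. The name `transcode` is hypothetical.

```python
# Sketch of "decode at input, encode at output": the program works on str
# (abstract code points) and only handles bytes at its boundaries.

def transcode(data: bytes, source_encoding: str, target_encoding: str) -> bytes:
    text = data.decode(source_encoding)   # boundary in: bytes -> code points
    # All processing would happen on `text`, which carries no encoding.
    return text.encode(target_encoding)   # boundary out: code points -> bytes

latin1_in = b"caf\xe9"                    # "café" as Latin-1 bytes
utf8_out = transcode(latin1_in, "latin-1", "utf-8")
ebcdic_out = transcode(latin1_in, "latin-1", "cp037")

print(len(latin1_in), len(utf8_out))      # 4 5: é needs two UTF-8 bytes
print(transcode(ebcdic_out, "cp037", "utf-8") == utf8_out)  # True
```

The internal text never changes; only the byte form at each boundary does, which is why the same string can be written out as Latin-1, UTF-8, or EBCDIC without any special casing.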

Cheers,
mark

-- 
mark@mielke.cc / markm@ncf.ca / markm@nortel.com     __________________________
.  .  _  ._  . .   .__    .  . ._. .__ .   . . .__  | Neighbourhood Coder
|\/| |_| |_| |/    |_     |\/|  |  |_  |   |/  |_   | 
|  | | | | \ | \   |__ .  |  | .|. |__ |__ | \ |__  | Ottawa, Ontario, Canada

  One ring to rule them all, one ring to find them, one ring to bring them all
                       and in the darkness bind them...

                           http://mark.mielke.cc/




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org