perl.perl6.internals | Postings from June 2001

Re: Should we care much about this Unicode-ish criticism?

From: Larry Wall
Date: June 5, 2001 16:47
Subject: Re: Should we care much about this Unicode-ish criticism?
Message ID: 200106052344.QAA05495@kiev.wall.org
Dan Sugalski writes:
: Have they changed that again? Last I checked, UTF-8 was capped at 4 bytes, 
: but that's in the Unicode 3.0 standard.

Doesn't really matter where they install the artificial cap, because
for philosophical reasons Perl is gonna support larger values anyway.
It's just that 4 bytes of UTF-8 happens to be large enough to represent
anything UTF-16 can represent with surrogates.  So they refuse to
believe in anything longer than 4 bytes, even though the representation
can be extended much further.  (Perl 5 extends it all the way to 64-bit
values, represented in 13 bytes!)
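A minimal Python sketch (not Perl's actual implementation) of the original UTF-8 bit layout, as it was defined before the 4-byte cap (for example in RFC 2279). It shows why 4 bytes already cover everything up to 0x10FFFF, and how the same scheme continues to 5 and 6 bytes for values up to 31 bits. The function name `encode_extended_utf8` is hypothetical, and Perl's own extension beyond 31 bits (up to the 13-byte form mentioned above) uses further lead bytes that are not modeled here.

```python
def encode_extended_utf8(cp: int) -> bytes:
    """Encode a non-negative integer with the original 1-6 byte UTF-8 layout.

    Hypothetical helper: handles values up to 0x7FFFFFFF (31 bits).
    """
    if cp < 0:
        raise ValueError("value must be non-negative")
    if cp < 0x80:
        return bytes([cp])  # plain ASCII, one byte
    # (exclusive upper limit, total byte count, lead-byte marker bits)
    for limit, length, marker in (
        (0x800, 2, 0xC0),
        (0x10000, 3, 0xE0),
        (0x200000, 4, 0xF0),      # 4 bytes = 21 bits, enough for 0x10FFFF
        (0x4000000, 5, 0xF8),
        (0x80000000, 6, 0xFC),
    ):
        if cp < limit:
            out = []
            # Emit continuation bytes (10xxxxxx), 6 bits each, lowest first.
            for _ in range(length - 1):
                out.append(0x80 | (cp & 0x3F))
                cp >>= 6
            # Whatever bits remain go into the lead byte.
            out.append(marker | cp)
            return bytes(reversed(out))
    raise ValueError("needs more than 31 bits; not covered by this sketch")


print(encode_extended_utf8(0x10FFFF).hex())   # f48fbfbf
print(len(encode_extended_utf8(0x7FFFFFFF)))  # 6
```

For any value Unicode itself allows, the output matches a standard UTF-8 encoder; the cap only decides where the encoder stops accepting input.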

They also arbitrarily define UTF-32 to not use higher values than
0x10ffff, but that doesn't mean we're gonna send in the high-bit Nazis
if people want higher values for their own purposes.
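To illustrate the difference between the UTF-32 restriction and a looser 32-bit encoding, here is a small Python sketch. The `strict` flag and the name `encode_32` are hypothetical; strict mode applies the UTF-32 rules (no values above 0x10FFFF, and no surrogate code points), while loose mode accepts any value that fits in 32 bits, in the spirit of the "utf32" idea below.

```python
import struct

UNICODE_MAX = 0x10FFFF  # highest value the UTF-32 definition permits


def encode_32(cp: int, strict: bool = True) -> bytes:
    """Pack one value as a big-endian 32-bit unit (hypothetical helper)."""
    if not 0 <= cp <= 0xFFFFFFFF:
        raise ValueError("does not fit in 32 bits")
    # Strict mode: reject what UTF-32 forbids (above the cap, or a surrogate).
    if strict and (cp > UNICODE_MAX or 0xD800 <= cp <= 0xDFFF):
        raise ValueError("not allowed in UTF-32")
    return struct.pack(">I", cp)


print(encode_32(0x110000, strict=False).hex())  # 00110000
```

The byte layout is identical in both modes; only the range check differs, which is why the cap is a policy decision rather than a limit of the representation.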

But since the names UTF-8 and UTF-32 are becoming associated with those
arbitrary restrictions, it's getting even more important to refer to
Perl's looser style as utf8 (and, potentially, utf32).  I don't know
if Perl will have a utf16 that is distinguished from UTF-16.

Larry


