perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

February 20, 2001 09:34
On Tue, Feb 20, 2001 at 08:18:57AM -0600, Jarkko Hietaniemi wrote:
> > > There were *internal* inconsistencies and I straightened them out.
> > > I do agree that _for_the_normal_user_ whether chr(128) is
> > > internally one or two bytes long shouldn't matter: and guess what,
> > > currently chr(128) eq "\x80" && chr(128) eq "\x{80}" && chr(128)
> > > eq v128 && chr(128) eq pack("C", 128).  So what's the problem we
> > > are trying to solve here?
> > 
> > We are trying to answer, what gets printed when I output these guys
> > to a UTF8 handle?  What are the results of isalpha, toupper, and the
> To an UTF-8 handle?  Well, the character 128, of course... :-)

But 128 is a number, not a character.  It corresponds to different
characters in different character sets.  Perhaps you had in your
mind "Unicode character 128"; if so, please phrase it this way or I
will continue to get confused and frustrated.

Do you see why such statements lead me to fear that Perl has no
string model?
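Here is a small sketch of the point that 128 is a number and not a character. It is written in Python rather than Perl, and the three charsets are arbitrary examples. The same byte value, 128, decodes to three different Unicode characters depending on which character set is assumed.

```python
# Decode the single byte 128 (0x80) under three legacy character sets.
# Codec names are Python's; the charset choice is only illustrative.
raw = bytes([128])
decoded = {codec: raw.decode(codec) for codec in ("latin-1", "cp1252", "koi8-r")}

for codec, ch in decoded.items():
    print(f"{codec:8} -> U+{ord(ch):04X}")
# latin-1  -> U+0080
# cp1252   -> U+20AC  (euro sign)
# koi8-r   -> U+2500  (box drawing, light horizontal)
```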

> You meant "which bytes get output"?  0xc2 0x80.
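The following Python sketch shows how UTF-8 encodes Unicode character 128. Under the UTF-8 rules, U+0080 takes two bytes, 0xC2 0x80. The pair 0xC4 0x80 is the encoding of U+0100. The helper name `utf8_two_byte` is hypothetical, and its result is checked against Python's built-in codec.

```python
def utf8_two_byte(cp: int) -> bytes:
    """Encode a code point in U+0080..U+07FF as its two UTF-8 bytes."""
    if not 0x80 <= cp <= 0x7FF:
        raise ValueError("code point is outside the two-byte UTF-8 range")
    lead = 0xC0 | (cp >> 6)      # 110xxxxx carries the high 5 bits
    trail = 0x80 | (cp & 0x3F)   # 10xxxxxx carries the low 6 bits
    return bytes([lead, trail])

for cp in (0x80, 0x100):
    encoded = utf8_two_byte(cp)
    assert encoded == chr(cp).encode("utf-8")  # agrees with Python's codec
    print(f"U+{cp:04X} ->", " ".join(f"0x{b:02x}" for b in encoded))
# U+0080 -> 0xc2 0x80
# U+0100 -> 0xc4 0x80
```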

Ok, you clearly mean _Unicode_ character 128.  So, do you agree
that even if I am in a locale where local character 128 is not the
same as Unicode character 128, I get Unicode character 128 from all
of the examples above?  And that it stays Unicode character 128
through any internal upgrades and downgrades?
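The guarantee asked about here can be modelled in Python. The sketch makes an assumption: `downgrade` and `upgrade` are hypothetical stand-ins for the two internal storage forms under discussion, one byte per character versus UTF-8 bytes. The sketch illustrates the behaviour the question hopes for and is not Perl's actual implementation. Decoding either stored form gives back the same Unicode character 128.

```python
# Hypothetical model of two internal forms for the same string:
# "downgraded" = one byte per character (ISO-8859-1 range only),
# "upgraded"   = UTF-8 bytes. Python's str hides this, so make it explicit.
def downgrade(s: str) -> bytes:
    return s.encode("latin-1")   # raises for characters above U+00FF

def upgrade(s: str) -> bytes:
    return s.encode("utf-8")

s = chr(128)
narrow = downgrade(s)   # b'\x80'      (1 byte)
wide = upgrade(s)       # b'\xc2\x80'  (2 bytes)

# Whatever the storage, decoding each form yields Unicode character 128.
assert narrow.decode("latin-1") == wide.decode("utf-8") == s
print(len(narrow), len(wide), hex(ord(wide.decode("utf-8"))))  # 1 2 0x80
```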

If so, great.  But I thought the issue was that this is incompatible
with traditional behavior in non-ISO-8859-1 locales.  (And that
avoiding such incompatibilities was the goal of your message to which
I originally responded.)
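The incompatibility can be made concrete with a Python sketch. It assumes a KOI8-R locale only as one example of a non-ISO-8859-1 charset, and the "legacy script" and its byte are hypothetical. Byte 0x80 read with ISO-8859-1 semantics becomes U+0080. The same byte read as KOI8-R is U+2500. A UTF-8 handle therefore receives different bytes depending on which reading Perl adopts.

```python
# A hypothetical legacy script in a KOI8-R locale emits byte 0x80,
# meaning the KOI8-R character at that position.
legacy_byte = bytes([0x80])

# Reading 1: treat the byte as Unicode character 128 (ISO-8859-1 semantics).
as_unicode_128 = legacy_byte.decode("latin-1").encode("utf-8")

# Reading 2: treat the byte as a character of the locale's charset.
as_locale_char = legacy_byte.decode("koi8-r").encode("utf-8")

print(" ".join(f"{b:02x}" for b in as_unicode_128))  # c2 80
print(" ".join(f"{b:02x}" for b in as_locale_char))  # e2 94 80
assert as_unicode_128 != as_locale_char
```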


PS.  Anyone care to offer me a cheat-sheet for getting started in a
non-ISO-8859-1 locale in Debian GNU/Linux "testing"?