develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

From: Jarkko Hietaniemi
Date: February 20, 2001 06:19
Subject: Re: The State of The Unicode
Message ID: 20010220081857.B22349@chaos.wustl.edu

> > There were *internal* inconsistencies and I straightened them out.
> > I do agree that _for_the_normal_user_ whether chr(128) is
> > internally one or two bytes long shouldn't matter: and guess what,
> > currently chr(128) eq "\x80" && chr(128) eq "\x{80}" && chr(128)
> > eq v128 && chr(128) eq pack("C", 128).  So what's the problem we
> > are trying to solve here?
> 
> We are trying to answer, what gets printed when I output these guys
> to a UTF8 handle?  What are the results of isalpha, toupper, and the

To a UTF-8 handle?  Well, the character 128, of course... :-)
You meant "which bytes get output"?  0xc2 0x80.
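
For anyone who wants to check the byte question themselves, here is a
minimal sketch. It assumes a Perl that ships the Encode module (which
is newer than this thread), and utf8_hex is a hypothetical helper name
made up for this illustration, not anything in the core.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: return the UTF-8 encoding of a character
# string as space-separated hex bytes.
sub utf8_hex {
    my ($chars) = @_;
    my $bytes = encode('UTF-8', $chars);
    return join ' ', map { sprintf '0x%02x', ord } split //, $bytes;
}

# The various spellings of character 128 compare equal.
my $c = chr(128);
print "all equal\n"
    if $c eq "\x80" && $c eq "\x{80}" && $c eq v128 && $c eq pack("C", 128);

# Bytes that a UTF-8 encoding of character 128 produces.
print utf8_hex($c), "\n";    # prints "0xc2 0x80"
```

The comparison is made on characters and the hex dump on the encoded
bytes, which is the distinction being drawn above.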

> Unicode property regexp patterns?  In short, we are trying to
> identify which character the darn thing is.

128, and off-hand, none of the [[:foo:]] character classes match
it -- if we are specifically talking about 128.
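
A rough way to test this on a given perl is sketched below.
matching_classes is a hypothetical helper invented for this example,
and the result for chr(128) is deliberately not stated here: it may
depend on the Perl version, on any locale in effect, and on whether
the string is stored internally as UTF-8.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of the POSIX bracket classes
# that match a single character.
sub matching_classes {
    my ($char) = @_;
    my @names = qw(alpha digit alnum upper lower space
                   punct print graph cntrl xdigit);
    return grep {
        my $re = '[[:' . $_ . ':]]';    # e.g. [[:alpha:]]
        $char =~ /$re/;
    } @names;
}

my @hits = matching_classes(chr(128));
print @hits ? "matches: @hits\n" : "no POSIX class matches\n";
```

Running the same helper on an ordinary letter such as "A" gives a
quick sanity check that the classes are being applied at all.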

> Andrew

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen


