develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

From: andrew
Date: February 19, 2001 15:37
Subject: Re: The State of The Unicode
Message ID: 20010219183638.I17705@pimlott.ne.mediaone.net

On Mon, Feb 19, 2001 at 05:19:30PM -0600, Jarkko Hietaniemi wrote:
> On Mon, Feb 19, 2001 at 06:07:14PM -0500, Andrew Pimlott wrote:
> > Camel III has zero complete examples of Unicode support (unless
> > there are examples outside of the Unicode section, which I have not
> > read).  The Unicode chapter is a scant nine pages.  There is nothing
> > there to violate.
> 
> There are rules like "old non-Unicode-aware programs doing byte
> things shall not break".

Granted, but that's "ground", not "figure".

> > My kingdom for one example.
> 
> You want to create a prototype of Unicode composing and decomposing
> algorithm in Perl,

Composing and decomposing are purely character operations.
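
To illustrate the point, here is a minimal sketch that decomposes and recomposes one character while only ever looking at code points. It assumes the Unicode::Normalize module (bundled with later Perl releases) and its NFC and NFD functions; it is an illustration of the idea, not the prototype being discussed.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);   # assumption: module is installed

my $composed   = "\x{E9}";            # U+00E9 LATIN SMALL LETTER E WITH ACUTE
my $decomposed = NFD($composed);      # "e" followed by U+0301 COMBINING ACUTE ACCENT

# Everything below works on characters; no byte of any encoding is inspected.
printf "%d -> %d characters\n", length($composed), length($decomposed);
print join(' ', map { sprintf('U+%04X', ord($_)) } split //, $decomposed), "\n";
print NFC($decomposed) eq $composed ? "round trip ok\n" : "round trip failed\n";
```

Nothing here depends on how Perl stores the string internally.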

> or you want to write a SCSU (Unicode compression
> algorithm) algorithm in Perl.

The spec for this makes no mention of UTF-8.  All you need are
Unicode characters.

> You want to convert UTF-8 into UTF-16.

If you just want to take a Perl string and get a new string in which
each "character" holds one byte of the UTF-16 representation,
something like this will do:

    foreach (split //, $str) {
        my $c = ord;    # code point of the current character
        if ($c < 0x10000) {
            $out .= chr($c >> 8) . chr($c & 0xff);
        } elsif ...
    }

What requires reaching into the guts of the representation?
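
For reference, a fuller sketch of the loop above, with the elided branch filled in using the standard surrogate-pair arithmetic. The helper name utf16be_bytes is hypothetical, big-endian byte order is an assumption, and no byte order mark is written. It uses only split, ord and chr, so it never touches the internal representation.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a string in which each character holds one
# byte (0..255) of the big-endian UTF-16 encoding of $str.
# Assumption: $str contains no lone surrogate code points (not validated).
sub utf16be_bytes {
    my ($str) = @_;
    my $out = '';
    foreach my $ch (split //, $str) {
        my $cp = ord $ch;
        my @units;
        if ($cp < 0x10000) {
            @units = ($cp);                     # one 16-bit code unit
        } else {
            my $v = $cp - 0x10000;              # 20 bits split across a pair
            @units = (0xD800 | ($v >> 10),      # high surrogate
                      0xDC00 | ($v & 0x3FF));   # low surrogate
        }
        # Emit each 16-bit unit as two byte-valued characters, high byte first.
        $out .= chr($_ >> 8) . chr($_ & 0xFF) for @units;
    }
    return $out;
}

my $bytes = utf16be_bytes("A\x{263A}\x{1F600}");
print join(' ', map { sprintf('%02X', ord($_)) } split //, $bytes), "\n";
```

For the sample string this prints 00 41 26 3A D8 3D DE 00.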

Andrew
