Re: The State of The Unicode

Jarkko Hietaniemi
February 19, 2001 15:19
On Mon, Feb 19, 2001 at 06:07:14PM -0500, Andrew Pimlott wrote:
> Thank you for your prompt reply--you did read the whole thing,
> right?  ;-)

Yes, though I didn't ponder every detail.

> On Mon, Feb 19, 2001 at 04:47:53PM -0600, Jarkko Hietaniemi wrote:
> > (1) The current model, both externally and internally,
> >     follows what is described by the Camel Mk3.
> Camel III has zero complete examples of Unicode support (unless
> there are examples outside of the Unicode section, which I have not
> read).  The Unicode chapter is a scant nine pages.  There is nothing
> there to violate.

There are rules like "old non-Unicode-aware programs doing byte
things shall not break".

> I agree that I have seen no examples as far as pure string
> manipulation.  But the relationship between strings and numbers must

Just manipulate them.  As people seem lately to be eager to chant:
"transparent" :-)

> > Combine (1) and (2) and I see it as a "what is broken, so what's there to
> > fix" situation, let's call it (3).
> > 
> > As far as "what is broken" goes, I do understand the concern of "exposing
> > too much of the internal representation" (which at the moment happens to
> > be UTF-8) to the user; having both bytes and characters is confusing at
> > best.  However, I'm not fully convinced that completely hiding it is
> > wise, either.  If from Perl level one cannot reach back to the bytes
> > comprising the UTF-8 representation of the characters, I feel we are
> > trying to pad the cell too softly.
> My kingdom for one example.

You want to prototype a Unicode composition and decomposition
algorithm in Perl, or you want to implement SCSU (the Standard
Compression Scheme for Unicode) in Perl.  You want to convert UTF-8
into UTF-16.  Anywhere you want to get into the guts of the encoding(s).
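
To make the UTF-8 to UTF-16 case concrete, here is a minimal sketch of
what "getting into the guts" might look like.  It is not how perl does
this internally.  It assumes a perl with the Encode module available
(which is newer than the perls discussed in this thread), and the helper
names utf8_bytes and utf16_units are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode);    # assumption: Encode is installed

# Hypothetical helper: the UTF-8 bytes of a character string,
# returned as a list of integers 0..255.
sub utf8_bytes {
    my ($str) = @_;
    return unpack 'C*', encode('UTF-8', $str);
}

# Hypothetical helper: the UTF-16 code units of a character string,
# computed by hand so the surrogate arithmetic is visible.
sub utf16_units {
    my ($str) = @_;
    my @units;
    for my $cp (map { ord } split //, $str) {
        if ($cp < 0x10000) {
            # Code points in the BMP are a single 16-bit unit.
            push @units, $cp;
        }
        else {
            # Everything above U+FFFF becomes a surrogate pair.
            my $v = $cp - 0x10000;
            push @units, 0xD800 + ($v >> 10), 0xDC00 + ($v & 0x3FF);
        }
    }
    return @units;
}

# U+00E9, U+263A and U+1D11E: two-, three- and four-byte UTF-8.
my $s = "\x{E9}\x{263A}\x{1D11E}";
print join(' ', map { sprintf '%02X', $_ } utf8_bytes($s)),  "\n";
print join(' ', map { sprintf '%04X', $_ } utf16_units($s)), "\n";
```

The first line printed is the byte view (C3 A9 E2 98 BA F0 9D 84 9E) and
the second is the code unit view (00E9 263A D834 DD1E).  The same string
gives two different lists, and code like this needs to reach the first
one.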

$jhi++; #
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen