develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

February 19, 2001 16:12
On Mon, Feb 19, 2001 at 06:07:14PM -0500, Andrew Pimlott wrote:
> On Mon, Feb 19, 2001 at 04:47:53PM -0600, Jarkko Hietaniemi wrote:
> > 
> > As far "what is broken", I do understand the concern of "exposing too
> > much of the internal representation" (which at the moment happens to
> > be UTF-8) to the user, having bytes and character is confusing at
> > best.  However, I'm not fully convinced that completely hiding it is
> > wise, either.  If from Perl level one cannot reach back to the bytes
> > comprising the UTF-8 representation of the characters, I feel we are
> > trying to pad the cell too softly.
> My kingdom for one example.

If you step out of the box, it's easy to come up with examples.

Whenever you need to interface with something that has no understanding
of Unicode, for which everything is just data, you want to be able to look
at your strings bytewise. That is the case when talking to a serial device,
for instance, or to a hard disk, whose capacity is measured in bytes, not
variable-width characters. Device drivers might not be commonly written in
Perl, but that doesn't mean it should be impossible.
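
To make the byte/character distinction concrete, here is a minimal sketch.
It assumes a Perl that ships the Encode module (5.8 or later, so newer than
this thread), and the sample string is purely illustrative:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Illustrative string: "na", i-diaeresis (U+00EF), "ve", space,
# white smiling face (U+263A).
my $text  = "na\x{EF}ve \x{263A}";

# Ask explicitly for the UTF-8 byte representation of the characters.
my $bytes = encode('UTF-8', $text);

# length() counts characters on $text and bytes on $bytes.
printf "%d characters, %d bytes\n", length($text), length($bytes);
# prints "7 characters, 10 bytes"
```

A device that counts bytes cares about the second number, not the first.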

But you don't have to go that low level. uuencode and base64 work with 8-bit
bytes. Taking your Unicode string, looking at it as bytes, uuencoding it,
sending it, receiving it, uudecoding it, and looking at it again as Unicode
will work, as long as you can get to the byte representation.
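
A sketch of that round trip, using base64 rather than uuencode. It assumes
the Encode and MIME::Base64 modules are available (both are in the core
distribution of later Perls), and the variable names are made up for the
example:

```perl
use strict;
use warnings;
use Encode qw(encode decode);
use MIME::Base64 qw(encode_base64 decode_base64);

# Illustrative string containing non-ASCII characters.
my $original = "Bj\x{F8}rn \x{263A}";

# Sender: characters -> UTF-8 bytes -> base64 text.
# encode_base64 only accepts bytes, so the encode step is required.
my $wire = encode_base64(encode('UTF-8', $original));

# ... $wire travels over a channel that knows nothing about Unicode ...

# Receiver: base64 text -> UTF-8 bytes -> characters.
my $received = decode('UTF-8', decode_base64($wire));

print $received eq $original ? "round trip ok\n" : "round trip failed\n";
```

The only Unicode-aware steps are the first and the last; everything in
between sees plain bytes.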

A lot of existing compression and encryption software just looks at the
data to be compressed or encrypted as a bit or byte stream. There is no
reason to create Unicode-aware versions of those tools before they can
be used on Unicode data. But to write Perl programs that compress or
encrypt data that can be decompressed or decrypted with the existing
tools, your Perl program needs to be able to look at the data as a
sequence of bytes.

When in Rome....

Abigail