develooper Front page | perl.perl5.porters | Postings from February 2001

Re: The State of The Unicode

February 19, 2001 18:53
I think this message helps me understand some of the general

Let me say first that the reason all of my pseudo-code has been "OO
crap" is that I'm trying to make it as painfully clear as I can
think to (yes, emphasis on pain).  Since nobody else has yet
proposed any specific interfaces, I have no reference point.  I'll
try to do better :-)

On Mon, Feb 19, 2001 at 06:50:10PM -0700, Nathan Torkington wrote:
> Andrew Pimlott writes:
> > As I said to abigail, I would love a concrete explanation of what
> > you have in mind.  In particular, what is your mechanism for
> > ensuring that perl is representing $output as utf8?
> I'm not sure what your question means.  Here are some situations in
> more detail.
> I'm writing a module that encodes things in base64.  I get a string to
> encode.  I need to process it byte-by-byte to produce a base64
> encoding.  How do I do that?

I'm saying you call an explicit function, e.g. to_utf8(), which gives
you back a string such that if you say "substr $str, 0, 1", you get
the first byte of the UTF-8 representation, and "length $str" is the
length of the UTF-8 representation.  Period.
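
A minimal sketch of that explicit-conversion idea, for illustration only.
to_utf8() is the hypothetical name from the message; here it is assumed to
be a thin wrapper around encode() from the Encode module, which ships with
later perls (5.8 and up).  The base64 step uses MIME::Base64 and stands in
for the byte-by-byte module described in the quoted question.

```perl
use strict;
use warnings;
use Encode qw(encode);
use MIME::Base64 qw(encode_base64);

# Hypothetical helper named after the message: returns a byte string
# holding the UTF-8 representation of a character string.
sub to_utf8 {
    my ($chars) = @_;
    return encode('UTF-8', $chars);
}

# "cafe" with e-acute, a space, and WHITE SMILING FACE: 6 characters.
my $str   = "caf\x{e9} \x{263a}";
my $bytes = to_utf8($str);

my $char_len   = length $str;           # counts characters
my $byte_len   = length $bytes;         # counts UTF-8 octets
my $first_byte = substr $bytes, 0, 1;   # first octet of the UTF-8 form

# A base64 encoder can now walk $bytes one octet at a time.
my $b64 = encode_base64($bytes, '');    # '' suppresses the trailing newline

printf "chars=%d bytes=%d first=0x%02x base64=%s\n",
    $char_len, $byte_len, ord($first_byte), $b64;
```

Under these assumptions, length and substr on the returned string operate on
octets, so the caller never has to guess how perl is storing the original
string internally.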

> I'm writing a network server where part of a response is the number of
> octets to expect in the message body.  If the subroutine that sends
> the response gets a string encoded in UTF-8, how does it calculate
> the number of octets?
> > Let me show you what I would fancy (modulo syntax, which I haven't
> > been following):
> > 
> >     $eh = new EncodingHandler 'UTF-8';
> >     $out = new IO::Socket {
> >         output_discipline => $eh->output_discipline, ... };
> >     print $out "Content-length: " . $eh->length($output);
> >     print $out $output;
> What's an Encoding Handler?

The idea was, an EncodingHandler is something that wraps up all the
details of representing strings in a particular encoding.  It is
based on actual running Perl code that I can't show you :-(  The
benefit is that all encodings are handled the same and it uses nice
pretty arrows.
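
One way such a handler could be sketched, purely as an illustration and not
as the code referred to above.  The class name and the length() and
output_discipline() methods are taken from the quoted pseudo-code; the
encode() method, the constructor arguments, and the choice to express the
"discipline" as a PerlIO layer string for binmode are all assumptions.  It
relies on the Encode module from later perls.

```perl
use strict;
use warnings;

package EncodingHandler;    # hypothetical class, named after the pseudo-code
use Encode qw(find_encoding);

sub new {
    my ($class, $name) = @_;
    my $enc = find_encoding($name)
        or die "Unknown encoding: $name";
    return bless { name => $name, enc => $enc }, $class;
}

# Octets that $string occupies in this handler's encoding.
sub encode {
    my ($self, $string) = @_;
    return $self->{enc}->encode($string);
}

# Number of octets, not characters, in the encoded form.
sub length {
    my ($self, $string) = @_;
    return CORE::length($self->encode($string));
}

# Assumed mapping of "output discipline" onto a PerlIO layer string.
sub output_discipline {
    my ($self) = @_;
    return ":encoding($self->{name})";
}

package main;

my $eh     = EncodingHandler->new('UTF-8');
my $output = "na\x{ef}ve \x{2603}";    # 7 characters

# An in-memory handle stands in for the socket in the pseudo-code.
open(my $out, '>', \my $wire) or die "Cannot open in-memory handle: $!";
binmode($out, $eh->output_discipline);

print {$out} "Content-Length: ", $eh->length($output), "\r\n\r\n";
print {$out} $output;
close $out;
```

In this sketch the same object supplies both the octet count for the header
and the layer that encodes the body on the way out, so the two cannot
disagree about which encoding is in use.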

But *epiphany* I get the sense that people here don't think of UTF-8
as "another encoding", they think of it as "what just works" with a
Unicode-enabled perl, the way ASCII is "what just works" with
today's perl.  I'm meditating on this.

Andrew