
Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From: Juerd Waalboer
Date: March 30, 2007 13:10
Subject: Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID: 20070330200929.GV31277@c4.convolution.nl

Marc Lehmann wrote 2007-03-30 14:24 (+0200):
> In fact, I teach a lot of people about unicode in perl.

At the German Perl Workshop, I saw your Unicode presentation. I don't
know if it is representative of how you teach Unicode, but
I noticed that you used utf8::encode and utf8::decode, not the similar
functions from Encode.pm that are more commonly used and advised. These
utf8:: in-place encode/decode functions are efficient, but using them
means that the same SV changes from byte string to text string or vice
versa, which makes the code hard to follow, and makes any attempt to use
Hungarian notation in code examples impossible.

Whenever I teach the Perl Unicode model, I try to call my strings
$byte_string and $text_string, or similar. But
utf8::decode($byte_string) makes $byte_string a text string, and
utf8::encode($text_string) makes $text_string a byte string, so after
these statements, the names are no longer correct.
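
A minimal sketch of that naming problem, assuming a two-octet UTF-8
input; $buffer is a made-up name for a variable whose role changes:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# UTF-8 bytes for U+00E9 (e with acute): two octets.
my $byte_string = "\xC3\xA9";

# Encode.pm returns a new value, so each name stays truthful.
my $text_string = decode('UTF-8', $byte_string);   # 1 character
my $bytes_again = encode('UTF-8', $text_string);   # 2 octets

# utf8::decode and utf8::encode work in place: one variable, two roles.
my $buffer = "\xC3\xA9";    # holds bytes here
utf8::decode($buffer);      # now holds text
my $length_as_text = length $buffer;
utf8::encode($buffer);      # and bytes again
my $length_as_bytes = length $buffer;
```

With the Encode.pm functions, $byte_string is still a byte string and
$text_string is still a text string at the end; $buffer has no name
that is right at every point.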

(And of course, I try not to teach people the Unicode model, because
that's something that's quite internal. I try to teach the difference
between text strings and byte strings, and how to use encodings (which
are byte representations of text strings). I treat UTF-8 exactly the
same way as KOI8-R. That helps a lot!)
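
To illustrate "exactly the same way", here is a small sketch using one
Cyrillic letter as the text string (the sample character is my own
choice, not something from the thread):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# One text string: CYRILLIC SMALL LETTER ZHE (U+0436).
my $text_string = "\x{436}";

# Two byte representations of the same text.
my $utf8_bytes = encode('UTF-8',  $text_string);   # octets D0 B6
my $koi8_bytes = encode('KOI8-R', $text_string);   # octet  D6

# Decoding either one gives back the same text string.
my $from_utf8 = decode('UTF-8',  $utf8_bytes);
my $from_koi8 = decode('KOI8-R', $koi8_bytes);
```

Only the encoding name differs between the two calls; the code around
it does not need to know which one was used.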

> If perl had the abstract model juerd dreams of

and uses in day-to-day coding, without encountering ANY of the problems
that you describe (only the regex engine still manages to surprise me,
but that's because I'm too stubborn to utf8::upgrade explicitly).
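
A sketch of the kind of regex surprise I take this to mean, assuming a
perl where the unicode_strings feature and the /u modifier are not in
effect. There I would expect $before to be 0 and $after to be 1, even
though the string holds the same character both times:

```perl
use strict;
use warnings;

# U+00E9 as one character, not yet upgraded internally.
my $text_string = "\xE9";

my $before = $text_string =~ /\w/ ? 1 : 0;

# Same character, same length; only the internal representation changes.
utf8::upgrade($text_string);

my $after = $text_string =~ /\w/ ? 1 : 0;
```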

It kind of makes one wonder if this dream might be reality (and your
reality a dream?)

> then perl would have a very easy unicode model that boils down to
> what I talked about on the perl workshop: encode/decode when doing
> I/O, otherwise, enjoy.

And keep text strings and byte strings separate!!!!!!!!!!!!!eleven

Whenever you must mix text strings and byte strings, treat the byte
strings as I/O and encode/decode accordingly.
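
For example, a minimal sketch assuming the byte string happens to be
UTF-8 (the variable names and sample data are illustrative):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $text_string = "caf\x{E9} ";       # text: 5 characters
my $byte_string = "\xE2\x82\xAC";     # bytes: UTF-8 for U+20AC (euro sign)

# Treat the bytes as input: decode first, then join text with text.
my $combined_text = $text_string . decode('UTF-8', $byte_string);

# Treat the result as output: encode once, when bytes are needed.
my $combined_bytes = encode('UTF-8', $combined_text);
```

Concatenating $text_string and $byte_string directly would run without
complaint, but the three octets would end up as three separate
characters in the result.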

So, recap: encode/decode when doing I/O, keep text strings and byte
strings separate, otherwise, enjoy.
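
The recap can be sketched as a tiny filter. In-memory filehandles
stand in for real files or sockets here, and the input is assumed to
be UTF-8:

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Stand-in for a real input source: a buffer of UTF-8 bytes.
my $input_bytes = "na\xC3\xAFve\n";
open my $in, '<', \$input_bytes or die "cannot open input: $!";

my $output_bytes = '';
open my $out, '>', \$output_bytes or die "cannot open output: $!";

while (my $byte_line = <$in>) {
    my $text_line = decode('UTF-8', $byte_line);   # decode on input
    $text_line = uc $text_line;                    # work on text
    print {$out} encode('UTF-8', $text_line);      # encode on output
}
close $out;
close $in;
```

Everything between the decode and the encode sees only text strings.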
-- 
Kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.
