perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

Juerd Waalboer
March 31, 2007 10:34
Tels wrote on 2007-03-31 18:38 (+0000):
> * might not be the one who "decoded" $string or produced it even.
> * do not know if I am passed a "text" string as there is only the 
> flag-you-should-not-know-about to distinguish these two.
> (...)
> Ok, and how am I supposed to know that in:
> 	sub dosomething {my $a = shift; }
> $a is a text string or a binary string? :)

No: even the flag-you-should-not-know-about doesn't distinguish
between the two.
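To make that concrete, here is a small sketch (the variable names are mine, for illustration only). It uses only core functionality: utf8::upgrade, utf8::is_utf8 and Encode's encode. A text string and a binary string can both have the flag off, and utf8::upgrade can switch it on for either one without changing the string's value. So the flag describes perl's internal representation, not whether the data is text or bytes.

```perl
use strict;
use warnings;
use Encode qw(encode);

my $text = "caf" . chr(0xE9);        # text: 4 characters, flag off
my $upgraded = $text;
utf8::upgrade($upgraded);            # same 4 characters, flag now on

my $bytes = encode('UTF-8', $text);  # binary: 5 bytes, flag off
my $upgraded_bytes = $bytes;
utf8::upgrade($upgraded_bytes);      # same 5 bytes, flag now on

for my $pair ([text => $text], [upgraded => $upgraded],
              [bytes => $bytes], [upgraded_bytes => $upgraded_bytes]) {
    my ($name, $s) = @$pair;
    printf "%-15s length=%d utf8_flag=%d\n",
        $name, length($s), utf8::is_utf8($s) ? 1 : 0;
}

# Upgrading changes the storage, not the value.
print "text eq upgraded: ", ($text eq $upgraded ? "yes" : "no"), "\n";
print "bytes eq upgraded_bytes: ",
    ($bytes eq $upgraded_bytes ? "yes" : "no"), "\n";
```

Every combination of "text or binary" and "flag on or off" exists here, which is exactly why the flag can't answer the question.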

When you're writing a library function to handle arbitrary data, you'll
have to pick sides, either text or binary. Fortunately, the choice is
often very simple.
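As a rough illustration of picking the binary side, here is a sketch of a hypothetical hex_dump function that is documented as taking bytes. Rather than guessing what a string with wide characters was meant to be, it calls utf8::downgrade with a true fail_ok argument and rejects any string containing a character above 0xFF.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical binary-side function: treats its argument as a sequence of
# bytes and refuses anything containing a character above 0xFF.
sub hex_dump {
    my ($bytes) = @_;    # a copy, so the caller's string is untouched
    # With a true second argument, utf8::downgrade returns false instead of
    # dying when a character doesn't fit in a single byte.
    utf8::downgrade($bytes, 1)
        or croak "hex_dump() expects bytes, got a wide character";
    return join ' ', map { sprintf '%02x', ord($_) } split //, $bytes;
}

print hex_dump("abc"), "\n";              # plain ASCII bytes

my $upgraded = chr(0xE9);
utf8::upgrade($upgraded);                 # flag on, still one character
print hex_dump($upgraded), "\n";          # same single byte as chr(0xE9)

my $ok = eval { hex_dump("\x{263A}"); 1 };  # U+263A cannot be a byte
print $ok ? "accepted\n" : "rejected: $@";
```

The upgraded copy of chr(0xE9) is accepted and dumps as the same single byte: the function cares about the string's value, not about the flag.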

When you can't choose between these two, you could write two functions:
one for text data, one for binary data. Often you can write the text
function simply by encoding its input with one specific UTF encoding
(UTF-8, say) and calling the binary function underneath.
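A minimal sketch of that pattern, assuming UTF-8 as the fixed encoding. checksum_bytes and checksum_text are hypothetical names; Digest::MD5's md5_hex and Encode's encode are real core functions. md5_hex only accepts bytes and croaks on characters above 0xFF, so the text variant encodes first.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use Encode qw(encode);

# Hypothetical binary variant: MD5 digest of a byte string. md5_hex()
# croaks if its input contains characters above 0xFF.
sub checksum_bytes {
    my ($bytes) = @_;
    return md5_hex($bytes);
}

# Hypothetical text variant: encode the characters with one fixed encoding
# (UTF-8 here), then reuse the binary variant underneath.
sub checksum_text {
    my ($text) = @_;
    return checksum_bytes(encode('UTF-8', $text));
}

my $smiley = "\x{263A}";    # a one-character text string
print checksum_text($smiley), "\n";
print checksum_bytes("\xE2\x98\xBA"), "\n";    # its UTF-8 bytes: same digest
```

Because the encoding is fixed, equal text always yields an equal checksum, whatever the internal representation of the argument was.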

If you're just serializing data, you could opt for storing the literal
internal buffer along with the state of the UTF8 flag, or (exactly like
the previous paragraph) pick any specific encoding and stick to that.
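A sketch of the second option (one fixed encoding), using hypothetical serialize_texts and deserialize_texts functions. Each string is encoded as UTF-8 and stored with a 32-bit length prefix via pack's "N/a*" template, so the UTF8 flag of the input cannot leak into the output. The record format is an assumption made up for this example, not an existing standard.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical format: each string becomes a 32-bit big-endian length
# followed by that many bytes of UTF-8. The input's UTF8 flag never matters.
sub serialize_texts {
    my @texts = @_;
    return join '', map { pack 'N/a*', encode('UTF-8', $_) } @texts;
}

# Inverse: split the blob back into records and decode each one as UTF-8.
sub deserialize_texts {
    my ($blob) = @_;
    return map { decode('UTF-8', $_) } unpack '(N/a*)*', $blob;
}

my $plain    = "caf" . chr(0xE9);    # flag off
my $upgraded = $plain;
utf8::upgrade($upgraded);            # same text, flag on

print serialize_texts($plain) eq serialize_texts($upgraded)
    ? "identical output\n" : "different output\n";

my @back = deserialize_texts(serialize_texts($plain, "\x{263A}"));
print scalar(@back), " strings; round trip ok: ",
    ($back[0] eq $plain && $back[1] eq "\x{263A}" ? "yes" : "no"), "\n";
```

Reading the data back with decode('UTF-8', ...) yields text strings again; the reader never needs to know what the flag was on the writer's side.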

If you happen to have a function in a current API (i.e. not a contrived
one) for which you find it hard to decide, please let me know the
details. I'll help you offlist.

> Only if you consider your own code. But data is sometimes processed by other 
> code (Perl itself, some module etc.). 

Yes, indeed. This can be troublesome. In particular, many, many modules
still don't correctly support Unicode. I'm slowly but surely compiling a
list at
Wanna help?
kind regards,

  juerd waalboer:  perl hacker
  convolution:     ict solutions and consultancy

I don't trust voting computers.
See <>.
