
Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

Juerd Waalboer
March 30, 2007 17:13
Marc Lehmann wrote 2007-03-31  1:53 (+0200):
> So you force people to know about the internal flag, lest they cannot avoid
> the die.

No, you don't have to know about the UTF8 flag, just that Perl can't
always know whether your string is a text string, but that it is there
to help you when it does.

> > Besides that, the "C" in Perl's pack() is documented as a single byte.
> "A C "char" is a byte".
> Your words.
> But here you say a byte is not a character. That's a contradiction.

"C char" ne "Perl character".
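A minimal sketch of that distinction, assuming a Perl with the core Encode
module (the variable names are only illustrative): pack "C" produces one
byte, while a single Perl character may need several bytes once it is
encoded.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A C "char": pack "C" stores the value 200 in exactly one byte.
my $c_char = pack("C", 200);

# A Perl character: one code point, here U+263A, which does not fit in a byte.
my $perl_char = "\x{263A}";

# Only when the character is explicitly encoded does it become bytes.
my $octets = encode("UTF-8", $perl_char);

print length($c_char), "\n";     # bytes produced by pack "C"
print length($perl_char), "\n";  # characters in the text string
print length($octets), "\n";     # bytes in the UTF-8 encoding
```

length() counts characters on the text string and bytes on the encoded
one, so the same "one character" shows up as 1 and as 3.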

> No, I asked for UTF-8 encoded characters. Again, read the documentation:
>           *       If the pattern begins with a "U", the resulting string will
>           *       be treated as UTF-8-encoded Unicode.

Resulting string, not input string.

The word "internally" is missing here. I will do my best to correct
that.

> that's for pack, unfortunately.
>           U   A Unicode character number.  Encodes to UTF-8
>           internally
> uh, that internal thing again. So how many characters will pack "U", 200
> give me? According to the documentation, 2, as UTF-8 requires that. 

One character. Note again that "character" isn't the same as a "C char".
We in Perl land, and the people over in Unicode land, use different
words, sometimes.

Most of the time, a Perl "character" means codepoint.
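To make the "one character" answer concrete, here is a small sketch, again
assuming the core Encode module and illustrative variable names. It packs
code point 200 with "U", counts characters, and then counts the bytes of
an explicit UTF-8 encoding, which is where the "2" comes from.

```perl
use strict;
use warnings;
use Encode qw(encode);

# One code point in, one Perl character out.
my $s = pack("U", 200);

# unpack "U" gives the code point back.
my ($codepoint) = unpack("U", $s);

# The two bytes only appear when the string is encoded explicitly.
my $utf8_bytes = encode("UTF-8", $s);

printf "characters: %d\n", length($s);
printf "code point: %d\n", $codepoint;
printf "UTF-8 bytes: %d\n", length($utf8_bytes);
```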

> > > Right, while the documentation on unpack "U" disagrees with it, as it talks
> > > about UTF-8.
> > That would be a bug, but I can't find it in my copy (5.8.8). It only
> > says "Encodes to UTF-8 internally" for pack(), which as far as I can
> > tell, is true.
> So it talks about using UTF-8, so, according to you, it is a bug. Fine
> with me.

This was for pack, you were talking about unpack. Also, the word
"internally" was probably not added without reason.
Kind regards,

  juerd waalboer:  perl hacker
  convolution:     ict solutions and consultancy

I don't trust voting computers.
