Ben Carter wrote 2007-03-31 4:08 (-0600):
> Unicode does not even HAVE characters, it has codepoints.

Very good point, but Perl's documentation refers to codepoints as
"characters", and does so rather consistently.

I'm considering sweeping through the docs and changing it all, but it
would be a lot of work and a huge patch. I wonder if it's worth that.

> Now consider the case of
>   $y = chr(1000);
> Clearly whatever is in $y cannot be a single octet. The way Perl
> currently works is that now $y is considered to be a string of Unicode
> codepoints.

Yes. But to go into a bit more detail for the more interesting case of
chr(233): this is either a byte string with only one byte, or a text
string with only one cha^Wcodepoint. Perl doesn't know, or care, so the
programmer has to.

> So $y contains a single codepoint, U+03E8. The internal flag is used
> to indicate that the internal data pointer points to something that is
> a "Unicode codepoint string".

No, see Abigail's response for clarification.

> print unpack("H*", pack("C", 1000));

Feeding 1000 to C has undefined behaviour: the C type can only handle
values 0..255, and there's no documentation defining what happens if
you feed it something <0 or >255.

A similar thing occurs with floating point numbers, like 64.5. The
current implementation truncates that to 64, without warning.

> If you expect values over 255, then you should not use "C".

Indeed!

> Of course if you have values over 255 you have to use "U" in unpack,
> that only makes sense!

If these values are codepoints, yes. But if they're just numbers, other
unpack templates, such as N or V, are a better fit.

> [1] I am deliberately ignoring the box in the corner labeled "EBCDIC".

Oh, so am I. In fact, I've probably never even seen such a box in my
short life so far.
--
kind regards,

  juerd waalboer:  perl hacker  <juerd@juerd.nl>  <http://juerd.nl/sig>
  convolution:     ict solutions and consultancy <sales@convolution.nl>

I don't trust voting computers.
See <http://www.wijvertrouwenstemcomputersniet.nl/>.
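
A minimal sketch of two points from the message above: chr(233) can be read as either one byte or one codepoint until the programmer says which, and plain numbers over 255 fit N or V better than C. The variable names are made up for illustration, and the Encode module is assumed to be available (it ships with Perl 5.8 and later). This is one way to look at it, not a statement about Perl's internals.

```perl
use strict;
use warnings;
use Encode qw(encode);

# chr(233) may be meant as one byte or as one codepoint; Perl does not
# record which, so the programmer has to keep track.
my $x = chr(233);
my $y = chr(1000);    # cannot be a single octet, so this is text

printf "length: x=%d y=%d\n", length($x), length($y);    # 1 and 1

# Encoding states the "text" reading explicitly and yields bytes.
my $x_utf8 = encode('UTF-8', $x);    # U+00E9 becomes two bytes
my $y_utf8 = encode('UTF-8', $y);    # U+03E8 becomes two bytes
printf "x as UTF-8: %s\n", unpack('H*', $x_utf8);    # c3a9
printf "y as UTF-8: %s\n", unpack('H*', $y_utf8);    # cfa8

# For plain numbers over 255, use a fixed-width integer template
# rather than "C": N is 32-bit big-endian, V is 32-bit little-endian.
my $n_hex = unpack('H*', pack('N', 1000));
my $v_hex = unpack('H*', pack('V', 1000));
printf "N: %s\n", $n_hex;    # 000003e8
printf "V: %s\n", $v_hex;    # e8030000
```

The encode() step is where the programmer commits to the "text" interpretation; without it, $x is just a one-element string that can go either way.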