On Sat, Mar 31, 2007 at 04:08:30AM -0600, Ben Carter wrote:
>
> Now consider the case of
>
> $y = chr(1000);
>
> Clearly whatever is in $y cannot be a single octet. The way Perl
> currently works (and this is my limited understanding here - someone
> with more knowledge can feel free to step in and correct my errors)
> is that now $y is considered to be a string of Unicode codepoints. So
> $y contains a single codepoint, U+03E8. The internal flag is used to
> indicate that the internal data pointer points to something that is a
> "Unicode codepoint string".

No. "ABCD" also contains 4 Unicode code points. Perl strings only
contain Unicode code points. Always.

The issue is not whether a string is a "Unicode" string; the point is
the *encoding* of the Unicode code points. That can be UTF-8 (a
variable number of bytes per code point) or Latin-1 (one byte per
code point). Unicode does not imply UTF-8.

Abigail
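
A small sketch of the point above, for anyone who wants to poke at it. It assumes a reasonably modern perl (5.8 or later) and uses utf8::is_utf8() and utf8::upgrade() only to peek at and change the internal encoding; the variable names and the %report hash are made up for illustration.

```perl
use strict;
use warnings;

# Four code points, all below 256: perl can store these one byte
# per code point, so the internal UTF8 flag is off.
my $x = "ABCD";

# One code point, U+03E8. It does not fit in a byte, so perl
# stores it internally as UTF-8 and the flag is on.
my $y = chr(1000);

# One code point, U+00E9, stored one byte per code point.
my $e = chr(0xE9);

# A copy of $e whose internal storage is switched to UTF-8.
# Only the encoding changes; the code points stay the same.
my $e_up = $e;
utf8::upgrade($e_up);

my %report = (
    x_length    => length($x),                  # length counts code points
    y_length    => length($y),
    y_ord       => sprintf("U+%04X", ord($y)),
    x_flag      => utf8::is_utf8($x)    ? 1 : 0,
    y_flag      => utf8::is_utf8($y)    ? 1 : 0,
    e_flag      => utf8::is_utf8($e)    ? 1 : 0,
    e_up_flag   => utf8::is_utf8($e_up) ? 1 : 0,
    same_string => ($e eq $e_up)        ? 1 : 0, # compares code points
);

printf "%-12s %s\n", $_, $report{$_} for sort keys %report;
```

The thing to look at is same_string: $e and $e_up have different internal encodings (and different flags), yet they compare equal, because eq works on code points and not on the bytes used to store them.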