perl.perl5.porters | Postings from March 2007

Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)

From: Abigail
Date: March 31, 2007 03:30
Subject: Re: the utf8 flag (was Re: [perl #41527] decode_utf8 sets utf8 flag on plain ascii strings)
Message ID: 20070331102955.GA19690@abigail.nl

On Sat, Mar 31, 2007 at 04:08:30AM -0600, Ben Carter wrote:
> 
> Now consider the case of
> 
>   $y = chr(1000);
> 
> Clearly whatever is in $y cannot be a single octet.  The way Perl
> currently works (and this is my limited understanding here - someone
> with more knowledge can feel free to step in and correct my errors)
> is that now $y is considered to be a string of Unicode codepoints.  So
> $y contains a single codepoint, U+03E8.  The internal flag is used to
> indicate that the internal data pointer points to something that is a
> "Unicode codepoint string".

No.

"ABCD" also contains 4 Unicode code points.

Perl strings only contain Unicode code points. Always.
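
A small sketch of that point, assuming an ASCII platform and a perl of 5.8.1 or later; the variable names are only for illustration. Both strings hand back code points from ord(), and nothing about "ABCD" is less Unicode than chr(1000):

```perl
use strict;
use warnings;

# Two strings: one plain ASCII, one holding a code point above 255.
my $x = "ABCD";
my $y = chr(1000);

# ord() returns the code point of each character, whatever the
# internal representation of the string happens to be.
my @x_points = map { sprintf("U+%04X", ord($_)) } split(//, $x);
my @y_points = map { sprintf("U+%04X", ord($_)) } split(//, $y);

print "x: @x_points\n";                  # x: U+0041 U+0042 U+0043 U+0044
print "y: @y_points\n";                  # y: U+03E8
print length($x), " ", length($y), "\n"; # 4 1
```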

The issue is not whether a string is a "Unicode" string; the point is
the *encoding* of the Unicode code points. That can be UTF-8 (variable
number of bytes/code point), or Latin-1 (one byte/code point, which
only works while every code point is below 256).

Unicode does not imply UTF-8.
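
To make the encoding point concrete, here is a hedged sketch using the core utf8::is_utf8 and utf8::upgrade functions (available without "use utf8"). It assumes an ASCII platform and default settings (no "use bytes", no locale pragmas); the variable names are illustrative. The same four code points are held first in the one-byte-per-character form and then in UTF-8, and the string value does not change:

```perl
use strict;
use warnings;

# "caf" plus U+00E9: every code point is below 256, so perl can
# store this string with one byte per character.
my $latin = "caf" . chr(0xE9);
my $copy  = $latin;

# Switch only the internal encoding of the copy to UTF-8.
utf8::upgrade($copy);

print utf8::is_utf8($latin) ? "flag on\n" : "flag off\n";  # flag off
print utf8::is_utf8($copy)  ? "flag on\n" : "flag off\n";  # flag on

# Same code points, so the strings compare equal and have equal length.
print $latin eq $copy ? "equal\n" : "different\n";         # equal
print length($latin), " ", length($copy), "\n";            # 4 4
```

The flag only records which internal encoding is in use; it says nothing about whether the string "is Unicode".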



Abigail
