develooper Front page | perl.perl5.porters | Postings from August 2021

Re: Pre-RFC: Rename SVf_UTF8 et al.

Dan Book
August 20, 2021 17:17
On Fri, Aug 20, 2021 at 1:06 PM demerphq <> wrote:

> On Wed, 18 Aug 2021 at 19:17, Felipe Gasper <>
> wrote:
>> Per recent IRC discussion …
>> PROBLEM: The naming of Perl’s “UTF-8 flag” is a continual source of
>> confusion regarding the flag’s significance. Some think it indicates
>> whether a given PV stores text versus binary. Some think it means that the
>> PV is valid UTF-8. Still others likely hold other inaccurate views.
>> The problem here is the naming. For example, consider `perl -e'my $foo =
>> "é"'`. In this code $foo is a “UTF-8 string” by virtue of the fact that its
>> code points (assuming use of a UTF-8 terminal) correspond to the bytes that
>> encode “é” in UTF-8.
> Nope. It might contain utf8, but it is not UTF8-ON. Think of it like a
> square/rectangle relationship. All strings are "rectangles", all "squares"
> are rectangles, some strings are squares, but unless SQUARE flag is ON perl
> should assume it is a rectangle, not a square. The SQUARE flag should
> only be set when the rectangle has been proved conclusively to be a square.
> That the SQUARE flag is off does not mean the rectangle is not a square,
> merely that the square has not been proved to be such.
>
>> The “UTF-8 flag”, however, is likely *not* set on this string. By
>> contrast, consider `perl -Mutf8 -e'my $foo = "é"'`. Here $foo has the
>> “UTF-8 flag” set, but $foo is NOT a “UTF-8 string” because its code points
>> (in this case, only 1) aren’t valid UTF-8.
> Except it is valid UTF-8 (at least in my UTF-8 terminal):
> $ perl -MDevel::Peek -Mutf8 -e'my $foo = "é"; Dump($foo)'
> SV = PV(0x153efc0) at 0x155fb38
>   REFCNT = 1
>   PV = 0x1563240 "\303\251"\0 [UTF8 "\x{e9}"]
>   CUR = 2
>   LEN = 10
>   COW_REFCNT = 1
> So the string is UTF-8.

The premise of this email seems to be about the internals of the string,
which are not the contents of the string (here, "\x{e9}").
Please re-evaluate in that context.
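A minimal sketch of the distinction between contents and internals, assuming a perl with the core utf8::upgrade, utf8::downgrade and utf8::is_utf8 functions (documented with the utf8 pragma). The variable names are illustrative only. Both variables hold the same one-character string "\x{e9}"; only the internal storage, which is what the flag describes, differs:

```perl
use strict;
use warnings;

# Both strings contain exactly one character, U+00E9 ("é").
my $downgraded = "\x{e9}";
utf8::downgrade($downgraded);   # stored as the single byte E9; UTF8 flag off

my $upgraded = "\x{e9}";
utf8::upgrade($upgraded);       # stored as the bytes C3 A9; UTF8 flag on

# The flag differs...
printf "UTF8 flag: %d vs %d\n",
    utf8::is_utf8($downgraded) ? 1 : 0,
    utf8::is_utf8($upgraded)   ? 1 : 0;

# ...but the contents, as seen from Perl code, are identical.
printf "length:    %d vs %d\n", length($downgraded), length($upgraded);
printf "ord:       %X vs %X\n", ord($downgraded), ord($upgraded);
printf "equal:     %s\n", $downgraded eq $upgraded ? "yes" : "no";
```

Running it should report the flag as 0 for the first string and 1 for the second, while length (1), ord (E9) and the eq comparison agree for both. Dumping either variable with Devel::Peek would show the differing PV buffers seen in the Dump output above.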