On Fri, Aug 20, 2021 at 1:06 PM demerphq <demerphq@gmail.com> wrote:
> On Wed, 18 Aug 2021 at 19:17, Felipe Gasper <felipe@felipegasper.com>
> wrote:
>
>> Per recent IRC discussion …
>>
>> PROBLEM: The naming of Perl’s “UTF-8 flag” is a continual source of
>> confusion regarding the flag’s significance. Some think it indicates
>> whether a given PV stores text versus binary. Some think it means that the
>> PV is valid UTF-8. Still others likely hold other inaccurate views.
>>
>> The problem here is the naming. For example, consider `perl -e'my $foo =
>> "é"'`. In this code $foo is a “UTF-8 string” by virtue of the fact that its
>> code points (assuming use of a UTF-8 terminal) correspond to the bytes that
>> encode “é” in UTF-8.
>
> Nope. It might contain utf8, but it is not UTF8-ON. Think of it like a
> square/rectangle relationship. All strings are "rectangles", all "squares"
> are rectangles, some strings are squares, but unless SQUARE flag is ON perl
> should assume it is a rectangle, not a square. The SQUARE flag should
> only be set when the rectangle has been proved conclusively to be a square.
> That the SQUARE flag is off does not mean the rectangle is not a square,
> merely that the square has not been proved to be such.
>
>> The “UTF-8 flag”, however, is likely *not* set on this string. By
>> contrast, consider `perl -Mutf8 -e'my $foo = "é"'`. Here $foo has the
>> “UTF-8 flag” set, but $foo is NOT a “UTF-8 string” because its code points
>> (in this case, only 1) aren’t valid UTF-8.
>>
>
> Except it is valid UTF-8: (at least in my utf8 terminal).
>
> $ perl -MDevel::Peek -Mutf8 -e'my $foo = "é"; Dump($foo)'
> SV = PV(0x153efc0) at 0x155fb38
>   REFCNT = 1
>   FLAGS = (POK,IsCOW,pPOK,UTF8)
>   PV = 0x1563240 "\303\251"\0 [UTF8 "\x{e9}"]
>   CUR = 2
>   LEN = 10
>   COW_REFCNT = 1
>
> So the string is UTF-8.
>

The premise of this email seems to be about the internals of the string.
That is not the contents of the string (which is "\x{e9}" in this example).
Please re-evaluate in that context.

-Dan
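A minimal Perl sketch (not part of the original thread) of the distinction between a string's contents and its internal storage. It uses only core built-ins (`utf8::upgrade`, `utf8::decode`, `utf8::is_utf8`, `length`). The variable names are illustrative, and the first string assumes a UTF-8 source/terminal where a bare "é" literal without `use utf8` arrives as the two octets C3 A9.

```perl
use strict;
use warnings;

# Two characters (octets C3 A9): what `perl -e'my $foo = "é"'` gives
# from a UTF-8 terminal, since without `use utf8` the literal is bytes.
my $octets = "\xC3\xA9";

# One character, U+00E9, stored internally as a single byte (flag off).
my $char = "\x{e9}";

# The same one-character string with its storage switched to UTF-8,
# like the -Mutf8 literal in the Dump above. The flag turns on; the
# contents do not change.
my $upgraded = $char;
utf8::upgrade($upgraded);

# Decoding the two octets as UTF-8 yields the one-character string.
my $decoded = $octets;
utf8::decode($decoded);

printf "octets:   length=%d flag=%d\n", length($octets),   utf8::is_utf8($octets)   ? 1 : 0;
printf "char:     length=%d flag=%d\n", length($char),     utf8::is_utf8($char)     ? 1 : 0;
printf "upgraded: length=%d flag=%d\n", length($upgraded), utf8::is_utf8($upgraded) ? 1 : 0;
print "char eq upgraded: ", ($char eq $upgraded ? "yes" : "no"), "\n";
print "char eq decoded:  ", ($char eq $decoded  ? "yes" : "no"), "\n";
```

Run as-is, this should report lengths 2, 1, 1 with flags 0, 0, 1, and "yes" for both comparisons: `$char` and `$upgraded` are the same string ("\x{e9}") even though only one carries the UTF8 flag, while `$octets` is a different, two-character string whose contents happen to be the UTF-8 encoding of "é".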