From the keyboard of shmem [01.08.21,10:01]:

> From the keyboard of Felipe Gasper [31.07.21,20:53]:
>
> [..]
>> Another way to look at it: the content of the parsed strings actually
>> differs between the two:
>>
>> my $x = do { no utf8; "éé" };
>> my $y = do { use utf8; "éé" };
>>
>> In the above, $x is a sequence of 4 code points (195, 169, 195, 169),
>> whereas $y is a sequence of 2 code points (233, 233). That’s it; there is
>> no other difference between $x and $y. Perl doesn’t know that $x is a “byte
>> string” and $y is a “character string”; it just knows the code points.
>
> This actually depends on the utf8-awareness of the editor used to input
> that program text. Entered on a terminal with LANG=en_GB.utf8 via vi, both
> $x and $y are a sequence of 4 code points, the latter with the UTF8 flag
> set, which condenses two code points into chr(233). Why? See the
> explanation below, and please correct me if I am wrong.

Correcting myself: $y *is* two code points; it is only the internal
representation that is 4 bytes. Without the UTF8 flag, the internal
representation is identical to the code points, one byte per code point.

> PV = 0x9085d0 "\303\251\303\251"\0 [UTF8 "\x{e9}\x{e9}"]
code points ---------------------------------^^^^^^^^^^^^

Sorry for my confusion :-P

0--gg-

--
_($_=" "x(1<<5)."?\n".q·/)Oo.  G°\        /
                              /\_¯/(q    /
----------------------------  \__(m.====·.(_("always off the crowd"))."·
");sub _{s./.($e="'Itrs `mnsgdq Gdbj O`qkdq")=~y/"-y/#-z/;$e.e && print}
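
PS: a minimal sketch to verify the corrected counts at the prompt, assuming
the script itself is saved as UTF-8 (so that "éé" is the four bytes
0xC3 0xA9 0xC3 0xA9 on disk):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Same four source bytes, parsed two ways.
    my $x = do { no utf8;  "éé" };   # bytes taken as-is: 4 code points
    my $y = do { use utf8; "éé" };   # bytes decoded as UTF-8: 2 code points

    # length() counts code points; ord() reports each one.
    printf "x: length=%d (%s)\n", length $x, join ',', map ord, split //, $x;
    printf "y: length=%d (%s)\n", length $y, join ',', map ord, split //, $y;

This should print length 4 with (195,169,195,169) for $x and length 2 with
(233,233) for $y, matching Felipe's numbers.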
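
Devel::Peek (a core module) then makes the PV dump above reproducible; under
the same UTF-8-source assumption, both PVs hold the same four bytes, but
only $y carries the UTF8 flag:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Devel::Peek;          # Dump() writes to STDERR

    my $x = do { no utf8;  "éé" };
    my $y = do { use utf8; "éé" };

    # Identical bytes in both PVs; the UTF8 flag on $y makes perl
    # read them as the two code points \x{e9}\x{e9}.
    Dump($x);   # PV = ... "\303\251\303\251"\0
    Dump($y);   # PV = ... "\303\251\303\251"\0 [UTF8 "\x{e9}\x{e9}"]

    # The flag can also be queried directly:
    printf "UTF8 flag: x=%d, y=%d\n",
           utf8::is_utf8($x) ? 1 : 0,
           utf8::is_utf8($y) ? 1 : 0;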