On Wed, Mar 31, 2021 at 9:06 PM Yuki Kimoto <kimoto.yuki@gmail.com> wrote:

> I have a question about the following code to understand Perl strings.
>
> -----------------------------------------------------
> use strict;
> use warnings;
> use utf8;
> use Encode 'encode', 'decode';
> use Devel::Peek;
>
> # ASCII range
> my $text = 'abc';
>
> # 0
> print "A. " . (utf8::is_utf8($text) ? 1 : 0) . "\n";
> Devel::Peek::Dump $text;
> print "\n";
>
> my $bytes = encode('UTF-8', $text);
>
> # 0
> print "B. " . (utf8::is_utf8($bytes) ? 1 : 0) . "\n";
> Devel::Peek::Dump $bytes;
> print "\n";
>
> my $text_again = decode('UTF-8', $bytes);
>
> # 1
> print "C. " . (utf8::is_utf8($text_again) ? 1 : 0) . "\n";
> Devel::Peek::Dump $text_again;
> print "\n";
> ------------------------------------------------------
>
> "use utf8" doesn't turn on the utf8 flag of an ASCII string.
>
> On the other hand, Encode::decode turns on the utf8 flag of an ASCII string.
>
> Is this a design mistake, or is it intentional?
>

This is intentional; the names of these two features are not related.

"use utf8" means that the source code is assumed to be UTF-8 and is thus implicitly decoded from it. This may or may not require upgraded string storage.

The utf8 flag indicates which of the two types of string storage is being used for a string. The perl interpreter can change this at any time and no guarantees are provided, other than that the upgraded format (utf8 flag on) must be used for any string containing code points over 255, because the downgraded format physically can't store them.

-Dan
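
To illustrate the point about storage, here is a minimal sketch. It assumes only the built-in utf8::upgrade, utf8::downgrade and utf8::is_utf8 functions, which are available without "use utf8". The helper names flag, upgraded_copy and can_downgrade are made up for this example. It shows that switching storage formats does not change the string's value, and that only code points over 255 force the upgraded format.

```perl
use strict;
use warnings;

# Hypothetical helper: report the storage flag as 1 (upgraded) or 0 (downgraded).
sub flag { return utf8::is_utf8($_[0]) ? 1 : 0 }

# Hypothetical helper: return a copy of the string forced into upgraded storage.
sub upgraded_copy {
    my ($s) = @_;
    utf8::upgrade($s);
    return $s;
}

# Hypothetical helper: 1 if the string fits in downgraded storage, else 0.
# The second argument (FAIL_OK) makes utf8::downgrade return false
# instead of dying when the string cannot be downgraded.
sub can_downgrade {
    my ($s) = @_;
    return utf8::downgrade($s, 1) ? 1 : 0;
}

my $plain    = 'abc';
my $upgraded = upgraded_copy('abc');

# Flag of a plain literal; the interpreter gives no guarantee either way.
print "plain flag:    ", flag($plain), "\n";

# After utf8::upgrade the flag is on.
print "upgraded flag: ", flag($upgraded), "\n";

# Same characters, different storage: the strings still compare equal.
print "same value:    ", ($plain eq $upgraded ? "yes" : "no"), "\n";

# An ASCII-only string can always go back to downgraded storage.
print "abc fits downgraded:    ", can_downgrade($upgraded), "\n";

# A code point over 255 must use upgraded storage and cannot be downgraded.
my $wide = "\x{263A}";
print "U+263A flag:            ", flag($wide), "\n";
print "U+263A fits downgraded: ", can_downgrade($wide), "\n";
```

Because the two 'abc' strings compare equal regardless of the flag, code that branches on utf8::is_utf8 to decide whether a string is "text" or "bytes" is relying on something the interpreter does not promise.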