perl.perl5.porters | Postings from April 2021

Re: Perl 7: Fix string leaks?

From: Dan Book
Date: April 1, 2021 01:10
Subject: Re: Perl 7: Fix string leaks?
Message ID: CABMkAVVVxD3HLojBarUv5LcVBksrLwuNTHJMXkAmYiPLhcEBeQ@mail.gmail.com

On Wed, Mar 31, 2021 at 9:06 PM Yuki Kimoto <kimoto.yuki@gmail.com> wrote:

> I have a question about the following code to understand Perl strings.
>
> -----------------------------------------------------
> use strict;
> use warnings;
> use utf8;
> use Encode 'encode', 'decode';
> use Devel::Peek;
>
> # ASCII range
> my $text = 'abc';
>
> # 0
> print "A. " . (utf8::is_utf8($text) ? 1 : 0) . "\n";
> Devel::Peek::Dump $text;
> print "\n";
>
> my $bytes = encode('UTF-8', $text);
>
> # 0
> print "B. " . (utf8::is_utf8($bytes) ? 1 : 0) . "\n";
> Devel::Peek::Dump $bytes;
> print "\n";
>
> my $text_again = decode('UTF-8', $bytes);
>
> # 1
> print "C. " . (utf8::is_utf8($text_again) ? 1 : 0) . "\n";
> Devel::Peek::Dump $text_again;
> print "\n";
> ------------------------------------------------------
>
> "use utf8" doesn't turn on the utf8 flag of an ASCII string.
>
> On the other hand, Encode::decode turns on the utf8 flag of an ASCII string.
>
> Is this a design mistake, or is it intentional?
>

This is intentional; despite the shared name, these two features are not
related.

"use utf8" means that the source code is assumed to be UTF-8, and thus
implicitly decoded from it - this may or may not require upgraded string
storage.
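
As a rough illustration of what "implicitly decoded" means here (a sketch
only; the variable names are made up for this example, and it assumes the
file itself is saved as UTF-8), compare the length of the same non-ASCII
literal with and without the pragma:

```perl
use strict;
use warnings;

my ($len_decoded, $len_raw);

{
    use utf8;                      # source is decoded from UTF-8
    $len_decoded = length("é");    # counted as characters
}
{
    no utf8;                       # source literals are taken as raw bytes
    $len_raw = length("é");        # counted as the UTF-8 encoded bytes
}

print "with use utf8:    $len_decoded\n";
print "without use utf8: $len_raw\n";
```

Note that this only looks at the string contents. It says nothing about the
utf8 flag, which is a separate matter.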

The utf8 flag indicates which of the two internal string storage formats is
being used for a string. The perl interpreter can change this at any time
and no guarantees are provided, other than that the upgraded format (utf8
flag on) must be used for any string containing codepoints over 255,
because the downgraded format physically can't store them.
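
A small sketch of that, using the core utf8::upgrade, utf8::downgrade and
utf8::is_utf8 functions (the sample strings and variable names are just
assumptions for the example): switching the storage format changes the flag
but not the string, and a codepoint over 255 can't be downgraded.

```perl
use strict;
use warnings;

# All codepoints here are <= 255, so either storage format can hold it.
my $latin = "caf\x{e9}";
my $copy  = $latin;

my $flag_before = utf8::is_utf8($latin) ? 1 : 0;

utf8::upgrade($copy);                        # switch to the upgraded format
my $flag_after  = utf8::is_utf8($copy) ? 1 : 0;
my $same_string = ($latin eq $copy) ? 1 : 0; # eq compares characters, not storage

utf8::downgrade($copy);                      # and back again, contents unchanged
my $flag_down   = utf8::is_utf8($copy) ? 1 : 0;

# A codepoint over 255 only fits in the upgraded format.
my $wide      = "\x{263A}";
my $wide_flag = utf8::is_utf8($wide) ? 1 : 0;
my $tmp       = $wide;
# Second argument (FAIL_OK) makes downgrade return false instead of dying.
my $can_downgrade = utf8::downgrade($tmp, 1) ? 1 : 0;

print "before=$flag_before after=$flag_after same=$same_string down=$flag_down\n";
print "wide_flag=$wide_flag can_downgrade=$can_downgrade\n";
```

So the flag is best treated as an implementation detail: code that needs to
know whether it has text or encoded bytes has to track that itself rather
than ask utf8::is_utf8.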

-Dan
