Re: Perl 7: Fix string leaks?

From: Yuki Kimoto
Date: April 1, 2021 02:56
Subject: Re: Perl 7: Fix string leaks?
Message ID: CAExogxO=e9b3k3_Ow9wT7N-j0+D1J8Qr=P+1nOvUzcHpUArj=Q@mail.gmail.com

Dan,

> This is intentional; the names of these two features are not related.

> "use utf8" means that the source code is assumed to be UTF-8, and thus
> implicitly decoded from it - this may or may not require upgraded string
> storage.

-----------------------------------------------------
use strict;
use warnings;
use utf8;
use Encode 'encode', 'decode';
use Devel::Peek;

# ASCII range
my $text = 'abc';

# Prints 0: the utf8 flag is off
print "A. " . (utf8::is_utf8($text) ? 1 : 0) . "\n";
Devel::Peek::Dump $text;
print "\n";
--------------------------------------------------------

If the first example printed 1 instead, what problem would occur?

In other words, what if 'abc' were interpreted as UTF-8 and its utf8 flag were turned on?
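
To make the question concrete, here is a minimal sketch of that situation. It assumes that utf8::upgrade is a fair stand-in for "use utf8 turned the flag on"; the variable names are only for illustration.

```perl
use strict;
use warnings;
use utf8;

# Two copies of the same ASCII text.
my $plain    = 'abc';
my $upgraded = 'abc';

# Force the upgraded storage format (utf8 flag on) for one copy.
# This stands in for the hypothetical "use utf8 sets the flag" behavior.
utf8::upgrade($upgraded);

my $flag_plain    = utf8::is_utf8($plain)    ? 1 : 0;
my $flag_upgraded = utf8::is_utf8($upgraded) ? 1 : 0;

# Compare the two copies as strings, ignoring how they are stored.
my $same_string = ($plain eq $upgraded) ? 1 : 0;
my $same_length = (length($plain) == length($upgraded)) ? 1 : 0;

print "flag (plain):    $flag_plain\n";
print "flag (upgraded): $flag_upgraded\n";
print "eq:              $same_string\n";
print "same length:     $same_length\n";
```

In this sketch the two copies compare equal and have the same length even though only one of them has the flag on, so the flag alone does not seem to change what the string is.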


On Thu, Apr 1, 2021 at 10:10, Dan Book <grinnz@gmail.com> wrote:

> On Wed, Mar 31, 2021 at 9:06 PM Yuki Kimoto <kimoto.yuki@gmail.com> wrote:
>
>> I have a question about the following code to understand Perl strings.
>>
>> -----------------------------------------------------
>> use strict;
>> use warnings;
>> use utf8;
>> use Encode 'encode', 'decode';
>> use Devel::Peek;
>>
>> # ASCII range
>> my $text = 'abc';
>>
>> # 0
>> print "A. " . (utf8::is_utf8($text) ? 1 : 0) . "\n";
>> Devel::Peek::Dump $text;
>> print "\n";
>>
>> my $bytes = encode('UTF-8', $text);
>>
>> # 0
>> print "B. " . (utf8::is_utf8($bytes) ? 1 : 0) . "\n";
>> Devel::Peek::Dump $bytes;
>> print "\n";
>>
>> my $text_again = decode('UTF-8', $bytes);
>>
>> # 1
>> print "C. " . (utf8::is_utf8($text_again) ? 1 : 0) . "\n";
>> Devel::Peek::Dump $text_again;
>> print "\n";
>> ------------------------------------------------------
>>
>> "use utf8" doesn't turn on the utf8 flag of an ASCII string.
>>
>> On the other hand, Encode::decode turns on the utf8 flag of an ASCII string.
>>
>> Is this a design mistake, or is it intentional?
>>
>
> This is intentional; the names of these two features are not related.
>
> "use utf8" means that the source code is assumed to be UTF-8, and thus
> implicitly decoded from it - this may or may not require upgraded string
> storage.
>
> The utf8 flag indicates which of the two types of string storage is being
> used for a string. This can be changed at any time by the perl interpreter
> and no guarantees are provided, other than the upgraded format (utf8 bit
> on) must be used for any string containing codepoints over 255, because the
> downgraded format physically can't store it.
>
> -Dan
>
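
The storage rule described above can be sketched as follows. This is only an illustration using the core utf8::upgrade and utf8::downgrade functions; the variable names are hypothetical, and the sketch says nothing about when the interpreter itself chooses to switch formats.

```perl
use strict;
use warnings;

# Codepoint 233 fits in the downgraded (byte) format.
my $latin1 = "\xE9";

# Codepoint 0x263A is above 255, so it needs the upgraded format.
my $wide = "\x{263A}";
my $wide_flag = utf8::is_utf8($wide) ? 1 : 0;

# Upgrading a copy changes the storage, not the string value.
my $copy = $latin1;
utf8::upgrade($copy);
my $copy_flag   = utf8::is_utf8($copy) ? 1 : 0;
my $still_equal = ($copy eq $latin1) ? 1 : 0;

# Downgrading works when every codepoint is 255 or below.
# The second argument (FAIL_OK) makes it return false instead of dying.
my $downgrade_ok = utf8::downgrade($copy, 1) ? 1 : 0;

# Downgrading a string with a codepoint over 255 cannot succeed.
my $wide_copy = $wide;
my $wide_downgrade_ok = utf8::downgrade($wide_copy, 1) ? 1 : 0;

print "wide flag:            $wide_flag\n";
print "upgraded copy flag:   $copy_flag\n";
print "copy eq original:     $still_equal\n";
print "downgrade of copy:    $downgrade_ok\n";
print "downgrade of wide:    $wide_downgrade_ok\n";
```

The only case here where the flag is forced is the string holding a codepoint over 255; for the other string either format can hold the same value.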
