perl.perl5.porters | Postings from September 2021

Re: Pre-RFC: Rename SVf_UTF8 et al.

From: Yuki Kimoto
Date: September 3, 2021 03:04
Subject: Re: Pre-RFC: Rename SVf_UTF8 et al.
Message ID: CAExogxPQVF2K5oQH_7MU6cqEMKqW81=GQbwi49yvBBzdvE_-sA@mail.gmail.com

2021-9-3 10:30 Dan Book <grinnz@gmail.com> wrote :

> On Thu, Sep 2, 2021 at 9:03 PM Yuki Kimoto <kimoto.yuki@gmail.com> wrote:
>
>> I want to get the basic knowledge to join this discussion.
>>
>> Would you tell me the following things?
>>
>> 1. Do the following things mean the same or different?
>>
>>   my $bytes = Encode::encode('UTF-8', $string);
>>
>>   utf8::encode($string);
>>   my $bytes = $string;
>>
>
> Similar, with some implementation differences: Encode::encode doesn't
> modify $string in place (with those arguments), and utf8::encode does;
> Encode::encode with UTF-8 will encode invalid codepoints (such as
> surrogates, supercharacters) to replacement characters (with those
> arguments) and utf8::encode will naively encode them with Perl's internal
> encoding method like other codepoints (which can result in bytestrings
> which UTF-8 decoders may consider invalid).
>
>
>> 2. Do the following things mean the same or different?
>>
>>   my $string = Encode::decode('UTF-8', $bytes);
>>
>>   utf8::decode($bytes);
>>   my $string = $bytes;
>>
>
> Similar to the above, but additionally, if the bytes cannot be interpreted as
> even Perl's lax internal encoding, utf8::decode will return false and leave
> the string unchanged; whereas Encode::decode decodes malformed byte
> sequences to replacement characters (with those arguments). Encode::decode
> will also decode invalid codepoints to replacement characters, but
> utf8::decode will naively accept them.
>
>
>> 3. Do the following things mean the same or different?
>>
>>   # Perl
>>   utf8::decode
>>
>>   # XS
>>   sv_utf8_decode
>>
>
> These are the same.
>
>> 4. Do the following things mean the same or different?
>>
>>   # Perl
>>   utf8::encode
>>
>>   # XS
>>   sv_utf8_encode
>>
>
> These are the same.
>
> Overall, all of these change the logical contents of the string from bytes
> to the Unicode characters they represent, or from Unicode characters to
> representative bytes.
>
> -Dan
>

Dan

Thank you.

I will need some time to understand this.
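
For reference, here is a minimal sketch of the differences described in answers 1 and 2. It is only an illustration, not a definitive test: the sample data (a lone surrogate U+D800 and a stray 0xFF byte) and the helper names hex_bytes and code_points are my own assumptions, and the exact number of replacement characters Encode produces for the surrogate bytes may depend on the Encode version, so the script prints it rather than assuming it.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helpers: show a string as hex bytes or as code points.
sub hex_bytes   { return unpack 'H*', $_[0] }
sub code_points { return join ' ', map { sprintf 'U+%04X', ord } split //, $_[0] }

# Sample data (an assumption): "A" followed by a lone surrogate, U+D800.
my $string = do { no warnings 'utf8'; 'A' . chr(0xD800) };

# 1a. Encode::encode returns new bytes and leaves $string as it was.
#     The surrogate is replaced by U+FFFD (bytes EF BF BD).
my $strict_bytes = Encode::encode('UTF-8', $string);

# 1b. utf8::encode changes its argument in place, so work on a copy.
#     The surrogate is encoded like any other code point (ED A0 80).
my $lax_bytes = $string;
utf8::encode($lax_bytes);

# 2a. Malformed input: the byte 0xFF never appears in UTF-8.
my $malformed      = "A\xFF";
my $decoded_strict = Encode::decode('UTF-8', $malformed);   # "A" . U+FFFD
my $lax_copy       = $malformed;
my $ok_malformed   = utf8::decode($lax_copy);   # false, $lax_copy unchanged

# 2b. Well-formed in Perl's lax encoding, but an invalid code point.
my $surrogate_bytes = "\xED\xA0\x80";
my $strict_chars    = Encode::decode('UTF-8', $surrogate_bytes);
my $lax_chars       = $surrogate_bytes;
my $ok_surrogate    = utf8::decode($lax_chars);  # true, now chr(0xD800)

printf "Encode::encode          : %s\n", hex_bytes($strict_bytes);
printf "utf8::encode            : %s\n", hex_bytes($lax_bytes);
printf "Encode::decode (0xFF)   : %s\n", code_points($decoded_strict);
printf "utf8::decode   (0xFF)   : %s\n", $ok_malformed ? 'decoded' : 'returned false';
printf "Encode::decode (D800)   : %s\n", code_points($strict_chars);
printf "utf8::decode   (D800)   : %s\n", code_points($lax_chars);
```

The copies before utf8::encode and utf8::decode are there because both functions modify their argument in place, which is the first difference mentioned in answer 1.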
