2021-9-3 10:30 Dan Book <grinnz@gmail.com> wrote:

> On Thu, Sep 2, 2021 at 9:03 PM Yuki Kimoto <kimoto.yuki@gmail.com> wrote:
>
>> I want to get the basic knowledge to join this discussion.
>>
>> Would you tell me the following things?
>>
>> 1. Do the following things mean the same or different?
>>
>> my $bytes = Encode::encode('UTF-8', $string);
>>
>> utf8::encode($string);
>> my $bytes = $string;
>>
>
> Similar, with some implementation differences: Encode::encode doesn't
> modify $string in place (with those arguments), and utf8::encode does;
> Encode::encode with UTF-8 will encode invalid codepoints (such as
> surrogates, supercharacters) to replacement characters (with those
> arguments) and utf8::encode will naively encode them with Perl's internal
> encoding method like other codepoints (which can result in bytestrings
> which UTF-8 decoders may consider invalid).
>
>
>> 2. Do the following things mean the same or different?
>>
>> my $string = Encode::decode('UTF-8', $bytes);
>>
>> utf8::decode($bytes);
>> my $string = $bytes;
>>
>
> Similar as above, but additionally, if the bytes cannot be interpreted as
> even Perl's lax internal encoding, utf8::decode will return false and leave
> the string unchanged; whereas Encode::decode decodes malformed byte
> sequences to replacement characters (with those arguments). Encode::decode
> will also decode invalid codepoints to replacement characters, but
> utf8::decode will naively accept them.
>
>
>> 3. Do the following things mean the same or different?
>>
>> # Perl
>> utf8::decode
>>
>> # XS
>> sv_utf8_decode
>>
>
> These are the same.
>
>> 4. Do the following things mean the same or different?
>>
>> # Perl
>> utf8::encode
>>
>> # XS
>> sv_utf8_encode
>>
>
> These are the same.
>
> Overall, all of these change the logical contents of the string from bytes
> to the Unicode characters they represent, or from Unicode characters to
> representative bytes.
>
> -Dan
>

Dan

Thank you.
I need some time to understand this.
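
As a rough illustration of the differences Dan describes, here is a minimal sketch that runs both pairs of functions on a valid character, a surrogate, and a malformed byte. The variable names and the hex_bytes helper are made up for this example, and the comments only restate the behaviour described in the quoted reply; exact results may depend on the Perl and Encode versions in use.

```perl
use strict;
use warnings;
no warnings 'utf8';    # the surrogate below is deliberate
use Encode ();

# Hypothetical helper, for display only: show a byte string as hex.
sub hex_bytes { join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

# 1. A valid character: both routes should produce the same bytes.
my $string  = "\x{3042}";                        # one character
my $enc_new = Encode::encode('UTF-8', $string);  # returns bytes, $string kept
my $enc_inplace = $string;
utf8::encode($enc_inplace);                      # converts its argument in place

# 2. A surrogate: Encode substitutes U+FFFD, utf8::encode encodes it naively.
my $surrogate = chr(0xD800);
my $strict    = Encode::encode('UTF-8', $surrogate);
my $lax       = $surrogate;
utf8::encode($lax);

# 3. A malformed byte: Encode::decode substitutes U+FFFD,
#    utf8::decode returns false and leaves its argument unchanged.
my $bad     = "\xFF";
my $decoded = Encode::decode('UTF-8', $bad);
my $copy    = $bad;
my $ok      = utf8::decode($copy);

print "valid, Encode::encode: ", hex_bytes($enc_new),     "\n";
print "valid, utf8::encode:   ", hex_bytes($enc_inplace), "\n";
print "surrogate, Encode:     ", hex_bytes($strict),      "\n";
print "surrogate, utf8:       ", hex_bytes($lax),         "\n";
printf "malformed, Encode::decode: U+%04X\n", ord($decoded);
print "malformed, utf8::decode succeeded: ", ($ok ? "yes" : "no"), "\n";
```

The copies ($enc_inplace, $lax, $copy) are there only because the utf8:: functions work in place; without them the original values would be overwritten before they could be compared.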