Re: “strict” strings?

From: Dan Book
Date: January 7, 2020 04:35
Subject: Re: “strict” strings?
Message ID: CABMkAVXtG8E3AGczMgwsy7fUO2HOYex_tzqKC9p+Ljzan9XaHQ@mail.gmail.com

On Mon, Jan 6, 2020 at 11:30 PM Dan Book <grinnz@gmail.com> wrote:

> On Mon, Jan 6, 2020 at 11:24 PM Felipe Gasper <felipe@felipegasper.com>
> wrote:
>
>>
>> > On Jan 6, 2020, at 10:27 PM, Dan Book <grinnz@gmail.com> wrote:
>> >
>> > On Mon, Jan 6, 2020 at 10:15 PM Felipe Gasper <felipe@felipegasper.com> wrote:
>> > But anyway, there’s no problem left to solve. Now that I see an example
>> > of `perlunitut`’s workflow not working as I thought (decode() doesn’t set
>> > UTF8 if the string is plain ASCII), I see why this won’t work. As I wrote a
>> > bit ago, humble pie.
>> >
>> > I'd like to try to fix any misconceptions presented in such docs, but
>> > looking at perlunitut, the only mention of the UTF8 flag is the vague
>> > description of the internal format you shouldn't concern yourself with. The
>> > document appears entirely correct, though it is mostly talking about usage
>> > of text and binary strings (something you have to know in any language),
>> > and not the Perl implementation details we're discussing. Did you perhaps
>> > mean a different document?
>>
>> I did mean perlunitut, actually.
>>
>> I thought all explicit decode()s would set the UTF8 flag. That’s true for
>> Latin-1, but not for UTF-8/utf8/utf-8 when the string is pure ASCII. It
>> didn’t occur to me that there’d potentially be a difference between Latin-1
>> and UTF-8, so I just didn’t try it. I don’t know why there *is* a
>> difference :), but I’m sure there’s a good reason.
>>
>> The apparent fact that what Sereal and CBOR intend to do--distinguish
>> between text and binary strings--can’t reliably be done in Perl via
>> introspection, despite the presence of two widely-used encoders
>> (Sereal::Encoder and CBOR::XS) that attempt, via the UTF8 flag, to do
>> precisely that was/is also a point of confusion.
>>
>> It would have helped me if I’d seen somewhere--and maybe this is already
>> there but just in someplace that I missed--that not all decode() operations
>> alter the string’s internal state in the same way, perhaps even with the
>> UTF-8 versus Latin-1 example.
>>
>
> I don't see what this has to do with perlunitut as it doesn't reference
> the string's internal state or the UTF8 flag at all, beyond the vague
> paragraph I mentioned. This sort of information belongs in perlunicode or
> perluniintro, which each have sections discussing these implementation
> details.
>
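
To make the point about decode() and the internal state concrete, here is a
minimal sketch rather than anything authoritative. It assumes only the core
Encode module and the built-in utf8:: functions; flag_state() is a
hypothetical helper name. It decodes the same ASCII octets two ways, reports
the UTF8 flag instead of assuming it (the result may depend on the Encode
version), and then flips the flag by hand to show the string value does not
change.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: report whether Perl currently stores the string in
# its internal UTF-8 form. This is a storage detail, not a text/binary marker.
sub flag_state {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'on' : 'off';
}

my $octets = 'hello';    # pure ASCII input

my $from_utf8   = decode('UTF-8',      $octets);
my $from_latin1 = decode('ISO-8859-1', $octets);

# The flag is reported, not assumed; the two results may differ here.
printf "UTF-8 decode:   flag %s\n", flag_state($from_utf8);
printf "Latin-1 decode: flag %s\n", flag_state($from_latin1);

# Whatever the flags say, both results hold the same characters.
print "decoded strings are equal\n" if $from_utf8 eq $from_latin1;

# The flag can also be changed explicitly without changing the value.
my $upgraded = $from_utf8;
utf8::upgrade($upgraded);        # force the internal UTF-8 representation
my $downgraded = $from_utf8;
utf8::downgrade($downgraded);    # force the byte representation (possible for ASCII)

printf "upgraded:   flag %s\n", flag_state($upgraded);
printf "downgraded: flag %s\n", flag_state($downgraded);
print "still equal after upgrade/downgrade\n" if $upgraded eq $downgraded;
```

The comparison with eq is the part that matters to ordinary code; the flag
lines are only there to show that the internal state is free to vary.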

For reference, here are the places currently discussing these details, in
ascending order of nitty-gritty:

https://perldoc.pl/perluniintro#Perl's-Unicode-Model
https://perldoc.pl/perlunicode#Byte-and-Character-Semantics
https://perldoc.pl/perlguts#Unicode-Support
https://perldoc.pl/perlapi#Unicode-Support
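
As an illustration of why introspecting the flag cannot separate text from
binary strings, here is a small sketch. guess_kind() is a hypothetical
function written in the spirit of a flag-based encoder; it is not the actual
code of Sereal::Encoder or CBOR::XS. The explicit tagging at the end is
likewise just one assumed way for the caller to carry the intent.

```perl
use strict;
use warnings;

# Hypothetical classifier: decide "text" or "binary" purely from the UTF8
# flag, the way a flag-based serializer might.
sub guess_kind {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'text' : 'binary';
}

my $plain    = 'id';
my $upgraded = 'id';
utf8::upgrade($upgraded);    # same characters, different internal storage

# Perl considers the two strings equal...
print "strings are equal\n" if $plain eq $upgraded;

# ...yet the flag-based guess disagrees about what they are.
printf "plain:    %s\n", guess_kind($plain);       # binary
printf "upgraded: %s\n", guess_kind($upgraded);    # text

# One assumed alternative: the caller states the intent explicitly instead
# of relying on introspection.
my %field = (kind => 'text', value => $plain);
printf "tagged:   %s\n", $field{kind};
```

Since ordinary string operations are allowed to upgrade or downgrade a
scalar, the guess can change for a value the program never knowingly touched.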

-Dan
