develooper Front page | perl.perl5.porters | Postings from January 2020

Re: “strict” strings?

From: Felipe Gasper
Date: January 7, 2020 04:24
Subject: Re: “strict” strings?
Message ID: E38C68C8-40ED-4730-8591-261B286DC443@felipegasper.com

> On Jan 6, 2020, at 10:27 PM, Dan Book <grinnz@gmail.com> wrote:
> 
> On Mon, Jan 6, 2020 at 10:15 PM Felipe Gasper <felipe@felipegasper.com> wrote:
> But anyway, there’s no problem left to solve. Now that I see an example of `perlunitut`’s workflow not working as I thought (decode() doesn’t set UTF8 if the string is plain ASCII), I see why this won’t work. As I wrote a bit ago, humble pie.
> 
> I'd like to try to fix any misconceptions presented in such docs, but looking at perlunitut, the only mention of the UTF8 flag is the vague description of the internal format you shouldn't concern yourself with. The document appears entirely correct, though it is mostly talking about usage of text and binary strings (something you have to know in any language), and not the Perl implementation details we're discussing. Did you perhaps mean a different document?

I did mean perlunitut, actually.

I thought all explicit decode()s would set the UTF8 flag. That’s true for Latin-1, but not for UTF-8/utf8/utf-8 when the string is pure ASCII. It didn’t occur to me that there’d potentially be a difference between Latin-1 and UTF-8, so I just didn’t try it. I don’t know why there *is* a difference :), but I’m sure there’s a good reason.
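For anyone who wants to check this on their own install, here is a minimal sketch. It assumes only the core Encode module and utf8::is_utf8(); flag_state() is a helper name made up for illustration. The flag results are printed rather than stated here, since they can depend on the Perl and Encode versions in use.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper: report whether Perl's internal UTF8 flag is set.
sub flag_state {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'on' : 'off';
}

my $ascii_octets = 'hello';    # pure ASCII input

# Decode the same octets with two different encodings.
my $from_latin1 = Encode::decode('ISO-8859-1', $ascii_octets);
my $from_utf8   = Encode::decode('UTF-8',      $ascii_octets);

# The flag describes internal storage only, so it may differ
# between the two results even though the characters are identical.
printf "Latin-1 decode: flag %s\n", flag_state($from_latin1);
printf "UTF-8 decode:   flag %s\n", flag_state($from_utf8);
printf "equal as strings: %s\n", ($from_latin1 eq $from_utf8 ? 'yes' : 'no');
```

Whatever the two flag lines print, the last line should say "yes": both results hold the same five characters.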

Another point of confusion was, and still is, that what Sereal and CBOR intend to do (distinguish between text and binary strings) apparently can’t be done reliably in Perl via introspection, even though two widely used encoders, Sereal::Encoder and CBOR::XS, attempt to do precisely that via the UTF8 flag.
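To illustrate why the flag is an unreliable signal, here is a rough sketch using only the core utf8:: functions. guess_kind() is a hypothetical stand-in for a flag-based heuristic; it is not the actual logic of Sereal::Encoder or CBOR::XS.

```perl
use strict;
use warnings;

# Hypothetical heuristic: flagged strings are "text", the rest "binary".
sub guess_kind {
    my ($str) = @_;
    return utf8::is_utf8($str) ? 'text' : 'binary';
}

# Decoded text that happens to be pure ASCII: utf8::decode() only turns
# the flag on when the input contains multi-byte UTF-8 sequences.
my $text = 'hello';
utf8::decode($text);

# Binary octets whose internal storage has been upgraded, which can
# happen as a side effect elsewhere in a program. The logical string
# is unchanged, but the flag is now set.
my $octets   = "\xFF\xD8";
my $upgraded = $octets;
utf8::upgrade($upgraded);

printf "decoded ASCII text guessed as: %s\n", guess_kind($text);
printf "upgraded octets guessed as:    %s\n", guess_kind($upgraded);
printf "upgraded eq original octets:   %s\n", ($upgraded eq $octets ? 'yes' : 'no');
```

Going by the documented behavior of utf8::decode() and utf8::upgrade(), this should label the decoded text "binary" and the binary octets "text", while the upgraded copy still compares equal to the original octets.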

It would have helped me if I’d seen somewhere (and maybe this is already there, just someplace I missed) that not all decode() operations alter the string’s internal state in the same way, perhaps even with the UTF-8 versus Latin-1 example.

-F