Re: "strict" strings?
From: Dan Book
Date: January 7, 2020 04:31
Subject: Re: "strict" strings?
Message ID: CABMkAVWtPnfy9i+bC_XgOvf2aUYrH9DUHkBDu_KWSVKqhF9sVA@mail.gmail.com
On Mon, Jan 6, 2020 at 11:24 PM Felipe Gasper <felipe@felipegasper.com>
wrote:
>
> > On Jan 6, 2020, at 10:27 PM, Dan Book <grinnz@gmail.com> wrote:
> >
> > On Mon, Jan 6, 2020 at 10:15 PM Felipe Gasper <felipe@felipegasper.com>
> wrote:
> > But anyway, there's no problem left to solve. Now that I see an example
> of `perlunitut`'s workflow not working as I thought (decode() doesn't set
> UTF8 if the string is plain ASCII), I see why this won't work. As I wrote a
> bit ago, humble pie.
> >
> > I'd like to try to fix any misconceptions presented in such docs, but
> looking at perlunitut, the only mention of the UTF8 flag is the vague
> description of the internal format you shouldn't concern yourself with. The
> document appears entirely correct, though it is mostly talking about usage
> of text and binary strings (something you have to know in any language),
> and not the Perl implementation details we're discussing. Did you perhaps
> mean a different document?
>
> I did mean perlunitut, actually.
>
> I thought all explicit decode()s would set the UTF8 flag. That's true for
> Latin-1, but not for UTF-8/utf8/utf-8 when the string is pure ASCII. It
> didn't occur to me that there'd potentially be a difference between Latin-1
> and UTF-8, so I just didn't try it. I don't know why there *is* a
> difference :), but I'm sure there's a good reason.
>
> The apparent fact that what Sereal and CBOR intend to do--distinguish
> between text and binary strings--can't reliably be done in Perl via
> introspection, despite the presence of two widely-used encoders
> (Sereal::Encoder and CBOR::XS) that attempt, via the UTF8 flag, to do
> precisely that was/is also a point of confusion.
>
> It would have helped me if I'd seen somewhere--and maybe this is already
> there but just in someplace that I missed--that not all decode() operations
> alter the string's internal state in the same way, perhaps even with the
> UTF-8 versus Latin-1 example.
>
I don't see what this has to do with perlunitut as it doesn't reference the
string's internal state or the UTF8 flag at all, beyond the vague paragraph
I mentioned. This sort of information belongs in perlunicode or
perluniintro, which each have sections discussing these implementation
details.
-Dan