Front page | perl.perl5.porters |
Postings from May 2021
Re: Docs: perluniintro
From: Dan Book
Date: May 31, 2021 15:07
Subject: Re: Docs: perluniintro
Message ID: CABMkAVXOwDAYBNpbJK6oiTUeEgZN_0Lmx=zoQfuYALqpYy-z3g@mail.gmail.com
On Mon, May 31, 2021 at 8:56 AM Felipe Gasper <felipe@felipegasper.com>
wrote:
> Hi all,
>
> I read over this doc the other day and would like to propose
> updating it in three significant ways:
>
> 1) MISINFORMATION: It is wrong about output: it says that Perl outputs the
> internal raw bytes. This is *not* the case in any configuration I’ve found;
> an upgraded string is output in its downgraded form as long as every code
> point in it is in the range 0-255.
>
> 2) HISTORY: It’s got a lot of historical stuff re Perl 5.6 and 5.8. I’d
> like to remove all of this in favour of a parenthetical that refers anyone
> interested in such old perls to older versions of the document.
>
> 3) NOMENCLATURE: I’d like to standardize on the terms “downgraded” and
> “upgraded” more generally as less confusion-prone ways to indicate the PV’s
> internal UTF8-ness.
>
> That last one is a proposal for Perl’s Unicode documentation at large. I
> think a great amount of what used to confuse me about all of this arose
> from seeing stuff like this termed a “UTF-8 string”:
>
> -----
> SV = PV(0x7f8793804e60) at 0x7f8793816378
>   REFCNT = 1
>   FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK,UTF8)
>   PV = 0x7f8793410dc0 "\303\251"\0 [UTF8 "\x{e9}"]
>   CUR = 2
>   LEN = 10
>   COW_REFCNT = 0
> -----
>
> IMO we should call the above “upgraded” rather than “UTF8” in as many
> contexts as possible, to differentiate it from these:
>
> -----
> SV = PV(0x7fd071004e50) at 0x7fd07180ccd0
>   REFCNT = 1
>   FLAGS = (POK,pPOK)
>   PV = 0x7fd070e003c0 "\303\251"\0
>   CUR = 2
>   LEN = 10
>
> SV = PV(0x7fe631004c70) at 0x7fe6310162d0
>   REFCNT = 1
>   FLAGS = (POK,pPOK,UTF8)
>   PV = 0x7fe630c04950 "\303\203\302\251"\0 [UTF8 "\x{c3}\x{a9}"]
>   CUR = 4
>   LEN = 10
> -----
>
> Thoughts?
+1
-Dan
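
For anyone who wants to reproduce the three kinds of scalar dumped in the quoted message, the following is a minimal sketch, not taken from the thread. It assumes a reasonably modern perl with the core Devel::Peek module; the variable names are made up for illustration. The dumps it prints should look broadly like the quoted ones, though addresses and some flags (IsCOW, READONLY and so on) will differ. The last part checks the claim in point 1 under one assumed configuration only: an in-memory filehandle opened with the :raw layer.

```perl
use strict;
use warnings;
use Devel::Peek qw(Dump);

# One character, U+00E9, stored downgraded: one byte, UTF8 flag off.
my $downgraded = "\xe9";

# The same one-character string, stored upgraded: two bytes internally,
# UTF8 flag on. This is the case the proposal wants to call "upgraded".
my $upgraded = "\xe9";
utf8::upgrade($upgraded);

# Two characters (0xC3, 0xA9): the UTF-8 *encoding* of U+00E9,
# stored downgraded, UTF8 flag off.
my $encoded = "\xe9";
utf8::encode($encoded);

# Those same two characters, stored upgraded: four bytes internally.
my $encoded_upgraded = $encoded;
utf8::upgrade($encoded_upgraded);

# Dump writes to STDERR.
Dump($_) for $upgraded, $encoded, $encoded_upgraded;

# The internal representation does not change the string's value.
print "equal\n" if $downgraded eq $upgraded;
print length($upgraded), " ", length($encoded), "\n";   # prints "1 2"

# Printing the upgraded string to a handle with no encoding layer
# (here an in-memory file opened with :raw, an assumed setup).
open my $fh, '>:raw', \my $buf or die "open failed: $!";
print {$fh} $upgraded;
close $fh;
printf "bytes written: %d\n", length $buf;
```

utf8::upgrade and utf8::encode are easy to confuse: the first changes only the internal storage, while the second changes the string's contents, which is why the second and third dumps hold a different string from the first.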