Front page | perl.perl5.porters |
Postings from May 2021
Docs: perluniintro
From:
Felipe Gasper
Date:
May 31, 2021 12:56
Subject:
Docs: perluniintro
Message ID:
CAD85683-F9B1-4883-9864-CDCA0AB4C8DD@felipegasper.com
Hi all,
I read over this doc the other day and would like to propose updating it in three significant ways:
1) MISINFORMATION: It gets output wrong (?!?), saying that Perl outputs a string’s raw internal bytes. This is *not* the case in any configuration I’ve found; an upgraded string is output as if it were downgraded, as long as every code point in it is in the range 0-255.
2) HISTORY: It’s got a lot of historical material about Perl 5.6 and 5.8. I’d like to remove all of this in favour of a parenthetical that just refers to older versions of the document for folks interested in such old perls.
3) NOMENCLATURE: I’d like to standardize on the terms “downgraded” and “upgraded” more generally as less confusion-prone ways to indicate the PV’s internal UTF8-ness.
That last one is a proposal for Perl’s Unicode documentation at large. I think a great deal of what used to confuse me about all of this arose from seeing something like this termed a “UTF-8 string”:
-----
SV = PV(0x7f8793804e60) at 0x7f8793816378
REFCNT = 1
FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK,UTF8)
PV = 0x7f8793410dc0 "\303\251"\0 [UTF8 "\x{e9}"]
CUR = 2
LEN = 10
COW_REFCNT = 0
-----
IMO we should call the above “upgraded” rather than “UTF8” in as many contexts as possible, to differentiate it from these:
-----
SV = PV(0x7fd071004e50) at 0x7fd07180ccd0
REFCNT = 1
FLAGS = (POK,pPOK)
PV = 0x7fd070e003c0 "\303\251"\0
CUR = 2
LEN = 10

SV = PV(0x7fe631004c70) at 0x7fe6310162d0
REFCNT = 1
FLAGS = (POK,pPOK,UTF8)
PV = 0x7fe630c04950 "\303\203\302\251"\0 [UTF8 "\x{c3}\x{a9}"]
CUR = 4
LEN = 10
-----
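For anyone who wants to poke at this, here is a minimal sketch of the three cases above together with the output behaviour from point 1. The variable names and the `output_bytes` helper are made up for illustration, and the strings are built with `utf8::upgrade`/`utf8::downgrade` rather than however the dumps above were produced, so flags such as IsCOW and READONLY will differ. It assumes a filehandle with no encoding layer.

```perl
use strict;
use warnings;

# Case 1: one character, U+00E9, stored upgraded (PV is "\303\251").
my $upgraded = "\xe9";
utf8::upgrade($upgraded);

# Case 2: two characters, 0xC3 0xA9, stored downgraded (PV is "\303\251").
my $downgraded = "\xc3\xa9";
utf8::downgrade($downgraded);    # no-op here; just makes the state explicit

# Case 3: the same two characters, stored upgraded
# (PV is "\303\203\302\251").
my $upgraded2 = "\xc3\xa9";
utf8::upgrade($upgraded2);

# Hypothetical helper: print a string to an in-memory handle that has
# no encoding layer, and return the bytes that were written.
sub output_bytes {
    my ($str) = @_;
    my $buf = '';
    open my $fh, '>', \$buf or die "open: $!";
    print {$fh} $str;
    close $fh or die "close: $!";
    return $buf;
}

for my $case (
    [ 'upgraded e9',      $upgraded ],
    [ 'downgraded c3 a9', $downgraded ],
    [ 'upgraded c3 a9',   $upgraded2 ],
) {
    my ($label, $str) = @$case;
    my $hex = join ' ', map { sprintf '%02x', ord } split //, output_bytes($str);
    printf "%-18s length=%d upgraded=%d output=%s\n",
        $label, length($str), utf8::is_utf8($str) ? 1 : 0, $hex;
}
```

Cases 1 and 2 share the same internal bytes but produce different output (`e9` versus `c3 a9`), while cases 2 and 3 differ internally but produce the same output. The upgraded/downgraded state never shows through on output, which is why I’d rather not call case 1 a “UTF-8 string”.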
Thoughts?
-FG