develooper Front page | perl.perl5.porters | Postings from May 2021

Docs: perluniintro

From: Felipe Gasper
Date: May 31, 2021 12:56
Subject: Docs: perluniintro
Message ID: CAD85683-F9B1-4883-9864-CDCA0AB4C8DD@felipegasper.com

Hi all,

	I read over this doc the other day and would like to propose updating it in three significant ways:

1) MISINFORMATION: It misinforms (?!?) readers about output, saying that Perl outputs a string’s raw internal bytes. This is *not* the case in any configuration I’ve found; an upgraded string is output as if it were downgraded, as long as all of its code points are in the 0-255 range.

2) HISTORY: It’s got a lot of historical stuff re Perl 5.6 and 5.8. I’d like to remove all of this in favour of a parenthetical that just refers folks interested in such old perls to older versions of the document.

3) NOMENCLATURE: I’d like to standardize on the terms “downgraded” and “upgraded” more generally as less confusion-prone ways to indicate the PV’s internal UTF8-ness.
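
Point 1 can be checked with a short script. This is only a sketch, assuming a filehandle with no encoding layer; `printed_bytes` is a hypothetical helper, and an in-memory filehandle stands in for a real file or STDOUT so that the written bytes are easy to capture.

```perl
use strict;
use warnings;

# Hypothetical helper: return the bytes that print() writes to a
# filehandle with no encoding layer (in-memory here, for easy capture).
sub printed_bytes {
    my ($string) = @_;
    my $buffer = '';
    open my $fh, '>', \$buffer or die "open failed: $!";
    binmode $fh;    # make sure no :utf8 or :encoding layer is active
    print {$fh} $string;
    close $fh or die "close failed: $!";
    return $buffer;
}

my $downgraded = "\xe9";     # one character, stored as one byte
my $upgraded   = "\xe9";
utf8::upgrade($upgraded);    # same character, stored internally as two bytes

my $out_down = printed_bytes($downgraded);
my $out_up   = printed_bytes($upgraded);

printf "upgraded flag set: %s\n", (utf8::is_utf8($upgraded) ? "yes" : "no");
printf "downgraded string wrote %d byte(s)\n", length($out_down);
printf "upgraded string wrote %d byte(s)\n",   length($out_up);
printf "outputs identical: %s\n", ($out_down eq $out_up ? "yes" : "no");
```

Both strings should write the single byte 0xE9, even though the upgraded one holds two bytes internally. A string containing a code point above 255 is a different case and is not covered here.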

That last one is a proposal for Perl’s Unicode documentation at large. I think a great amount of what used to confuse me about all of this arose from seeing stuff like this termed a “UTF-8 string”:

-----
SV = PV(0x7f8793804e60) at 0x7f8793816378
  REFCNT = 1
  FLAGS = (POK,IsCOW,READONLY,PROTECT,pPOK,UTF8)
  PV = 0x7f8793410dc0 "\303\251"\0 [UTF8 "\x{e9}"]
  CUR = 2
  LEN = 10
  COW_REFCNT = 0
-----

IMO we should call the above “upgraded” rather than “UTF8” in as many contexts as possible, to differentiate it from these:

-----
SV = PV(0x7fd071004e50) at 0x7fd07180ccd0
  REFCNT = 1
  FLAGS = (POK,pPOK)
  PV = 0x7fd070e003c0 "\303\251"\0
  CUR = 2
  LEN = 10

SV = PV(0x7fe631004c70) at 0x7fe6310162d0
  REFCNT = 1
  FLAGS = (POK,pPOK,UTF8)
  PV = 0x7fe630c04950 "\303\203\302\251"\0 [UTF8 "\x{c3}\x{a9}"]
  CUR = 4
  LEN = 10
-----
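
For anyone who wants to reproduce the distinction, here is a rough sketch that builds the three kinds of string shown above using only the built-in `utf8::` functions. `describe` is a hypothetical helper, and the internal byte count is inferred by re-encoding a copy rather than read from the PV itself. Running Devel::Peek's `Dump` on the same variables is the way to see the real structures; flags such as IsCOW and READONLY will vary with how the string was created.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise a string's logical length and its
# internal storage. For an upgraded string the internal buffer is the
# UTF-8 encoding of its characters, so encoding a copy gives the size.
sub describe {
    my ($string) = @_;
    my $is_upgraded = utf8::is_utf8($string) ? 1 : 0;
    my $internal = $string;
    utf8::encode($internal) if $is_upgraded;
    return {
        upgraded       => $is_upgraded,
        chars          => length($string),
        internal_bytes => length($internal),
    };
}

# Like the first dump: one character (U+00E9), upgraded.
my $one_char_upgraded = "\xe9";
utf8::upgrade($one_char_upgraded);

# Like the second dump: two characters (0xC3, 0xA9), downgraded.
my $two_chars_downgraded = "\xc3\xa9";

# Like the third dump: the same two characters, upgraded.
my $two_chars_upgraded = $two_chars_downgraded;
utf8::upgrade($two_chars_upgraded);

for my $pair (
    [ 'one char, upgraded'    => $one_char_upgraded ],
    [ 'two chars, downgraded' => $two_chars_downgraded ],
    [ 'two chars, upgraded'   => $two_chars_upgraded ],
) {
    my ($label, $string) = @$pair;
    my $info = describe($string);
    printf "%-22s upgraded=%d chars=%d internal_bytes=%d\n",
        $label, @{$info}{qw(upgraded chars internal_bytes)};
}

# Upgrading changes storage only: the two-character strings still compare equal.
my $same_content = ($two_chars_downgraded eq $two_chars_upgraded) ? 1 : 0;
print "second and third strings equal: ", ($same_content ? "yes" : "no"), "\n";
```

The first and second strings hold the same two internal bytes but differ in content, while the second and third hold different internal bytes but are the same string as far as Perl code is concerned.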

Thoughts?

-FG