perl.perl5.porters | Postings from August 2021

Re: Pre-RFC: Rename SVf_UTF8 et al.

From: Felipe Gasper
Date: August 18, 2021 20:25
Subject: Re: Pre-RFC: Rename SVf_UTF8 et al.
Message ID: A414DDF1-3FCA-4F73-89BD-3C2793D313F6@felipegasper.com


> On Aug 18, 2021, at 4:19 PM, Dan Book <grinnz@gmail.com> wrote:
> 
> On Wed, Aug 18, 2021 at 4:13 PM Karl Williamson <public@khwilliamson.com> wrote:
> On 8/18/21 2:08 PM, Dan Book wrote:
> > On Wed, Aug 18, 2021 at 3:50 PM Tomasz Konojacki <me@xenu.pl> wrote:
> > 
> >     On Wed, 18 Aug 2021 13:18:34 -0400
> >     Felipe Gasper <felipe@felipegasper.com> wrote:
> > 
> >      > Per recent IRC discussion …
> >      >
> >      > PROBLEM: The naming of Perl’s “UTF-8 flag” is a continual source
> >     of confusion regarding the flag’s significance. Some think it
> >     indicates whether a given PV stores text versus binary. Some think
> >     it means that the PV is valid UTF-8. Still others likely hold other
> >     inaccurate views.
> >      >
> >      > The problem here is the naming. For example, consider `perl -e'my
> >     $foo = "é"'`. In this code $foo is a “UTF-8 string” by virtue of the
> >     fact that its code points (assuming use of a UTF-8 terminal)
> >     correspond to the bytes that encode “é” in UTF-8. The “UTF-8 flag”,
> >     however, is likely *not* set on this string. By contrast, consider
> >     `perl -Mutf8 -e'my $foo = "é"'`. Here $foo has the “UTF-8 flag” set,
> >     but $foo is NOT a “UTF-8 string” because its code points (in this
> >     case, only 1) aren’t valid UTF-8.
> >      >
> >      > The fact that quite often a “UTF-8 string” lacks the “UTF-8
> >     flag”, and a “UTF-8-flagged” string is (usually) *not* a “UTF-8
> >     string”, makes little sense except to the “highly initiated”.
> >      >
> >      > Another problem is “UTF-8” doesn’t really describe the “upgraded”
> >     format. This format is what Perl historically called “lax UTF-8” and
> >     is now widely called “generalized UTF-8”, which includes unpaired
> >     surrogates and code points above Unicode’s maximum.
> >      >
> >      > PROPOSAL: Rename the following identifiers in code and
> >     documentation, leaving macros for the old ones as aliases:
> >      > - SVf_UTF8        -> SVf_PVUPGRADED
> >      > - SvUTF8          -> Sv_PVUPGRADED
> >      > - SvUTF8_on       -> Sv_PVUPGRADED_on
> >      > - SvUTF8_off      -> Sv_PVUPGRADED_off
> >      > - SvPOK_only_UTF8 -> SvPOK_only_UPGRADED
> >      >
> >      > Note that flags like REFCOUNTED_HE_KEY_UTF8 do not need a rename
> >     because these indicate an actual (if incomplete/invalidated) UTF-8
> >     decoding step.
> >      >
> >      > BENEFITS: Over time, this rename will minimize the confusion
> >     between Perl’s upgraded-PV storage format versus UTF-8. The rename
> >     may also compel current users of the language who hold mistaken
> >     mental models of the flag’s purpose to reexamine their
> >     understanding, hopefully for the better.
> >      >
> >      > POTENTIAL COMPLICATIONS: The mismatch between amended
> >     documentation and existing documentation may cause confusion; it
> >     should, though, be an auspicious confusion that eventually clarifies
> >     rather than misleads.
> > 
> >     utf8::is_utf8 probably should be renamed too. Anyway, +1 from me.
> > 
> > Frankly it (and upgrade/downgrade) shouldn't even be in the utf8:: 
> > namespace, it's named that for internal reasons not interface reasons.
> > 
> > -Dan
> 
> Upgrade and downgrade tell me nothing.  I don't object to renaming, but 
> something better than these needs to be found.
> 
> It is related to the two possible string formats. Do you know of any other name for them than UTF8/non-UTF8 (which is a misleading name to expose to the logical string layer, which may separately be UTF-8 encoded or not) or upgraded/downgraded?

RJBS called it “the wide flag” in a presentation some years back. SVf_WIDEPV may clash with the “wide character” warning, though.

SVf_BIGPV?
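
For anyone who wants to see the mismatch from the quoted proposal first-hand, here is a minimal sketch. It assumes only the always-available utf8:: functions (is_utf8, upgrade, downgrade, decode). The helper names flagged() and decodes_as_utf8() are made up for this illustration and are not part of any proposal. The strings are built from escapes so the result does not depend on terminal or source encoding.

```perl
use strict;
use warnings;

# Hypothetical helper: true if the string's code points, read as bytes,
# form valid UTF-8. Works on a copy so the caller's string is untouched.
sub decodes_as_utf8 {
    my ($copy) = @_;
    return 0 unless utf8::downgrade($copy, 1);   # a code point above 0xFF cannot be a byte
    return utf8::decode($copy) ? 1 : 0;
}

# Hypothetical helper: reports Perl's internal flag as 0 or 1.
sub flagged { return utf8::is_utf8($_[0]) ? 1 : 0 }

# What `perl -e'my $foo = "é"'` sees on a UTF-8 terminal: two code points.
my $bytes = "\xC3\xA9";

# What `perl -Mutf8 -e'my $foo = "é"'` sees: one code point, U+00E9.
# Decoding a copy of $bytes reproduces it without relying on source encoding.
my $char = $bytes;
utf8::decode($char);

# Upgrading changes only the internal storage, not the logical string.
my $upgraded = $bytes;
utf8::upgrade($upgraded);

for my $case ( [ bytes => $bytes ], [ char => $char ], [ upgraded => $upgraded ] ) {
    my ($name, $s) = @$case;
    printf "%-8s flagged=%d length=%d decodes_as_utf8=%d\n",
        $name, flagged($s), length($s), decodes_as_utf8($s);
}
```

The unflagged two-code-point string is the one that decodes as UTF-8, while the flagged one-code-point string does not. The upgraded copy is flagged, still compares equal to the original, and still decodes as UTF-8, which is why the flag says nothing about the string's contents.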

-F




nntp.perl.org: Perl Programming lists via nntp and http.