develooper Front page | perl.perl5.porters | Postings from February 2022

Re: RFC: Rename the “UTF8” flag

From: Dan Book
Date: February 1, 2022 20:13
Subject: Re: RFC: Rename the “UTF8” flag
Message ID: CABMkAVU+AHO29210pH686L6OaNHKUNOW-vT9Sc4h8M2AOUvS8A@mail.gmail.com

On Tue, Feb 1, 2022 at 2:58 PM Karl Williamson <public@khwilliamson.com>
wrote:

> On 2/1/22 11:59, Dan Book wrote:
> > On Tue, Feb 1, 2022 at 1:20 PM Joseph Brenner <doomvox@gmail.com> wrote:
> >
> >     Felipe Gasper <felipe@felipegasper.com> wrote:
> >
> >      > Then there’s utf8::is_utf8(), which, for pure-Perl code, usually
> >     means the *opposite* of what it looks like it means. THIS. IS.
> >     MADNESS. No one groks it all without investing *significant* effort.
> >
> >     I can see how a global rename throughout the internals could be a lot
> >     of work, I was going to make the point that in my experience the place
> >     where the confusion hits client programmers is "is_utf8".  It gets
> >     used wrong a lot, to the point where when I see it used I'm not sure
> >     what I should think-- is this a code smell, or is this one of the few
> >     who really gets what it means (and I'm still not sure I do).
> >
> >     Modest proposal:  add an alias to is_utf8 to something else, e.g.
> >     "is_heavy"  (I think I'd prefer "is_modern" but that's not without
> >     issues.)  Then encourage the use of the new form, and possibly
> >     deprecate the old one.
> >
> >
> > This is a bit off topic but specifically on utf8::is_utf8:
> >
> > I would prefer is_upgraded since that is the only consistent terminology
> > that has been used externally other than the misleading utf8 bit's name.
> > I don't quite get the objection to upgraded/downgraded as terms, and
> > heavy doesn't seem distinct enough terminology.
> >
> > Note that upgraded strings are not "the new form" - both forms of
> > strings are still used in modern code. Downgraded string operations are
> > more efficient when usable, and byte strings should always be downgraded
> > (but should function correctly when upgraded as well).
>
> I hate the word "upgraded" for our uses
>
>  From https://www.google.com/search?q=define+upgraded
>
> "upgraded  adjective
>
> improved by the addition or replacement of components; raised to a
> higher standard."
>
> Just what is it about a UTF8-encoded string that makes it better than a
> non-UTF8-encoded one?  What is it about an SVt_PV that makes it better
> than an SVt_IV?
>
> The answer is nothing.  Our so-called "upgraded" forms are not to a
> "higher standard" than the non-upgraded ones.
>
> We use this word to mean something different than its standard usage.
> That lowers efficiency of maintenance.  It still gives me pause
> whenever I see these non-English uses.  We may be stuck with the poor
> choices of wording that were made earlier in the project; but we
> shouldn't add more misery either.
>

It's upgraded because it is capable of storing code points above 255, not
just the 256 values that fit in one byte per character. This seems fairly
straightforward to me.

I don't see a better alternative when taking into account that upgrade and
downgrade are already commonly named operations for manipulating this
storage.
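
To illustrate the upgrade and downgrade operations, here is a minimal sketch using the core utf8:: functions. The sample strings and variable names are made up for this example. It assumes a stock perl, with no "use utf8" and no source filters in effect.

```perl
use strict;
use warnings;

# Hypothetical sample data: four characters, all below 256, so perl can
# hold this string in either internal storage format.
my $str  = "caf\xe9";
my $copy = $str;

# Switch the copy to the upgraded storage format.
utf8::upgrade($copy);
my $flag_after_upgrade = utf8::is_utf8($copy) ? 1 : 0;
my $same_after_upgrade = ($copy eq $str)      ? 1 : 0;
my $len_after_upgrade  = length $copy;

# Switch it back to the downgraded format.
utf8::downgrade($copy);
my $flag_after_downgrade = utf8::is_utf8($copy) ? 1 : 0;
my $same_after_downgrade = ($copy eq $str)      ? 1 : 0;

# A code point above 255 only fits in the upgraded format, so the
# downgrade cannot succeed. The true second argument asks
# utf8::downgrade to return false instead of dying.
my $wide          = "\x{263A}";
my $wide_flag     = utf8::is_utf8($wide)      ? 1 : 0;
my $can_downgrade = utf8::downgrade($wide, 1) ? 1 : 0;

print "upgraded:   flag=$flag_after_upgrade same=$same_after_upgrade length=$len_after_upgrade\n";
print "downgraded: flag=$flag_after_downgrade same=$same_after_downgrade\n";
print "wide:       flag=$wide_flag downgradable=$can_downgrade\n";
```

The string compares equal and keeps the same length in both formats. Only the flag reported by utf8::is_utf8 changes, so that function describes the storage, not the content.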

-Dan
