
Re: RFC: Rename the “UTF8” flag

From:
Karl Williamson
Date:
February 1, 2022 19:58
Subject:
Re: RFC: Rename the “UTF8” flag
Message ID:
3066b5d5-978d-fdee-f9bf-59bd26763623@khwilliamson.com
On 2/1/22 11:59, Dan Book wrote:
> On Tue, Feb 1, 2022 at 1:20 PM Joseph Brenner <doomvox@gmail.com> wrote:
> 
>     Felipe Gasper <felipe@felipegasper.com> wrote:
> 
>      > Then there’s utf8::is_utf8(), which, for pure-Perl code, usually
>     means the *opposite* of what it looks like it means. THIS. IS.
>     MADNESS. No one groks it all without investing *significant* effort.
> 
>     I can see how a global rename throughout the internals could be a lot
>     of work, I was going to make the point that in my experience the place
>     where the confusion hits client programmers is "is_utf8".  It gets
>     used wrong a lot, to the point where when I see it used I'm not sure
>     what I should think-- is this a code smell, or is this one of the few
>     who really gets what it means (and I'm still not sure I do).
> 
>     Modest proposal:  add an alias to is_utf8 to something else, e.g.
>     "is_heavy"  (I think I'd prefer "is_modern" but that's not without
>     issues.)  Then encourage the use of the new form, and possibly
>     deprecate the old one.
> 
> 
> This is a bit off topic but specifically on utf8::is_utf8:
> 
> I would prefer is_upgraded since that is the only consistent terminology 
> that has been used externally other than the misleading utf8 bit's name. 
> I don't quite get the objection to upgraded/downgraded as terms, and 
> heavy doesn't seem distinct enough terminology.
> 
> Note that upgraded strings are not "the new form" - both forms of 
> strings are still used in modern code. Downgraded string operations are 
> more efficient when usable, and byte strings should always be downgraded 
> (but should function correctly when upgraded as well).

I hate the word "upgraded" for our uses.

From https://www.google.com/search?q=define+upgraded:

"upgraded  adjective

improved by the addition or replacement of components; raised to a 
higher standard."

Just what is it about a UTF8-encoded string that makes it better than a 
non-UTF8-encoded one?  What is it about an SVt_PV that makes it better 
than an SVt_IV?

The answer is nothing.  Our so-called "upgraded" forms are not of a 
"higher standard" than the non-upgraded ones.
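
A minimal sketch of that point, assuming a Latin-1 string literal and 
only the built-in utf8:: functions (the variable names are illustrative, 
not from any core code).  The two scalars differ in internal storage 
only; at the Perl level they are the same string:

```perl
use strict;
use warnings;

# A string whose code points are all below 256; a plain literal like this
# starts out in the non-UTF8 ("downgraded") internal form.
my $s = "caf\xE9";

# Copy it and switch the copy to the UTF8 ("upgraded") internal form.
# utf8::upgrade changes the storage, not the characters.
my $t = $s;
utf8::upgrade($t);

my $same_value = ($s eq $t) ? 1 : 0;          # compared as characters
my $flag_s     = utf8::is_utf8($s) ? 1 : 0;   # internal flag of the original
my $flag_t     = utf8::is_utf8($t) ? 1 : 0;   # internal flag of the copy
my ($len_s, $len_t) = (length $s, length $t); # lengths in characters

print "same=$same_value flag_s=$flag_s flag_t=$flag_t len=$len_s/$len_t\n";
```

Only the flag reported by utf8::is_utf8 differs between the two; 
equality and length are unaffected, so neither form is "better" in any 
sense visible to the program.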

We use this word to mean something different from its standard usage. 
That lowers efficiency of maintenance.  It still gives me pause 
whenever I see these non-English uses.  We may be stuck with the poor 
choices of wording that were made earlier in the project; but we 
shouldn't add more misery either.

"Heavy" in my experience has been used to mean something that is 
complicated and/or slow that we strive to avoid when possible.  So 
utf8_heavy.pl, before it was removed, was for the heavy lifting of going 
out to disk to gather the necessary data, and we played games to defer 
it until absolutely necessary.

I also don't buy the argument that adding synonyms doubles the cognitive 
load.  A new good word that the core converts to drives out use of the 
old worse one.  Newcomers may never come across the old word, and have 
the advantage of something that isn't misleading.  The catch is the new 
word must be clearly better.




