perl.perl5.porters | Postings from February 2022

Re: RFC: Rename the “UTF8” flag

Dan Book
February 1, 2022 20:13
On Tue, Feb 1, 2022 at 2:58 PM Karl Williamson wrote:

> On 2/1/22 11:59, Dan Book wrote:
> > On Tue, Feb 1, 2022 at 1:20 PM Joseph Brenner wrote:
> >
> >     Felipe Gasper wrote:
> >
> >      > Then there’s utf8::is_utf8(), which, for pure-Perl code, usually
> >     means the *opposite* of what it looks like it means. THIS. IS.
> >     MADNESS. No one groks it all without investing *significant* effort.
> >
> >     I can see how a global rename throughout the internals could be a lot
> >     of work. I was going to make the point that in my experience the
> >     place where the confusion hits client programmers is "is_utf8".  It gets
> >     used wrong a lot, to the point where when I see it used I'm not sure
> >     what I should think-- is this a code smell, or is this one of the few
> >     who really gets what it means (and I'm still not sure I do).
> >
> >     Modest proposal:  add an alias for is_utf8 under another name, e.g.
> >     "is_heavy"  (I think I'd prefer "is_modern" but that's not without
> >     issues.)  Then encourage the use of the new form, and possibly
> >     deprecate the old one.
> >
> >
> > This is a bit off topic but specifically on utf8::is_utf8:
> >
> > I would prefer is_upgraded since that is the only consistent terminology
> > that has been used externally other than the misleading utf8 bit's name.
> > I don't quite get the objection to upgraded/downgraded as terms, and
> > heavy doesn't seem distinct enough terminology.
> >
> > Note that upgraded strings are not "the new form" - both forms of
> > strings are still used in modern code. Downgraded string operations are
> > more efficient when usable, and byte strings should always be downgraded
> > (but should function correctly when upgraded as well).
> I hate the word "upgraded" for our uses
>  From
> "upgraded  adjective
> improved by the addition or replacement of components; raised to a
> higher standard."
> Just what is it about a UTF8-encoded string that makes it better than a
> non-UTF8-encoded one?  What is it about an SVt_PV that makes it better
> than an SVt_IV?  The answer is nothing.  Our so-called "upgraded" forms
> are not to a "higher standard" than the non-upgraded ones.
> We use this word to mean something different than its standard usage.
> That lowers efficiency of maintenance.  It still gives me pause
> whenever I see these non-English uses.  We may be stuck with the poor
> choices of wording that were made earlier in the project; but we
> shouldn't add more misery either.

It's upgraded because it can store code points above 255, not just the
256 values that fit in a downgraded string.
This seems fairly straightforward to me.
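
To make that concrete, here is a minimal sketch using the existing
utf8::is_utf8, utf8::upgrade and utf8::downgrade functions. The variable
names are only for illustration, and it assumes a plain literal with no
"use utf8" in effect, so the first string starts out downgraded.

```perl
use strict;
use warnings;

# A literal containing only code points <= 255 starts out downgraded here.
my $text   = "caf\xe9";
my $before = utf8::is_utf8($text) ? 1 : 0;

# Upgrading changes the internal representation, not the characters.
my $copy = $text;
utf8::upgrade($copy);
my $after     = utf8::is_utf8($copy) ? 1 : 0;
my $same_text = ($copy eq $text && length($copy) == 4) ? 1 : 0;

# A code point above 255 only fits in the upgraded form, so downgrading
# fails; the true second argument asks for a false return instead of die.
my $wide          = "\x{263A}";
my $wide_upgraded = utf8::is_utf8($wide) ? 1 : 0;
my $can_downgrade = utf8::downgrade($wide, 1) ? 1 : 0;

printf "before=%d after=%d same_text=%d wide_upgraded=%d can_downgrade=%d\n",
    $before, $after, $same_text, $wide_upgraded, $can_downgrade;
# prints: before=0 after=1 same_text=1 wide_upgraded=1 can_downgrade=0
```

The upgraded copy compares equal to the original and has the same length,
which is why is_utf8 says nothing about what the string contains, only how
it is stored.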

I don't see a better alternative when taking into account that upgrade and
downgrade are already commonly named operations for manipulating this

