On Tue, Feb 1, 2022 at 2:58 PM Karl Williamson <public@khwilliamson.com> wrote:

> On 2/1/22 11:59, Dan Book wrote:
> > On Tue, Feb 1, 2022 at 1:20 PM Joseph Brenner <doomvox@gmail.com> wrote:
> > >
> > > Felipe Gasper <felipe@felipegasper.com> wrote:
> > >
> > > > Then there's utf8::is_utf8(), which, for pure-Perl code, usually
> > > > means the *opposite* of what it looks like it means. THIS. IS.
> > > > MADNESS. No one groks it all without investing *significant* effort.
> > >
> > > I can see how a global rename throughout the internals could be a lot
> > > of work, I was going to make the point that in my experience the place
> > > where the confusion hits client programmers is "is_utf8". It gets
> > > used wrong a lot, to the point where when I see it used I'm not sure
> > > what I should think-- is this a code smell, or is this one of the few
> > > who really gets what it means (and I'm still not sure I do).
> > >
> > > Modest proposal: add an alias to is_utf8 to something else, e.g.
> > > "is_heavy" (I think I'd prefer "is_modern" but that's not without
> > > issues.) Then encourage the use of the new form, and possibly
> > > deprecate the old one.
> >
> > This is a bit off topic but specifically on utf8::is_utf8:
> >
> > I would prefer is_upgraded since that is the only consistent terminology
> > that has been used externally other than the misleading utf8 bit's name.
> > I don't quite get the objection to upgraded/downgraded as terms, and
> > heavy doesn't seem distinct enough terminology.
> >
> > Note that upgraded strings are not "the new form" - both forms of
> > strings are still used in modern code. Downgraded string operations are
> > more efficient when usable, and byte strings should always be downgraded
> > (but should function correctly when upgraded as well).
>
> I hate the word "upgraded" for our uses
>
> From https://www.google.com/search?q=define+upgraded
>
> "upgraded adjective
>
> improved by the addition or replacement of components; raised to a
> higher standard."
>
> Just what is it about a UTF8-encoded string that makes it better than a
> non- one? What is it about an SVt_PV that makes it better than an SVt_IV?
>
> The answer is nothing. Our so-called "upgraded" forms are not to a
> "higher standard" than the non-upgraded ones.
>
> We use this word to mean something different than its standard usage.
> That lowers efficiency of maintenance. It still causes me pause
> whenever I see these non-English uses. We may be stuck with the poor
> choices of wording that were made earlier in the project; but we
> shouldn't add more misery either.

It's upgraded because it is capable of storing more than 256 codepoints.
This seems fairly straightforward to me. I don't see a better alternative
when taking into account that upgrade and downgrade are already commonly
named operations for manipulating this storage.

-Dan