Re: RFC: Rename the “UTF8” flag

From: Felipe Gasper
Date: February 1, 2022 15:57
Subject: Re: RFC: Rename the “UTF8” flag
Message ID: E051E34B-DAE8-4CDA-8BD6-3AF0617FFACD@felipegasper.com

> On Feb 1, 2022, at 10:08, Arne Johannessen <aj22@thaw.de> wrote:
> 
> According to the Perl documentation (e. g. perlunifaq), Perl does use UTF-8 internally. If the documentation is wrong, let's fix that first.
> 
> I'm ignoring the difference between Perl's "lax" utf8 and "strict" UTF-8 here, as it's hardly relevant in practice. It is also well explained in the docs.

Regardless of accuracy, if neither Perl application authors nor XS authors need the information, then it seems better removed. Perl’s documentation is voluminous, which makes it imposing, which in turn discourages adoption of the language.

> I agree that the ambiguity of the term "UTF-8 string" is problematic.
> 
> However, if this proposal were to be implemented, I'd expect the term "UTF-8 string" would still be commonly used out of habit, and it would *still* be ambiguous. If people wished to express themselves clearly today, they could do so. But they usually don't. Switching one name for another doesn't address this problem.

Change won’t happen overnight. But at least it *could* happen with the rename. “Heavy” could work a bit like Damian’s encouragement of \A and \z in regexps: the esoteric name compels readers to look up what it means and thus understand things better. Except that, unlike \A and \z, the things I propose be named “heavy” are things we *want* Perl programmers and XS authors to ignore, while the stuff labeled “utf8” will be for public use. Thus, the “right” controls for callers will be the more familiar-looking ones, and the “wrong” ones will have funny, Perl-specific names. Doesn’t that seem like an improvement?

Right now, as I wrote in response to Dave, it’s unreasonably hard to juggle the meanings of all the “utf8”-named controls. (Once you “climb the mountain” it all makes sense, sure, but it’s a long ascent!) Let’s rename utf8::upgrade to Internals::sv2heavy, utf8::is_utf8 to Internals::is_heavy, etc. SvUTF8_on can become SvHEAVY_on, and so forth.
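
To make the rename concrete, here is a minimal sketch of what those controls do today. Internals::is_heavy and Internals::sv2heavy are only the names proposed above; they do not exist in any released perl, so the sketch aliases them to the existing utf8:: functions purely for illustration.

```perl
use strict;
use warnings;

# HYPOTHETICAL names taken from the proposal in this thread. They are not
# part of any released perl; we alias them to the existing core functions
# only so that this sketch runs.
*Internals::is_heavy = \&utf8::is_utf8;
*Internals::sv2heavy = \&utf8::upgrade;

my $s      = "caf\xe9";   # four characters; the last is code point 0xE9
my $before = $s;          # untouched copy for comparison

print Internals::is_heavy($s) ? "heavy\n" : "light\n";   # prints "light"

# Change only the internal storage format; the characters stay the same.
Internals::sv2heavy($s);

print Internals::is_heavy($s) ? "heavy\n" : "light\n";   # prints "heavy"
print $s eq $before ? "same string\n" : "different\n";   # prints "same string"
print length($s), "\n";                                  # prints 4
```

The flag changes, but nothing a Perl-level caller should care about does: the string compares equal to its original and has the same length. That is the sense in which these controls are internal, and why a name that does not say “utf8” may mislead less.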

> XS code that wants to remain backwards-compatible with Perl v5.36 and earlier would have to keep using the old names anyway though, right?

Wouldn’t ppport.h make the newer names work in older perls?

-F
