perl.perl5.porters | Postings from August 2021

Re: Pre-RFC: Rename SVf_UTF8 et al.

Felipe Gasper
August 30, 2021 14:22

> On Aug 30, 2021, at 8:18 AM, Dave Mitchell <> wrote:
> Date: Mon, 30 Aug 2021 13:17:04 +0100
> From: Dave Mitchell <>
> To: Felipe Gasper <>
> Subject: Re: Pre-RFC: Rename SVf_UTF8 et al.
> Message-ID: <YSzMQJIeURS/>
> On Wed, Aug 18, 2021 at 01:18:34PM -0400, Felipe Gasper wrote:
>> PROBLEM: The naming of Perl’s “UTF-8 flag” is a continual source of confusion regarding the flag’s significance.
> The SVf_UTF8 flag has a clear and unambiguous meaning (apart from some
> historical bugs): in what manner the codepoints of a string are stored as
> a sequence of bytes in memory.
> If people are confused by this, renaming it is only going to add to the
> cognitive load and confusion.

I’ve proposed some fixes for perlre.pod. These fix documentation bugs that crept in specifically because “UTF-8” was used to refer to “upgraded” strings. The terminology confuses even Perl’s own maintainers.

The fact that “UTF-8 string” can mean two quite different things (a string of UTF-8-encoded bytes, or a string whose internal storage happens to be upgraded) causes lots of encoding bugs in the wild. The fact that Perl *can’t* help to fix these worsens the problem.
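To make the two meanings concrete, here is a minimal sketch. The variable names are illustrative only; the calls are the core utf8:: functions and Encode's encode(). It shows that upgrading a string changes the flag but not the string, while encoding it produces a different string and leaves the flag off.

```perl
use strict;
use warnings;
use Encode qw(encode);

# A one-character string: U+00E9 (e with acute accent).
my $text = "\x{e9}";

# Meaning 1: "UTF-8 string" as UTF-8-encoded bytes.
# This is a different string at the Perl level: two characters, 0xC3 0xA9.
my $bytes = encode('UTF-8', $text);

# Meaning 2: "UTF-8 string" as a string whose internal storage is upgraded.
# At the Perl level this is still the same one-character string.
my $upgraded = $text;
utf8::upgrade($upgraded);

printf "length of text:           %d\n", length $text;
printf "length of encoded bytes:  %d\n", length $bytes;
printf "text flagged:             %d\n", utf8::is_utf8($text)     ? 1 : 0;
printf "upgraded copy flagged:    %d\n", utf8::is_utf8($upgraded) ? 1 : 0;
printf "encoded bytes flagged:    %d\n", utf8::is_utf8($bytes)    ? 1 : 0;
printf "text eq upgraded copy:    %d\n", $text eq $upgraded       ? 1 : 0;
```

The flag is set on the string that is *not* UTF-8-encoded from the program's point of view, and clear on the one that is. That mismatch is the source of the confusion.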

Ricardo sensed a problem here back in 2016:

… when he referred to the flag as WIDE, in part because the encoding in question is *not*, in fact, UTF-8. Then he said: “Some joker went ahead, and they called that the UTF-8 flag.” Chuckles ensued.

Benefits of changing the internal terminology:

- It clarifies “external”, Perl-visible encoding versus internal codepoint storage. Different terms for different things.
- More abstract terminology for the internals discourages folks from peeking behind the abstraction.
- It’s more correct. Proper UTF-8 forbids quite a lot that Perl’s “lax UTF-8” (by design) allows.
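As a rough illustration of the last point, the sketch below uses Encode's two encoders: "utf8" (Perl's lax variant) and "UTF-8" (the strict one). The code point 0x110000 is chosen here only as an example of something beyond the Unicode range; the variable names are hypothetical.

```perl
use strict;
use warnings;
no warnings 'utf8';   # silence warnings about the non-Unicode code point
use Encode ();

# One past the last valid Unicode code point (U+10FFFF).
# Perl can hold this in a string; proper UTF-8 cannot represent it.
my $beyond = chr(0x110000);

my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Lax encoder: emits the bytes of Perl's extended encoding.
my $lax_bytes = Encode::encode('utf8', $beyond, $check);

# Strict encoder: croaks on the same input, so trap the failure.
my $strict_bytes  = eval { Encode::encode('UTF-8', $beyond, $check) };
my $strict_failed = defined $strict_bytes ? 0 : 1;

printf "lax bytes (hex):        %s\n", unpack('H*', $lax_bytes);
printf "strict encoder refused: %d\n", $strict_failed;
```

The bytes the lax encoder produces look like UTF-8 but are not valid UTF-8, which is one reason a name other than “UTF-8” fits the internal format better.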

Thanks for reading.
