Re: RFC: Rename the “UTF8” flag

Leon Timmermans
February 1, 2022 14:58
On Sat, Jan 29, 2022 at 2:14 PM Felipe Gasper wrote:

> > And I don't see that benefit. As far as I can tell, the argument is
> > really more about ideological purity than any practical advantages.
>
> The practical advantages are clarity and correctness. Around $work I’m
> “the guy who understands encodings”, and it takes a *long* time to explain
> this stuff. Having terminology that properly differentiates Perl-internal
> encoding from Perl-caller-visible “UTF-8” will manifoldly simplify those
> discussions.

I really don't think things will be more clear if things suddenly have two
names. I don't quite see the correctness advantages either. I don't think
this new terminology properly differentiates things either.

> It will thus be easier for everyone--API maintainers as well as
> callers--to distinguish the external-facing stuff (“utf-?8”) from the
> internal-facing (“heavy”). Part of this could include, actually, postfixing
> descriptions of SvHEAVY with a disclaimer about the abstraction leak that
> it entails.

I'm genuinely puzzled why you would think "heavy" would be a good name for
internal-facing stuff; I had assumed you meant things the other way around.
Internal-facing stuff is exactly what should care about internal encoding.

Also, the entire problem with the perlapi is that there is very little
distinction between external-facing and internal-facing interfaces; the
external ones are also used internally, and often the other way around too.
I don't think this distinction that you think you see exists in reality to
the extent that you are assuming it exists (unfortunately).

> Would you mind explaining further where/when it is necessary to probe the
> string-storage abstraction?

Literally every string-builtin in core. I'm guessing Yves' resistance to
this is rooted in his extensive experience in the regex engine for example.

> In my own experience, FWIW, it is entirely possible when using Perl’s C
> API to avoid assumptions about how an SV stores its code points internally.
> Yes, you can call SvUTF8* macros and such, but the safer approach of doing
> actual encode/decode operations and SvPVbyte/SvPVutf8 preserves the
> abstraction.

In XS code, generally yes. In core, absolutely not.
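
For readers following along, here is a minimal Perl-level sketch of the distinction being argued over. It assumes only core functions (utf8::upgrade, utf8::downgrade, utf8::is_utf8) and the Encode module; the variable names are made up for illustration. It is not how core or any XS module does things. It only shows that the flag describes internal storage, while the string's value and its encoded UTF-8 bytes stay the same.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical example data: four characters, all below 0x100.
my $bytes_stored = "caf\x{e9}";
utf8::downgrade($bytes_stored);   # make sure the internal flag is off

my $wide_stored = $bytes_stored;
utf8::upgrade($wide_stored);      # changes internal storage only, not the value

# Probing the storage abstraction: the two scalars differ here...
my $flag_a = utf8::is_utf8($bytes_stored) ? 1 : 0;
my $flag_b = utf8::is_utf8($wide_stored)  ? 1 : 0;

# ...but not in anything a caller should normally care about.
my $same_value  = ($bytes_stored eq $wide_stored) ? 1 : 0;
my $same_length = (length($bytes_stored) == length($wide_stored)) ? 1 : 0;

# Explicit encoding yields the same external UTF-8 bytes either way.
my $same_bytes =
    (Encode::encode_utf8($bytes_stored) eq Encode::encode_utf8($wide_stored)) ? 1 : 0;

print "flags: $flag_a $flag_b\n";
print "same value: $same_value, same length: $same_length, same bytes: $same_bytes\n";
```

Code that sticks to comparisons, length, and explicit encode/decode never needs to look at the flag; code that has to walk the stored buffer directly, as the string builtins in core do, has to know which storage it is looking at.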

