Re: RFC: Rename the "UTF8" flag

Leon Timmermans
February 4, 2022 16:28
On Fri, Feb 4, 2022 at 4:42 PM Felipe Gasper wrote:

> > On Feb 4, 2022, at 09:31, Ricardo Signes wrote:
> >
> > On Thu, Feb 3, 2022, at 10:05 PM, Felipe Gasper wrote:
> >> Tony wrote:
> >> > The UTF8 flag does what it says on the box - indicates the PV is
> >> > encoded using (something like) UTF-8.
> >>
> >> Oof. If a pure-Perl user read that description, do you see how that
> >> person would reasonably reach for utf8::is_utf8()?
> >
> > Just to call out this one point:  I think that there's a distinction to
> > be drawn between the SvUTF8 flag and utf8::is_utf8.  I get that it's nice
> > to have "is-X" match the "X" flag, but here, I think it's a bit of a
> > complicated pain in the butt.
> >
> > That said, I also think that it's utf8::is_utf8 that leads to the mass
> > of confusion.  Providing an "is this string stored in internal format A or
> > B" builtin to use _instead_ is a better idea.  I would support something
> > more like:
> >       • provide builtin::internal_string_format that returns 'blue' or
> >         'green'
> >       • discourage using utf8::is_utf8, explaining "it's not what you
> >         think it is"
> >       • leave the SV flags how they are
>
> Would Internals:: suit it better, since the idea is that Perl applications
> shouldn’t normally use this?
>
> In that same vein, utf8::upgrade() and utf8::downgrade() could gainfully
> be renamed to, e.g., Internals::utf8_upgrade() and
> Internals::utf8_downgrade().
>
> I think utf8::is_utf8() is a symptom. The root problem is the
> double-whammy that: a) pure-Perl applications *can* need to know Perl’s
> internals, and b) the same term is used for that internal encoding as for
> application-level encoding.
>
> From the pure-Perl side, Perl’s internals are only relevant nowadays
> because exec et al. pass the raw PV to the OS. Compare these:
>
> > perl -Mutf8 -e'print "é"' | xxd
> 00000000: e9
> > perl -Mutf8 -e'exec echo => "-n", "é"' | xxd
> 00000000: c3a9
>
> If we fixed *that*, via some new feature-bundle-included pragma, then Perl
> internals would no longer be relevant for pure-Perl devs. I have a PoC CPAN
> module, Sys::Binmode, that does this by utf8-downgrading all strings prior
> to giving them to the OS. What if Perl had something like
> `use syscall::encoding 'bytes'`?
>
> Yes, there’d still be old posts that talk about `use bytes` and what not,
> but we’d at least be able to say “modern Perl fixes all that; update, and
> be happy.” Right now we can’t, which makes explaining all of this stuff
> much trickier than IMO it should be. That complicates Perl advocacy in
> general, since one of Perl’s “claims to fame” is being a premier
> text-processing tool.

Internals:: is for the things that are deliberately not documented, and
comes with no guarantees of backwards compatibility. It is not appropriate
for any of the things we're discussing here.
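
For anyone following along, here is a minimal sketch of the distinction
being argued about, using only the documented utf8:: functions. It shows
that the flag reported by utf8::is_utf8 describes internal storage, not the
string's value, and that downgrading before a system call makes the bytes
handed to the OS independent of that flag. The helper name
downgraded_copies is made up for illustration; it is not how Sys::Binmode
is actually implemented.

```perl
use strict;
use warnings;
use bytes ();    # load bytes::length without enabling the pragma

# The same one-character string, U+00E9, in two variables.
my $down = "\xe9";
my $up   = "\xe9";

utf8::downgrade($down);    # stored internally as one byte:  E9
utf8::upgrade($up);        # stored internally as two bytes: C3 A9

# At the Perl level the two strings are indistinguishable.
print "equal\n" if $down eq $up;
printf "length: %d and %d\n", length($down), length($up);

# Only the internal representation differs.
printf "is_utf8: %d and %d\n",
    utf8::is_utf8($down) ? 1 : 0, utf8::is_utf8($up) ? 1 : 0;
printf "internal bytes: %d and %d\n",
    bytes::length($down), bytes::length($up);

# Hypothetical helper: return downgraded copies of the arguments, so that
# whatever is passed on to exec/system no longer depends on the flag.
# utf8::downgrade dies if a string contains a character above 0xFF.
sub downgraded_copies {
    my @copy = @_;
    utf8::downgrade($_) for @copy;
    return @copy;
}

my ($arg) = downgraded_copies($up);
printf "after downgrade: is_utf8=%d, internal bytes=%d\n",
    utf8::is_utf8($arg) ? 1 : 0, bytes::length($arg);
```

The two strings compare equal and both have length 1; only is_utf8 and the
internal byte count tell them apart, which is exactly the information that
leaks out when the raw PV goes to the OS.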


