perl.perl5.porters | Postings from February 2022

Re: RFC: Rename the “UTF8” flag

From: Leon Timmermans
Date: February 4, 2022 16:28
Subject: Re: RFC: Rename the “UTF8” flag
Message ID: CAHhgV8jNB1fqKUXnvkySZdGFbrL2UOjw893g-4DQprSiR85uKw@mail.gmail.com
On Fri, Feb 4, 2022 at 4:42 PM Felipe Gasper <felipe@felipegasper.com> wrote:

>
>
> > On Feb 4, 2022, at 09:31, Ricardo Signes <perl.p5p@rjbs.manxome.org> wrote:
> >
> > On Thu, Feb 3, 2022, at 10:05 PM, Felipe Gasper wrote:
> >> Tony wrote:
> >> > The UTF8 flag does what it says on the box - indicates the PV is
> >> > encoded using (something like) UTF-8.
> >>
> >> Oof. If a pure-Perl user read that description, do you see how that
> >> person would reasonably reach for utf8::is_utf8()?
> >
> > Just to call out this one point: I think that there's a distinction to
> > be drawn between the SvUTF8 flag and utf8::is_utf8. I get that it's nice
> > to have "is-X" match the "X" flag, but here, I think it's a bit of a
> > complicated pain in the butt.
> >
> > That said, I also think that it's utf8::is_utf8 that leads to the mass
> > of confusion. Providing an "is this string stored in internal format A or
> > B" builtin to use _instead_ is a better idea. I would support something
> > more like:
> >       • provide builtin::internal_string_format that returns 'blue' or
> >         'green'
> >       • discourage using utf8::is_utf8, explaining "it's not what you
> >         think it is"
> >       • leave the SV flags how they are
>
> Would Internals:: suit it better, since the idea is that Perl applications
> shouldn’t normally use this?
>
> In that same vein, utf8::upgrade() and utf8::downgrade() could gainfully
> be renamed to, e.g., Internals::utf8_upgrade() and
> Internals::utf8_downgrade().
>
> I think utf8::is_utf8() is a symptom. The root problem is the
> double-whammy that: a) pure-Perl applications *can* need to know Perl’s
> internals, and b) the same term is used for that internal encoding as for
> application-level encoding.
>
> From the pure-Perl side, Perl’s internals are only relevant nowadays
> because exec et al. pass the raw PV to the OS. Compare these:
>
> $ perl -Mutf8 -e'print "é"' | xxd
> 00000000: e9
>
> $ perl -Mutf8 -e'exec echo => "-n", "é"' | xxd
> 00000000: c3a9
>
> If we fixed *that*, via some new feature-bundle-included pragma, then Perl
> internals would no longer be relevant for pure-Perl devs. I have a PoC CPAN
> module, Sys::Binmode, that does this by utf8-downgrading all strings prior
> to giving them to the OS. What if Perl had something like `use
> syscall::encoding 'bytes'`?
>
> Yes, there’d still be old posts that talk about `use bytes` and whatnot,
> but we’d at least be able to say “modern Perl fixes all that; update, and
> be happy.” Right now we can’t, which makes explaining all of this stuff
> much trickier than IMO it should be. That complicates Perl advocacy in
> general, since one of Perl’s “claims to fame” is being a premier
> text-processing tool.
>

Internals:: is for the things that are deliberately not documented, and
comes with no guarantees of backwards compatibility. It is not appropriate
for any of the things we're discussing here.

Leon
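The print/exec difference in the quoted xxd output can be reproduced without spawning a process. The following is a minimal sketch that uses only core Perl (utf8::upgrade, utf8::downgrade, utf8::is_utf8 and the bytes pragma). It shows two strings that compare equal but carry different internal buffers, which is what exec hands to the OS. The helpers pv_bytes_hex and os_bytes are hypothetical names written for illustration; os_bytes only imitates the idea behind Sys::Binmode (downgrade before the OS sees the string) and is not that module's actual code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Two copies of the same one-character string, "é" (U+00E9),
# forced into different internal representations.
my $upgraded   = "\xe9";
my $downgraded = "\xe9";
utf8::upgrade($upgraded);      # PV now holds UTF-8 (c3 a9), UTF8 flag on
utf8::downgrade($downgraded);  # PV holds a single byte (e9), UTF8 flag off

# Hypothetical helper: hex dump of the raw PV, i.e. the bytes that
# exec()/system() pass to the OS unchanged.
sub pv_bytes_hex {
    my ($s) = @_;
    use bytes;    # lexically scoped: length/substr/ord now see bytes, not characters
    return join ' ', map { sprintf '%02x', ord substr($s, $_, 1) } 0 .. length($s) - 1;
}

# Hypothetical helper in the spirit of Sys::Binmode (not its real code):
# downgrade copies of the arguments so the OS gets one byte per character.
sub os_bytes {
    my @args = @_;                   # copy, so the caller's strings are untouched
    utf8::downgrade($_) for @args;   # dies if a character is above 0xFF
    return @args;
}

printf "same string:      %s\n", ($upgraded eq $downgraded ? 'yes' : 'no');
printf "is_utf8 flags:    %d vs %d\n",
    utf8::is_utf8($upgraded) ? 1 : 0, utf8::is_utf8($downgraded) ? 1 : 0;
printf "PV bytes:         %s vs %s\n", pv_bytes_hex($upgraded), pv_bytes_hex($downgraded);
printf "after os_bytes(): %s\n", pv_bytes_hex(os_bytes($upgraded));
```

Under these assumptions the script reports that the strings are equal, that only $upgraded has the flag set, that its PV is c3 a9 versus e9, and that os_bytes reduces it to e9. Wrapping arguments this way, e.g. `exec 'echo', '-n', os_bytes($str)`, would make the OS see the same bytes regardless of how $str happened to be stored; the proposed `use syscall::encoding 'bytes'` is, as described, meant to do this implicitly.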



nntp.perl.org: Perl Programming lists via nntp and http.