Re: "use v5.36.0" should imply UTF-8 encoded source
From:
Tom Molesworth via perl5-porters
Date:
August 2, 2021 17:23
Subject:
Re: "use v5.36.0" should imply UTF-8 encoded source
Message ID:
CAGXhHdmrMNtwkTFpz-Ba+VYztMXBqkGHtLWigAKHp5CYWrafJg@mail.gmail.com
On Tue, 3 Aug 2021 at 01:17, Felipe Gasper <felipe@felipegasper.com> wrote:
>
>
> > On Aug 2, 2021, at 11:53 AM, Dan Book <grinnz@gmail.com> wrote:
> >
> > On Mon, Aug 2, 2021 at 11:31 AM Felipe Gasper <felipe@felipegasper.com>
> wrote:
> >
> >
> > > On Aug 2, 2021, at 11:17 AM, Veesh Goldman <rabbiveesh@gmail.com>
> wrote:
> > >
> > >
> > >
> > >
> > > My point is still that this:
> > >
> > > -----
> > > use v5.36;
> > > print 'Hello, world!';
> > > -----
> > >
> > > … should not be “subtly wrong”.
> > >
> > > -F
> > >
> > > Since 5.36 is meant to turn on warnings, this will be explicitly
> wrong, not subtly.
> > >
> > > Perhaps the "wide character" warning is too unclear, but we can always
> improve the text to include a doc link as such.
> >
> > There’s no “wide character” warning when there happen to be no wide
> characters.
> >
> > >
> > > What compels me more is the following example.
> > > Let's say I'm looking for customers in my database named josé. Easy,
> I'll use DBIC:
> > >
> > > $customer_rs->search({ name => 'josé' })
> > >
> > > But when I run it, I get nothing. That's because the various DBDs will
> handle encoding and decoding for you, bc perl is meant to deal with text in
> userland.
> >
> > Which DBDs?
> >
> > - DBD::SQLite is bytes by default, but it has the SvPV bug (i.e., it
> sends the internal PV to SQLite).
> >
> > - DBD::mysql is also bytes w/ SvPV bug by default.
> >
> > (I haven’t tried DBD::Pg.)
> >
> > DBD::mysql has the unicode bug due to long standing issues. DBD::MariaDB
> was forked for this reason.
> >
> > DBD::MariaDB, DBD::SQLite, and DBD::Pg are used with the unicode option
> in any modern programs. Thus they expect decoded strings.
>
> None of these modules’ documentation says “all new code should enable
> this”, so if indeed “any modern programs” should be set up that way, it
> seems a rather cargo-cult-ish thing.
>
> I would say, respectfully, that you yourself are “making a lot of
> assumptions about other peoples' code”, etc. etc.
>
> >
> > > Had utf8 been turned on, then I would've started with text, not bytes,
> and found my customers instead of mojibake (though on the other hand, the
> non utf8 is a great way to find double encoded text).
> > >
> > > I think this is a more realistic example than printing a string
> literal, where the behavior is surprising and conceptually inconsistent.
> >
> > Why would you query on a string constant? More likely you’ll be
> accepting $name via some input, in which case you have to decode it. But if
> you tried it with a constant you may be confused at why you *didn’t* have
> to decode it there.
> >
> > You are making a lot of assumptions about other peoples' code and
> thought processes based on your own experience, which is not the way many
> people approach these problems. And that is why we are considering this; to
> make the defaults match more people's assumptions.
>
> Making defaults match assumptions is a great thing. I just think newcomers
> to the language would make assumptions about what `print 'Hello, world!'`
> does before they reason about DBI etc. Most of those newcomers will hail
> from JS or Python, where this stuff “just works”.
>
> It basically seems like all the right people are on board with the notion
> that “Hello, world” in “modern” Perl will look thus:
>
> -----
> use v5.36;
> use Encode;
> print Encode::encode_utf8('Hello, world!');
> -----
>
> … and any ensuing explanation will have to discuss character encoding, and
> the fact that Perl can’t tell text from bytes. Right away this simple
> example draws attention to one of Perl’s more frustration-prone qualities.
>
> Respectfully, I just can’t see how this improves the language, and I’m
> surprised more folks aren’t voicing similar thoughts. I’d love to be wrong;
> I guess we’ll see.
>
In core, the official answer is "use utf8" with "binmode":
https://perldoc.perl.org/perluniintro
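
As a rough illustration of that combination (a minimal sketch, not taken from the thread), the example below assumes the source file is saved as UTF-8 and that whatever reads STDOUT expects UTF-8. The `$name` variable is purely illustrative, and `use strict; use warnings;` stands in for `use v5.36` so the sketch also runs on older perls.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are decoded from UTF-8 into text

# Encode text as UTF-8 on output, instead of calling
# Encode::encode_utf8 by hand at every print.
binmode(STDOUT, ':encoding(UTF-8)');

# Illustrative value: with "use utf8" in effect this literal is
# 4 characters; without it, it would be the 5 raw bytes of the file.
my $name = 'josé';

print "Hello, world!\n";
printf "%s has %d characters\n", $name, length $name;
```

With this setup the string literal is already text, so it can be handed to code that expects decoded strings, and the encoding step happens once at the output layer.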
Outside core Perl, one of the commonly shared guides is