develooper Front page | perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

From:
Dan Book
Date:
August 2, 2021 15:54
Subject:
Re: "use v5.36.0" should imply UTF-8 encoded source
Message ID:
CABMkAVWwRzT8F=TUqtpG=9Tf5JTvm0M2OF8DkOYcPAjZNr6AZg@mail.gmail.com
On Mon, Aug 2, 2021 at 11:31 AM Felipe Gasper <felipe@felipegasper.com>
wrote:

>
>
> > On Aug 2, 2021, at 11:17 AM, Veesh Goldman <rabbiveesh@gmail.com> wrote:
> >
> >
> >
> >
> > My point is still that this:
> >
> > -----
> > use v5.36;
> > print 'Hello, world!';
> > -----
> >
> > … should not be “subtly wrong”.
> >
> > -F
> >
> > Since 5.36 is meant to turn on warnings, this will be explicitly wrong,
> > not subtly.
> >
> > Perhaps the "wide character" warning is too unclear, but we can always
> > improve the text to include a doc link as such.
>
> There’s no “wide character” warning when there happen to be no wide
> characters.
>
> >
> > What compels me more is the following example.
> > Let's say I'm looking for customers in my database named josé. Easy,
> > I'll use DBIC:
> >
> > $customer_rs->search({ name => 'josé' })
> >
> > But when I run it, I get nothing. That's because the various DBDs will
> > handle encoding and decoding for you, bc perl is meant to deal with text in
> > userland.
>
> Which DBDs?
>
> - DBD::SQLite is bytes by default, but it has the SvPV bug (i.e., it sends
> the internal PV to SQLite).
>
> - DBD::mysql is also bytes w/ SvPV bug by default.
>
> (I haven’t tried DBD::Pg.)
>

DBD::mysql has the Unicode bug because of long-standing issues. DBD::MariaDB
was forked for this reason.

DBD::MariaDB, DBD::SQLite, and DBD::Pg are used with the unicode option
enabled in any modern program, so they expect decoded strings.
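
To make "decoded strings" concrete, here is a minimal sketch of how the same
literal differs with and without "use utf8". It assumes this file is saved as
UTF-8; the variable names are only for illustration, and no database driver
is involved.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: this source file is saved as UTF-8.

# Without "use utf8", the literal is the raw UTF-8 bytes from the file.
my $bytes = do { no utf8; 'josé' };

# With "use utf8", the same literal is decoded into characters.
my $chars = do { use utf8; 'josé' };

printf "without utf8: length %d\n", length $bytes;   # 5 (é is two bytes)
printf "with utf8:    length %d\n", length $chars;   # 4 (é is one character)

# Encoding the character string yields exactly the byte string.
print "encoded chars match bytes\n" if encode('UTF-8', $chars) eq $bytes;

# The two strings are not equal as they stand, so code that expects
# decoded text will not see 'josé' in the undecoded literal.
print "bytes and chars differ\n" if $bytes ne $chars;
```

A driver that expects decoded strings would treat $bytes as five separate
characters and encode them again, which is the double encoding mentioned
elsewhere in this thread.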


> > Had utf8 been turned on, then I would've started with text, not bytes,
> > and found my customers instead of mojibake (though on the other hand, the
> > non utf8 is a great way to find double encoded text).
> >
> > I think this is a more realistic example than printing a string literal,
> > where the behavior is surprising and conceptually inconsistent.
>
> Why would you query on a string constant? More likely you’ll be accepting
> $name via some input, in which case you have to decode it. But if you tried
> it with a constant you may be confused at why you *didn’t* have to decode
> it there.


You are making a lot of assumptions about other people's code and thought
processes based on your own experience, which is not how many people
approach these problems. That is why we are considering this: to make the
defaults match more people's assumptions.

-Dan
