perl.perl5.porters | Postings from August 2021

Re: "use v5.36.0" should imply UTF-8 encoded source

August 2, 2021 20:29
Dan Book writes:

> DBD::MariaDB, DBD::SQLite, and DBD::Pg are used with the unicode
> option in any modern programs. Thus they expect decoded strings.

As far as DBD::SQLite is concerned, this is only half-true.  In the
current version 1.70 there have been changes in how to declare Unicode
handling, but even with DBD_SQLITE_STRING_MODE_UNICODE_STRICT you can
feed it UTF-8 encoded byte sequences and it "just works" (but maybe

You see the downside of this when you have a non-ASCII literal in an
iso-latin-1 encoded Perl source (e.g. "ä" or "\x{e4}").  For Perl, it is
the same character as "\N{LATIN SMALL LETTER A WITH DIAERESIS}", but if
you feed both to the database you get different results.
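
A minimal sketch of the "same character for Perl" half of that statement.
It assumes Perl 5.16 or later, where \N{...} names load automatically, and
it uses the escaped spellings so that it does not depend on the encoding
of the source file.  The variable names are only for illustration.

```perl
use strict;
use warnings;

# Two spellings of the same character, U+00E4.
my $from_hex  = "\x{e4}";
my $from_name = "\N{LATIN SMALL LETTER A WITH DIAERESIS}";

# At the Perl level they compare equal and are one character long.
my $same   = $from_hex eq $from_name;
my $length = length $from_hex;
my $ord    = ord $from_name;

printf "equal: %s, length: %d, code point: U+%04X\n",
    ($same ? 'yes' : 'no'), $length, $ord;
```

Nothing in this comparison tells you how the two strings are stored
internally, which is where a driver can end up treating them differently.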

Veesh could change his source (if in a latin-1 encoded file) from
    $customer_rs->search({ name => 'josé' })
to
    $customer_rs->search({ name => decode('iso-8859-1','josé') })
to make it work.

It seems that the driver still inspects the infamous UTF-8-flag to
decide whether a literal is encoded or not.
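
To illustrate what "inspecting the flag" can see, here is a small sketch
using only core Perl and Encode.  It shows the Perl side only and says
nothing about what any particular driver does with the result.  The
literal is written as "jos\x{e9}" to mimic what 'josé' would be in a
latin-1 encoded source without "use utf8".

```perl
use strict;
use warnings;
use Encode qw(decode);

# What 'josé' amounts to in a latin-1 source file without "use utf8".
my $literal = "jos\x{e9}";

# The same text, run through decode() as in the suggested change.
my $octets  = "jos\x{e9}";
my $decoded = decode('iso-8859-1', $octets);

# For Perl these are the same four-character string ...
my $equal = $literal eq $decoded;

# ... but only the decoded one carries the internal UTF-8 flag.
my $flag_literal = utf8::is_utf8($literal) ? 1 : 0;
my $flag_decoded = utf8::is_utf8($decoded) ? 1 : 0;

printf "equal: %d, flag on literal: %d, flag on decoded: %d\n",
    ($equal ? 1 : 0), $flag_literal, $flag_decoded;
```

Code that branches on utf8::is_utf8() can therefore give different
results for two strings that Perl itself considers equal.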

This issue goes away when source files are encoded (and assumed to be
encoded) in UTF-8.  But "working around driver quirks" is in my opinion
no good motivation for the change.

>  Why would you query on a string constant? More likely you’ll be
>  accepting $name via some input, in which case you have to decode
>  it. But if you tried it with a constant you may be confused at why
>  you *didn’t* have to decode it there.

I've seen that problem when feeding data from iso-latin-1 encoded input.
"You have to decode it" nails it, and you do have to decode it in this
example, too.
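
A short sketch of the decoding step for input, assuming the bytes arrive
in iso-latin-1.  The sample value and variable names are made up for the
example, and the input is written with an escape so the snippet does not
depend on the encoding of the file it is pasted into.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might be read from an iso-latin-1 encoded input.
my $input_bytes = "jos\xE9";

# Decode once at the boundary; from here on $name is a character string.
my $name = decode('iso-8859-1', $input_bytes);

# Encoding that character string as UTF-8 gives a two-byte e-acute.
my $utf8_bytes = encode('UTF-8', $name);

printf "characters: %d, UTF-8 bytes: %d\n",
    length($name), length($utf8_bytes);
```

The undecoded input bytes are not the same as the UTF-8 bytes, which is
why skipping the decode step leads to mismatches in queries.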
