Re: "use v5.36.0" should imply UTF-8 encoded source

Dan Book
August 2, 2021 21:32
On Mon, Aug 2, 2021 at 4:28 PM Harald Jörg wrote:

> Dan Book writes:
> > DBD::MariaDB, DBD::SQLite, and DBD::Pg are used with the unicode
> > option in any modern programs. Thus they expect decoded strings.
> As far as DBD::SQLite is concerned, this is only half-true.  In the
> current version 1.70 there have been changes to how unicode handling
> is declared, but even with DBD_SQLITE_STRING_MODE_UNICODE_STRICT you
> can feed it UTF-8 encoded byte sequences and it "just works" (but
> maybe shouldn't).
> You see the downside of this when you have a non-ASCII literal in an
> iso-latin-1 encoded Perl source (e.g. "ä" or "\x{e4}").  For Perl, it is
> the same character as "\N{LATIN SMALL LETTER A WITH DIAERESIS}", but if
> you feed both to the database you get different results.

I don't think this is correct. Mojo::SQLite has many tests to ensure that,
in unicode mode, it treats strings consistently.

> Veesh could change his source (if in a latin-1 encoded file)
>     $customer_rs->search({ name => 'josé' })
> to
>     $customer_rs->search({ name => decode('iso-8859-1','josé') })
> to make it work.

This code makes no difference: decoding from iso-8859-1 is a no-op on Perl
strings (aside from treating "bytes" outside the single-byte encoding
range as errors or replacement characters).
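
To illustrate the no-op, here is a minimal sketch using the core Encode module. The variable names and the sample string are only illustrative, and it assumes the literal holds the single code point E9 for the "é", as it would when read from a latin-1 encoded source file without "use utf8".

```perl
use strict;
use warnings;
use Encode qw(decode);

# "jos\xe9" stands in for 'josé' as read from a latin-1 encoded source file
my $literal = "jos\xe9";
my $decoded = decode('iso-8859-1', $literal);

printf "%vX\n", $literal;
# prints: 6A.6F.73.E9
printf "%vX\n", $decoded;
# prints: 6A.6F.73.E9

# Same characters before and after decoding, so the comparison succeeds
print $literal eq $decoded ? "identical\n" : "different\n";
# prints: identical
```

Only the internal representation may differ after the decode call; the characters Perl sees are the same.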

> It seems that the driver still inspects the infamous UTF-8-flag to
> decide whether a literal is encoded or not.

This is not the case.

use strict;
use warnings;
use DBI;
use DBD::SQLite;
use DBD::SQLite::Constants ':dbd_sqlite_string_mode';

my %options = (RaiseError => 1, AutoInactiveDestroy => 1,
  sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_STRICT);
my $db = DBI->connect('dbi:SQLite:dbname=:memory:', undef, undef,
  \%options);

my $str = "\xe4";

utf8::downgrade $str;
printf "%vX (length: %d)\n", $db->selectrow_array('SELECT ?, length(?)',
undef, $str, $str);
# prints: E4 (length: 1)

utf8::upgrade $str;
printf "%vX (length: %d)\n", $db->selectrow_array('SELECT ?, length(?)',
undef, $str, $str);
# prints: E4 (length: 1)
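
For comparison, a small pure-Perl sketch (no database involved; the variable names are illustrative) of what the two utf8:: calls change. It assumes nothing beyond the built-in utf8:: functions: the internal flag flips, while the string as seen by Perl code stays the same.

```perl
use strict;
use warnings;

my $down = "\xe4";
my $up   = "\xe4";
utf8::downgrade($down);   # force the single-byte internal representation
utf8::upgrade($up);       # force the UTF-8 internal representation

printf "flag: %d, length: %d\n", utf8::is_utf8($down) ? 1 : 0, length $down;
# prints: flag: 0, length: 1
printf "flag: %d, length: %d\n", utf8::is_utf8($up) ? 1 : 0, length $up;
# prints: flag: 1, length: 1

# The internal flag differs, but both hold the one character U+00E4
print $down eq $up ? "equal\n" : "not equal\n";
# prints: equal
```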

> This issue goes away when source files are encoded (and assumed to be
> encoded) in UTF-8.  But "working around driver quirks" is in my opinion
> no good motivation for the change.

