On Mon, Aug 2, 2021 at 5:32 PM Dan Book <grinnz@gmail.com> wrote:

> On Mon, Aug 2, 2021 at 4:28 PM Harald Jörg <haj@posteo.de> wrote:
>
>> Dan Book <grinnz@gmail.com> writes:
>>
>> > DBD::MariaDB, DBD::SQLite, and DBD::Pg are used with the unicode
>> > option in any modern programs. Thus they expect decoded strings.
>>
>> As far as DBD::SQLite is concerned, this is only half-true. In the
>> current version 1.70 there have been changes how to declare unicode
>> handling, but even with DBD_SQLITE_STRING_MODE_UNICODE_STRICT you can
>> feed it UTF-8 encoded byte sequences and it "just works" (but maybe
>> shouldn't).
>>
>> You see the downside of this when you have a non-ASCII literal in a
>> iso-latin-1 encoded Perl source (e.g. "ä" or "\x{e4}"). For Perl, it is
>> the same character as "\N{LATIN SMALL LETTER A WITH DIAERESIS}", but if
>> you feed both to the database you get different results.
>>
>
> I don't think this is correct. Mojo::SQLite has many tests to ensure in
> unicode-mode that it treats strings consistently.
>
>
>> Veesh could change his source (if in a latin-1 encoded file)
>>    $customer_rs->search({ name => 'josé' })
>> to
>>    $customer_rs->search({ name => decode('iso-8859-1','josé') })
>> to make it work.
>>
>
> This code makes no difference, decoding from iso-8859-1 is a no-op in Perl
> strings (aside from considering "bytes" outside the single-byte encoding
> range as errors/replacement characters).
>
>
>> It seems that the driver still inspects the infamous UTF-8-flag to
>> decide whether a literal is encoded or not.
>>
>
> This is not the case.
>
> use strict;
> use warnings;
> use DBD::SQLite;
> use DBD::SQLite::Constants ':dbd_sqlite_string_mode';
>
> my %options = (RaiseError => 1, AutoInactiveDestroy => 1,
> sqlite_string_mode => DBD_SQLITE_STRING_MODE_UNICODE_FALLBACK);
> my $db = DBI->connect('dbi:SQLite:dbname=:memory:', undef, undef,
> \%options);
>
> my $str = "\xe4";
>
> utf8::downgrade $str;
> printf "%vX (length: %d)\n", $db->selectrow_array('SELECT ?, length(?)',
> undef, $str, $str);
> # prints: E4 (length: 1)
>
> utf8::upgrade $str;
> printf "%vX (length: %d)\n", $db->selectrow_array('SELECT ?, length(?)',
> undef, $str, $str);
> # prints: E4 (length: 1)
>

And for completeness if you do the same test with the UTF-8 encoded bytes
"\xc3\xa4" you get consistent results as well: C3.A4 (length: 2)

-Dan
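
Two of the claims in this thread can be checked without a database: that decoding from iso-8859-1 leaves the characters of a Perl string unchanged, and that switching the internal representation with utf8::upgrade or utf8::downgrade does not alter what the string contains. What follows is a minimal sketch using only core Perl. It says nothing about what any DBD driver does with the string. The describe helper and the variable names are hypothetical and are not part of the script quoted above.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: render a string as dot-separated hex code points
# together with its length in characters.
sub describe {
    my ($s) = @_;
    return sprintf '%vX (length: %d)', $s, length $s;
}

my $str = "\xe4";

# Force the single-byte internal representation (UTF-8 flag off).
utf8::downgrade($str);
my $downgraded = describe($str);

# Force the UTF-8 internal representation (UTF-8 flag on).
utf8::upgrade($str);
my $upgraded = describe($str);

# Decoding from iso-8859-1 maps each byte to the code point of the same
# value, so the result compares equal to the input string.
my $latin1_result =
    decode('iso-8859-1', "jos\xe9") eq "jos\xe9" ? 'same' : 'different';

# The UTF-8 encoded form of the same letter is simply a two-character string.
my $utf8_bytes = describe("\xc3\xa4");

print "$downgraded\n$upgraded\n$latin1_result\n$utf8_bytes\n";
```

The first two printed lines should be identical, since the UTF-8 flag only describes how Perl stores the string internally. Whether a given driver treats both representations the same way is a separate question, and that is what the database test quoted above exercises.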