develooper Front page | perl.dbi.dev | Postings from April 2006

Re: Adding utf8 support to DBD::mysql

Thread Previous | Thread Next
From:
Patrick Galbraith
Date:
April 30, 2006 13:36
Subject:
Re: Adding utf8 support to DBD::mysql
Message ID:
44551FB4.2070008@mysql.com
Martin J. Evans wrote:

Martin,

Thanks much! This is dbdimp.c, right? I  will add this tomorrow (not 
working today), and test it out. If everything is a go, then I'll roll 
up a dev 3.0003_1 releaseAlso, I've mentioned giving you write access to 
SVN for DBD::mysql - do you want that? Also, what is your cpan username?

Kind regards,

Patrick

>Shamelessy stolen from Michael Kröll and against a patch on dbd::mysql
>on subversion I've sent to Patrick which is not committed yet:
>
>@@ -2915,6 +2914,9 @@
>           if (dbis->debug >= 2)
>             PerlIO_printf(DBILOGFP, "st_fetch string data %s\n", fbh->data);
>           sv_setpvn(sv, fbh->data, fbh->length);
>+          if (is_high_bit_set(fbh->data)) {
>+              SvUTF8_on(sv);
>+          }
>           break;
>@@ -2968,6 +2970,10 @@
>           {    --len; }
>         }
>         sv_setpvn(sv, col, len);
>+        if (is_high_bit_set(col)) {
>+            SvUTF8_on(sv);
>+        }
>+        
>
>@@ -3881,3 +3901,11 @@
>   return
>sv_2mortal(my_ulonglong2str(mysql_insert_id(&((imp_dbh_t*)imp_dbh)->mysql)));
> }
> #endif
>+
>+int is_high_bit_set(val)
>+                char *val;
>+{
>+    while (*val)
>+        if (*val++ & 0x80) return 1;
>+    return 0;
>+}
>
>
>would appear to make the previous code (below and without the Encode) I posted
>work fine.
>
>I hasten to add I've not tested this much (other than the posted test code).
>
>What I'm personally interested in is whether retrieving utf data from
>DBD::mysql and pushing it through JSON, JSON->objToJson works and I cannot
>easily test this until Tuesday.
>
>If Patrick can commit my previous changes I'll test this more.
>
>Martin
>--
>Martin J. Evans
>Easysoft Ltd, UK
>http://www.easysoft.com
>
>
>On 30-Apr-2006 Martin J. Evans wrote:
>  
>
>>I hope I'm not muddying the waters but dbd::mysql UTF support for returned
>>data (not metadata) seems to be nearly there. I need UTF8 support on
>>at least inserted and returned results-sets although I'm less bothered
>>by UTF8 table/column names. It would seem that if you define your table as
>>using UTF8 then insertion is not a problem but retrieval is. The
>>following code nearly works - it is just the setting of the utf8 flag on
>>the returned data that is wrong:
>>
>>#!/usr/bin/perl -w
>>use strict;
>>use DBI qw(:utils);
>>use charnames ':full';
>>use Encode;
>>
>>print "Is utf8::is_utf8 defined: ", defined &utf8::is_utf8, "\n";
>>print "Is utf8::valid defined: ", defined &utf8::valid, "\n";
>>my $str = "\x{263a}xxx" . chr(0x05d0) . "\N{ARABIC LETTER ALEF}"; # smiley
>>print join(" ", unpack("H*", $str)), "\n";
>>print "length(str) = ", length($str), "\n";
>>print "bytes::length(str) = ", bytes::length($str), "\n";
>>
>>print "utf8::is_utf8 = ", utf8::is_utf8($str) ? 1 : 0, "\n";
>>print "data_string_desc: ", data_string_desc($str),"\n";
>>
>>open OUT, ">uni.out";
>>binmode(OUT, ":utf8");
>>print OUT "$str\n";
>># data written to uni.out is UTF8
>>
>>my $dbh = DBI->connect("dbi:mysql:test", "xxx", "xxx");
>># there are posts on dbi-user as to whether both or either of
>># the following should be set
>>$dbh->do("set character set utf8");
>>$dbh->do("set names utf8");
>>$dbh->do("drop table if exists utf");
>>$dbh->do("create table utf (a char(100)) default charset utf8");
>>my $sth = $dbh->prepare("insert into utf values (?)");
>>$sth->execute($str);
>>$sth = $dbh->prepare("select * from utf");
>>$sth->execute;
>>
>>my @row = $sth->fetchrow_array;
>>print "data_string_desc (after fetch): ", data_string_desc($row[0]),"\n";
>># the following shows we'e got the right data back
>># but perl does not know it is utf8
>>print join(" ", unpack("H*", $row[0])), "\n";
>># turning on utf8 causes the rignt uf8 sequence to be output
>># and hence sv_utf8_upgrade(sv) will probably work
>>Encode::_utf8_on($row[0]);
>>print "data_string_desc (after fetch): ", data_string_desc($row[0]),"\n";
>>
>>open OUT, ">utf.out";
>>binmode (OUT, ":utf8");
>>print OUT $row[0];
>>close OUT;
>># data written to utf.out is not UTF8 unless is marked utf8
>>
>>produces:
>>
>>Is utf8::is_utf8 defined: 1
>>Is utf8::valid defined: 1
>>e298ba787878d790d8a7
>>length(str) = 6
>>bytes::length(str) = 10
>>utf8::is_utf8 = 1
>>data_string_desc: UTF8 on, non-ASCII, 6 characters 10 bytes
>>data_string_desc (after fetch): UTF8 off, non-ASCII, 10 characters 10 bytes
>>e298ba787878d790d8a7
>>
>>with utf.out containing:
>>
>><C3><A2><C2><98><C2><BA>xxx<C3><97><C2><90><C3><98><C2><A7>
>>
>>without that Encode::_utf8_on($row[0]);
>>
>>Michael Kröll (apr-10-2006)
>>posted a change for DBD::db2 which seemed to sort this out for DB2
>>so a similar change could be added to mysql.
>>
>>Hope this helps.
>>
>>Martin
>>--
>>Martin J. Evans
>>Easysoft Ltd, UK
>>http://www.easysoft.com
>>
>>
>>On 24-Apr-2006 Tim Bunce wrote:
>>    
>>
>>>[I'm at the mysql conference and Patrick asked me about adding utf8
>>>support to DBD::mysql. I said I'd look at the libmysql docs and give my
>>>thoughts. I'm posting to dbi-dev since it may be of interest to others
>>>interested in enhancing DBD::mysql and to other driver developers.
>>>These are just random thoughts from a quick look at the docs.]
>>>
>>>The keys mysql docs seem to be
>>>http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
>>>
>>>The mysql api and client->server protocol doesn't support passing
>>>characterset info to the server on a per-statement / per-bind value basis.
>>>(http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.ht
>>>m
>>>l)
>>>So the sane way to send utf8 to the server is by setting the 'connection
>>>character set' to utf8 and then only sending utf8 (or its ASCII subset)
>>>to the server on that connection.
>>>
>>>*** Fetching data:
>>>
>>>MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD structure.
>>>It's the "character set number for the field".
>>>
>>>So set the UTF8 flag based on that value. Something like:
>>>    (field->charsetnr = ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>>>I couldn't see any docs for the values of the charsetnr field.
>>>
>>>Also, would be good to enable perl code to access the charsetnr values:
>>>    $sth->{mysql_charsetnr}->[$i]
>>>
>>>*** Fetching Metadata:
>>>
>>>The above is a minimum. It doesn't address metadata like field names
>>>($sth->{NAME}) that might also be in utf8. For that the driver needs to
>>>know if the 'connection character set' is currently utf8.
>>>
>>>(The docs mention mysql->charset but it's not clear if that's part of
>>>the public API.)
>>>
>>>However it's detected, the code needs to end up doing:
>>>    (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>>>on the metadata.
>>>
>>>
>>>*** SET NAMES '...'
>>>
>>>Intercept SET NAMES and call the mysql_set_character_set() API instead.
>>>See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
>>>
>>>
>>>*** Detecting Inconsistencies
>>>
>>>If the connection character set is _not_ utf8 but the application calls
>>>the driver with data (or SQL statement) that has the UTF8 flag set, then
>>>it could issue a warning. In practice that may be to be too noisy for
>>>people that done their own workarounds for utf8 support. If so then
>>>they could be changes to level 1 trace messages.
>>>
>>>If the connection character set _is_ utf8, and the application calls
>>>the driver with data (or SQL statement) that does _not_ have the UTF8
>>>flag set but _does_ have bytes with the high bit set, then the driver
>>>should issue a warning. The checking for high bit set is an extra cost
>>>so this should only be enabled if tracing and/or an attribute is set
>>>(perhaps called $dbh->{mysql_charset_checks} = 1)
>>>
>>>Tim.
>>>      
>>>


Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About