
Re: Adding utf8 support to DBD::mysql

From:
Martin J. Evans
Date:
May 4, 2006 01:14
Subject:
Re: Adding utf8 support to DBD::mysql
Message ID:
XFMail.20060504091251.martin.evans@easysoft.com
Tim,

On 04-May-2006 Tim Bunce wrote:
> On Sun, Apr 30, 2006 at 01:36:04PM -0700, Patrick Galbraith wrote:
>> Martin J. Evans wrote:
>> 
>> Martin,
>> 
>> Thanks much! This is dbdimp.c, right? I  will add this tomorrow (not 
>> working today), and test it out.

 
> Please don't use only is_high_bit_set() to enable UTF8.  That'll break
> any code that is storing non-utf8 data that happens to have the high-bit set.
> 
> Please make sure the test cases cover this situation. It's not enough
> to get 'utf8 working'; it's also important to not break existing code.
> 
> Using the 'charsetnr' value (see below) looks far more correct. That way
> perl will treat the values as UTF8 only if mysql was treating it as UTF8.

Sorry, I should have made it clearer it was only a demonstration that utf8 can
work with mysql as someone had been asking that. I had already told Patrick
that off the list. I fully realised that hack would break 8-bit charsets.
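
To make the failure mode concrete, here is a minimal sketch in plain C. It is
not the dbdimp.c code; has_high_bit() and looks_like_utf8() are hypothetical
helper names used only for illustration. A Latin-1 string and a UTF-8 string
both have the high bit set, so a high-bit test alone cannot tell them apart.

```c
#include <stddef.h>

/* Hypothetical helper: returns 1 if any byte has its top bit set. */
static int has_high_bit(const unsigned char *s, size_t len)
{
    size_t i;
    for (i = 0; i < len; i++)
        if (s[i] & 0x80)
            return 1;
    return 0;
}

/* Hypothetical helper: structural UTF-8 check only (lead byte followed by
 * the right number of continuation bytes). It does not reject overlong
 * encodings or surrogates, so it is a heuristic, not a validator. */
static int looks_like_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;
    while (i < len) {
        unsigned char c = s[i];
        size_t extra;
        if (c < 0x80)                extra = 0;
        else if ((c & 0xE0) == 0xC0) extra = 1;
        else if ((c & 0xF0) == 0xE0) extra = 2;
        else if ((c & 0xF8) == 0xF0) extra = 3;
        else return 0;                      /* stray continuation or invalid lead byte */
        if (extra > len - i - 1)
            return 0;                       /* sequence runs past the end of the buffer */
        i++;
        while (extra--) {
            if ((s[i] & 0xC0) != 0x80)
                return 0;                   /* expected a continuation byte */
            i++;
        }
    }
    return 1;
}
```

With "caf\xE9" (Latin-1) and "caf\xC3\xA9" (UTF-8), has_high_bit() is true for
both, while looks_like_utf8() is true only for the second. Even that check is
a guess about intent, which is why the column's character set is the better
signal.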

I have already started looking at charsetnr but have run into a number of
issues due to the way charsetnr has changed over different versions of mysql.
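
For illustration only, the charsetnr decision could be sketched as below.
This is an assumption-laden outline, not tested driver code: field_stub
stands in for MYSQL_FIELD so the fragment compiles without <mysql.h>, the
function name is hypothetical, and the collation ids are assumptions that
would have to be checked against SHOW COLLATION on each server version the
driver supports.

```c
/* Stand-in for the single MYSQL_FIELD member used here; real driver code
 * would include <mysql.h> and read field->charsetnr directly. */
struct field_stub {
    unsigned int charsetnr;
};

/* ASSUMPTION: 33 (utf8_general_ci) and 83 (utf8_bin) are utf8 collation
 * ids. Other utf8 collations exist and the set differs between MySQL
 * versions, so this list is deliberately incomplete. */
static int charsetnr_is_utf8(unsigned int charsetnr)
{
    switch (charsetnr) {
    case 33:
    case 83:
        return 1;
    default:
        return 0;
    }
}

/* Returns 1 if the fetched value should get the UTF8 flag. In the driver
 * the result would select between SvUTF8_on(sv) and SvUTF8_off(sv). */
static int field_wants_utf8_flag(const struct field_stub *field)
{
    return charsetnr_is_utf8(field->charsetnr);
}
```

The point of the design is that the flag follows what the server says about
the column, so 8-bit data in a non-utf8 column is left alone whatever bytes
it contains.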

Martin
--
Martin J. Evans
Easysoft Ltd, UK
http://www.easysoft.com


>> >>>The key mysql docs seem to be
>> >>>http://dev.mysql.com/doc/refman/4.1/en/charset-connection.html
>> >>>
>> >>>The mysql api and client->server protocol doesn't support passing
>> >>>characterset info to the server on a per-statement / per-bind value 
>> >>>basis.
>> >>>(http://dev.mysql.com/doc/refman/4.1/en/c-api-prepared-statement-datatypes.html)
>> >>>So the sane way to send utf8 to the server is by setting the 'connection
>> >>>character set' to utf8 and then only sending utf8 (or its ASCII subset)
>> >>>to the server on that connection.
>> >>>
>> >>>*** Fetching data:
>> >>>
>> >>>MySQL 4.1.0 added "unsigned int charsetnr" to the MYSQL_FIELD structure.
>> >>>It's the "character set number for the field".
>> >>>
>> >>>So set the UTF8 flag based on that value. Something like:
>> >>>   (field->charsetnr == ???) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>> >>>I couldn't see any docs for the values of the charsetnr field.
>> >>>
>> >>>Also, would be good to enable perl code to access the charsetnr values:
>> >>>   $sth->{mysql_charsetnr}->[$i]
>> >>>
>> >>>*** Fetching Metadata:
>> >>>
>> >>>The above is a minimum. It doesn't address metadata like field names
>> >>>($sth->{NAME}) that might also be in utf8. For that the driver needs to
>> >>>know if the 'connection character set' is currently utf8.
>> >>>
>> >>>(The docs mention mysql->charset but it's not clear if that's part of
>> >>>the public API.)
>> >>>
>> >>>However it's detected, the code needs to end up doing:
>> >>>   (...connection charset is utf8...) ? SvUTF8_on(sv) : SvUTF8_off(sv);
>> >>>on the metadata.
>> >>>
>> >>>
>> >>>*** SET NAMES '...'
>> >>>
>> >>>Intercept SET NAMES and call the mysql_set_character_set() API instead.
>> >>>See http://dev.mysql.com/doc/refman/4.1/en/mysql-set-character-set.html
>> >>>
>> >>>
>> >>>*** Detecting Inconsistencies
>> >>>
>> >>>If the connection character set is _not_ utf8 but the application calls
>> >>>the driver with data (or SQL statement) that has the UTF8 flag set, then
>> >>>it could issue a warning. In practice that may be too noisy for
>> >>>people that have done their own workarounds for utf8 support. If so
>> >>>then the warnings could be changed to level 1 trace messages.
>> >>>
>> >>>If the connection character set _is_ utf8, and the application calls
>> >>>the driver with data (or SQL statement) that does _not_ have the UTF8
>> >>>flag set but _does_ have bytes with the high bit set, then the driver
>> >>>should issue a warning. The checking for high bit set is an extra cost
>> >>>so this should only be enabled if tracing and/or an attribute is set
>> >>>(perhaps called $dbh->{mysql_charset_checks} = 1)
>> >>>
>> >>>Tim.
>> >>>     
>> >>>
>> 
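
The SET NAMES interception suggested in the quoted text could start from
something like the following sketch. It only recognises the statement and
extracts the character set name; the helper names are hypothetical, and
clauses such as COLLATE are ignored. In the driver, a successful match would
be followed by a call to mysql_set_character_set() with the extracted name
instead of sending the statement as ordinary SQL.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: match `word` case-insensitively at *p, require at
 * least one whitespace character after it, and advance *p past both. */
static int eat_word(const char **p, const char *word)
{
    const char *s = *p;
    while (*word) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*word))
            return 0;
        s++;
        word++;
    }
    if (!isspace((unsigned char)*s))
        return 0;
    while (isspace((unsigned char)*s))
        s++;
    *p = s;
    return 1;
}

/* Hypothetical helper: if sql is "SET NAMES <name>" (name optionally
 * quoted), copy the name into out and return 1; otherwise return 0.
 * Anything after the name, such as a COLLATE clause, is not examined. */
static int parse_set_names(const char *sql, char *out, size_t outlen)
{
    const char *p = sql;
    size_t n = 0;
    char quote = 0;

    if (outlen == 0)
        return 0;
    while (isspace((unsigned char)*p))
        p++;
    if (!eat_word(&p, "set") || !eat_word(&p, "names"))
        return 0;
    if (*p == '\'' || *p == '"')
        quote = *p++;
    while ((isalnum((unsigned char)*p) || *p == '_') && n + 1 < outlen)
        out[n++] = *p++;
    out[n] = '\0';
    if (quote && *p != quote)
        return 0;                 /* unterminated quote or name too long for out */
    return n > 0;
}
```

Going through the client API rather than plain SQL lets the client library
know about the new connection character set as well as the server.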

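The two consistency checks described in the quoted text can be outlined as
follows. This is a sketch under stated assumptions: the enum, the function
name and the parameters are all hypothetical, conn_is_utf8 and sv_utf8_flag
would come from the connection state and SvUTF8(sv) in the real driver, and
checks_enabled stands for tracing or the suggested mysql_charset_checks
attribute.

```c
#include <stddef.h>

/* Hypothetical result codes for the two situations described above. */
enum charset_check {
    CHECK_OK,
    WARN_UTF8_FLAG_ON_NON_UTF8_CONN,  /* flagged data sent on a non-utf8 connection */
    WARN_HIGH_BIT_ON_UTF8_CONN        /* unflagged 8-bit data sent on a utf8 connection */
};

static enum charset_check check_value(int conn_is_utf8, int sv_utf8_flag,
                                      int checks_enabled,
                                      const unsigned char *buf, size_t len)
{
    size_t i;

    /* Cheap check: needs no scan of the data. */
    if (!conn_is_utf8 && sv_utf8_flag)
        return WARN_UTF8_FLAG_ON_NON_UTF8_CONN;

    /* Costly check: scans every byte, so it runs only when enabled. */
    if (conn_is_utf8 && !sv_utf8_flag && checks_enabled) {
        for (i = 0; i < len; i++)
            if (buf[i] & 0x80)
                return WARN_HIGH_BIT_ON_UTF8_CONN;
    }
    return CHECK_OK;
}
```

Whether each result becomes a warning or a level 1 trace message is a policy
decision left to the caller.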

