Hi!

I recently found out that it is almost impossible to write XS modules that deal with Unicode correctly, and here is why:

First, the long-known issue: in XS parameters, the type "char *" is utterly useless, as you have no clue about the encoding of the characters. This even breaks backward compatibility with existing XS modules, which do not expect character values >255. A lot of modules on CPAN have been broken by this incompatible change in 5.6 or so.

Now, how about fixing it? Some modules started to use different typemap entries to work around this issue, for example:

   void LOG (utf8_string msg)

   T_OCTETS
      $var = SvPVbyte_nolen ($arg)

   T_UTF8 // == utf8_string
      $var = SvPVutf8_nolen ($arg)

Unfortunately, unlike other, similar functions (SvIV, SvPV etc.), this easily destroys the scalar value:

   LOG ("see this object:");
   LOG ($obj);
   # $obj no longer an object here, it became a string

So unlike other accessor functions such as SvPV, SvPVutf8 changes the contents of the SV in a very visible way (while SvIV doesn't destroy the string, for example).

I can understand why it does so, but the problem is that there is simply no good way to deal with UTF-8 in XS, as the API is extremely hostile at the moment.

To get it right, I think one has to do something like this (this can be optimised of course, but that makes it even more complicated):

   T_UTF8
      $var = SvPVutf8_nolen (sv_mortalcopy ($arg))

I think the situation with Unicode and CPAN Perl modules cannot improve as long as it is so difficult to do something as simple as getting at the string data in a non-random/godgiven encoding.

Also, even though it is 5.10 now, it should be *seriously* considered to replace the almost completely useless char * typemap entry with something that gives you octets (preferably non-destructively).

Or somebody explain to me when "char *" does something useful in current Perl versions without manually re-testing ST(x)...

Just my 0.02€.
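For concreteness, the copy-first workaround could be written out as a complete typemap file roughly like this. This is only a sketch under some assumptions: the type name octet_string is made up for illustration, the XS file is assumed to contain "typedef char *utf8_string;" and "typedef char *octet_string;", and copying in T_OCTETS as well is my guess at keeping both entries symmetric (it may not be needed).

```text
# Sketch of a typemap that never modifies the caller's scalar.
# utf8_string is the name used in the LOG example; octet_string is a
# hypothetical name for the byte-oriented counterpart.
# Both INPUT entries convert a mortal copy, so an object passed in
# stays an object. The copy is freed with the other temporaries, so
# the char * should only be used for the duration of the XSUB call.
TYPEMAP
utf8_string	T_UTF8
octet_string	T_OCTETS

INPUT
T_UTF8
	$var = SvPVutf8_nolen (sv_mortalcopy ($arg))
T_OCTETS
	$var = SvPVbyte_nolen (sv_mortalcopy ($arg))
```

The price is one extra string copy per call, which is exactly the part that could be optimised at the cost of more complicated code.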
-- 
The choice of a GNU generation
Deliantra, the free code+content MORPG
http://www.deliantra.net

Marc Lehmann
pcg@goof.com