2008/4/26 Marc Lehmann <schmorp@schmorp.de>:
> Some modules started to use different typemap entries to work around this
> issue, for example:
>
>    void LOG (utf8_string msg)
>
>    T_OCTETS
>       $var = SvPVbyte_nolen ($arg)
>
>    T_UTF8 // == utf8_string
>       $var = SvPVutf8_nolen ($arg)
>
> Unfortunately, unlike other, similar, functions (like SvIV, SvPV etc.), this
> easily destroys the scalar value:
>
>    LOG ("see this object:");
>    LOG ($obj);
>    # $obj no longer an object here, it became a string

You mean it becomes something like "Class=HASH(0xDEADBEEF)", I suppose
(I haven't checked).

> So unlike other accessor functions such as SvPV, SvPVutf8 changes the
> contents of the SV in a very visible way (while SvIV doesn't destroy the
> string, for example).
>
> I can understand why it does so, but the problem is, there is simply no good
> way to deal with utf-8 in XS as the API is extremely hostile at the moment.
>
> To get it right, I think one has to do something like this (this can be
> optimised of course, but that makes it even more complicated):
>
>    T_UTF8
>       $var = SvPVutf8_nolen (sv_mortalcopy ($arg))
>
> I think the situation with unicode and cpan perl modules cannot improve
> as long as it is so difficult to do something as simple as get at the string
> data in a non-random/godgiven encoding.

That's right, and it is probably also why people find utf-8 difficult to
handle in perl as soon as they begin using XS modules. I think this
screams for a new macro that would do more or less what you suggested
here, implemented more efficiently if possible.

> Also, even though it is 5.10 now, it should be *seriously* considered to
> replace the almost completely useless char * typemap entry by something
> that gives you octets (preferably non-destructively). Or somebody explain
> to me when "char *" does something useful in current perl versions without
> tinkering with retesting ST(x) manually...

You mean this one?

    T_PV
        $var = ($type)SvPV_nolen($arg)

Here the result depends on the internal representation of the string in
perl (the same characters can be stored as bytes or as UTF-8, and SvPV
returns whichever buffer happens to be there), so I suppose you would
like to change this to something that uses SvPVbyte. I wonder, however,
how much code would break with that change. Actually I suspect that more
code would be fixed than broken, but that's a wild intuition...
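
To make the discussion concrete, here is a rough sketch of what a
non-destructive pair of typemap entries might look like, built on the
sv_mortalcopy approach quoted above. It is only an illustration, not a
tested proposal: the type name octet_string is made up for this example,
both types are assumed to be typedef'd to char * in the XS file, and the
copy costs an extra allocation per call.

```
# Sketch only. Assumes "typedef char *utf8_string;" and
# "typedef char *octet_string;" in the XS file (octet_string is a
# hypothetical name). The conversion is applied to a mortal copy, so
# the caller's scalar is left untouched.
TYPEMAP
utf8_string	T_UTF8
octet_string	T_OCTETS

INPUT
T_UTF8
	$var = ($type)SvPVutf8_nolen(sv_mortalcopy($arg))
T_OCTETS
	$var = ($type)SvPVbyte_nolen(sv_mortalcopy($arg))
```

With entries like these, LOG ($obj) should see the stringified form
while $obj itself stays a reference. Note that SvPVbyte croaks when the
string contains characters above 255, so T_OCTETS only fits arguments
that are meant to be binary data.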