develooper Front page | perl.perl5.porters | Postings from April 2008

Re: on the almost impossibility to write correct XS modules

From: Rafael Garcia-Suarez
Date: April 26, 2008 08:33
Subject: Re: on the almost impossibility to write correct XS modules
Message ID: b77c1dce0804260833o7a1c28bg64f3f4f58fc1cf9d@mail.gmail.com

2008/4/26 Marc Lehmann <schmorp@schmorp.de>:
>  Some modules started to use different typemap entries to work around this
>  issue, for example:
>
>    void LOG (utf8_string msg)
>
>    T_OCTETS
>            $var = SvPVbyte_nolen ($arg)
>
>    T_UTF8 // == utf8_string
>            $var = SvPVutf8_nolen ($arg)
>
>  Unfortunately, unlike other, similar, functions (like SvIV, SvPV etc.), this
>  easily destroys the scalar value:
>
>    LOG ("see this object:");
>    LOG ($obj);
>    # $obj no longer an object here, it became a string

You mean, like "Class=HASH(0xDEADBEEF)", I suppose (I haven't checked).
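
For reference, ordinary Perl stringification of a blessed hash reference yields a string of that shape. The sketch below shows only plain stringification, which leaves the object intact. It does not call SvPVutf8, so it does not confirm what the XS call leaves behind in the SV. The class name `Class` is a placeholder.

```perl
use strict;
use warnings;

# Hypothetical class name, used only for illustration.
my $obj = bless {}, 'Class';

# Ordinary stringification works on a copy of the value;
# $obj itself stays a blessed reference.
my $str = "$obj";

print "$str\n";    # something like Class=HASH(0x...), the address varies
print ref($obj) ? "still an object\n" : "became a plain string\n";
```

The complaint in the quoted text is that the SvPVutf8-based typemap performs this conversion in place on the caller's scalar rather than on a copy.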

>  So unlike other accessor functions such as SvPV, SvPVutf8 changes the
>  contents of the SV in a very visible way (while SvIV doesn't destroy the
>  string, for example).
>
>  I can understand why it does so, but the problem is, there is simply no good
>  way to deal with utf-8 in XS as the API is extremely hostile at the moment.
>
>  To get it right, I think one has to do something like this (this can be
>  optimised of course, but that makes it even more complicated):
>
>    T_UTF8
>            $var = SvPVutf8_nolen (sv_mortalcopy ($arg))
>
>  I think the situation with unicode and cpan perl modules cannot improve
>  as long as it is so difficult to do something as simple as get at the string
>  data in a non-random/godgiven encoding.

That's right, and that's probably also why people find it difficult to
handle utf-8 in perl as soon as they begin using XS modules.

I think that this screams for a new macro which would be more or less
the one you suggested here, maybe implemented in a more efficient way if
possible.

>  Also, even though it is 5.10 now, it should be *seriously* considered to
>  replace the almost completely useless char * typemap entry by something
>  that gives you octets (preferably non-destructively). Or somebody explain
>  to me when "char *" does something useful in current perl versions without
>  tinkering with retesting ST(x) manually...

You mean this one?

    T_PV
            $var = ($type)SvPV_nolen($arg)

Here, the result would be dependent on the internal representation of
the string in perl, so I suppose you would like to change this to
something that uses SvPVbyte. I wonder, however, how much code would
break with that change. Actually I suspect that more code would be fixed
than broken, but that's a wild intuition...
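
To make the dependence on the internal representation concrete, here is a minimal pure-Perl sketch. It assumes that the `bytes` pragma is a fair stand-in for the buffer a `char *` typemap based on SvPV_nolen would expose, and that `utf8::downgrade` on a copy roughly approximates what an SvPVbyte-based entry would hand to C. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $latin1   = "caf\xe9";      # 4 characters, stored one byte each
my $upgraded = $latin1;
utf8::upgrade($upgraded);      # same characters, now stored as UTF-8 internally

# At the Perl level the two strings are indistinguishable.
my $same_string = $latin1 eq $upgraded ? 1 : 0;

my ($raw_a, $raw_b);
{
    use bytes;                 # length now reports the internal buffer size
    $raw_a = length $latin1;
    $raw_b = length $upgraded;
}

# Approximation of an octet-oriented typemap: downgrade a copy first.
my $octets = $upgraded;
utf8::downgrade($octets);
my $octet_len;
{
    use bytes;
    $octet_len = length $octets;
}

printf "equal=%d raw lengths: %d vs %d, octets: %d\n",
    $same_string, $raw_a, $raw_b, $octet_len;
# prints: equal=1 raw lengths: 4 vs 5, octets: 4
```

Two strings that compare equal in Perl can therefore reach C code as different byte sequences through the plain T_PV entry, while an octet-based entry would give the same bytes in both cases.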


