Re: on the almost impossibility to write correct XS modules

Marc Lehmann
April 26, 2008 21:00
On Sat, Apr 26, 2008 at 05:33:02PM +0200, Rafael Garcia-Suarez wrote:
> >    LOG ("see this object:");
> >    LOG ($obj);
> >    # $obj no longer an object here, it became a string
> You mean, like "Class=HASH(0xDEADBEEF)", I suppose (I haven't checked).

Yes, for example references.

> >  I think the situation with unicode and cpan perl modules cannot improve
> >  as long as it is so difficult to do something as simple as get at the
> >  string data in a non-random/godgiven encoding.
> That's right, and that's probably also why people find it difficult to
> handle utf-8 in perl as soon as they begin using XS modules.

Yes, it becomes unpredictable. It is also not very helpful when one has to
fight for every bugfix regarding unicode in perl (the bugfix could break
existing code), and when everybody mentally uses and propagates a slightly
different unicode model.

The biggest problem with perl and unicode is that it isn't consistent, and
little is done to make it consistent.

And little can be done from the side of XS to make this work.

Now, to make this mail a bit more helpful, which flags would I need to check
to achieve the following:

  1. if the SV can safely be SvPVutf8'd, just do it
  2. if not, SvPVutf8 a mortal copy

i.e., how could I check that calling SvPVutf8 (or SvPVbyte) on a scalar is
"safe" in the sense of not modifying it w.r.t. perl visibility?

> I think that this screams for a new macro which would be more or less
> the one you suggested here, maybe implemented in a more efficient way if
> possible.

Well, I think that would be a very bad way to solve the problem.

For example, in the reference case, the stringified non-utf-8 reference is
still valid utf-8, so we don't have to create an sv with a utf-8 string.

(if backwards-compatibility is a problem, name the macro differently, but
both solutions would require that).

> You mean this one ?
>     T_PV
> 	    $var = ($type)SvPV_nolen($arg)
> Here, the result would be dependent on the internal representation of
> the string in perl, so I suppose you would like to change this to
> something that uses SvPVbyte. I wonder, however, how much code would
> break with that change. Actually I suspect that more code would be fixed
> than broken, but that's a wild intuition...

That's my point: all modules that haven't been updated would be fixed.

The problem that people out there have with unicode in perl is that it
doesn't work with their favourite module. When it doesn't work, they start to
google and find stuff about this utf-8 flag that they then want to
manipulate, thinking unicode==utf-8 flag, and this doesn't work too well
either, so they get frustrated etc.

I keep giving talks about perl's unicode model, but it is hard to explain it
without also saying that it doesn't work in practice, and everything is
immensely complicated in practice, and....

Now, as for changing T_PV:

what could be done trivially (I'd happily send a patch) would be to provide

   typedef char char_bytes;
   typedef char char_utf8;

   char_utf8	T_PVutf8
   char_bytes	T_PVbytes

*preferably* with the mortalcopy trick or a more efficient/perlish
solution to the SvPVutf8 problem.
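
As a rough sketch of what such typemap entries could look like (an
illustration only, not a tested patch): the pointer types "char_utf8 *" and
"char_bytes *", the temporary name tmp_sv, and the flag test used for the
cheap in-place path are all assumptions of this sketch, not something
settled in this thread. It relies only on SvPOK, SvUTF8, sv_mortalcopy,
SvPVutf8_nolen and SvPVbyte_nolen from the perl API.

```
# Hypothetical typemap sketch; entry names follow the proposal above.
TYPEMAP
char_utf8 *	T_PVutf8
char_bytes *	T_PVbytes

INPUT
T_PVutf8
	{
	    /* Assumption: a plain string that already carries the UTF-8 flag
	       is handed back unchanged by SvPVutf8, so it is used in place.
	       Anything else (references, numbers, non-UTF-8 strings) is
	       converted on a mortal copy, leaving the caller's scalar alone. */
	    SV *tmp_sv = (SvPOK($arg) && SvUTF8($arg)) ? $arg : sv_mortalcopy($arg);
	    $var = ($type)SvPVutf8_nolen(tmp_sv);
	}
T_PVbytes
	{
	    /* Same idea for bytes: only a plain string without the UTF-8 flag
	       is used in place. SvPVbyte croaks on characters above 255. */
	    SV *tmp_sv = (SvPOK($arg) && !SvUTF8($arg)) ? $arg : sv_mortalcopy($arg);
	    $var = ($type)SvPVbyte_nolen(tmp_sv);
	}
```

The mortal copy is only freed once the calling statement finishes, so the
returned pointer stays valid for the duration of the XSUB, but it must not
be stored beyond that.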

Changing T_PV itself is something that should really be considered,
though.  Code that breaks almost certainly exists, but isn't it somewhat
questionable anyway to use "char *" in perl and then get access to the
flag in other ways (e.g. by counting arguments and using ST(n))?

The point is, code that uses char * without looking at the flag in other
ways is simply broken: it has no defined/documentable behaviour on the
perl level.

                The choice of a       Deliantra, the free code+content MORPG
      -----==-     _GNU_    
      ----==-- _       generation
      ---==---(_)__  __ ____  __      Marc Lehmann
      --==---/ / _ \/ // /\ \/ /
      -=====/_/_//_/\_,_/ /_/\_\ Perl Programming lists via nntp and http.