develooper Front page | perl.perl5.porters | Postings from November 2003

dealing with objects cloning under ithreads

Stas Bekman
November 14, 2003 01:56
dealing with objects cloning under ithreads
Message ID:
I need to clone an object which contains a reference to a C struct, so it 
won't crash on DESTROY under threads trying to free an already freed by 
another thread C struct. I'm trying to figure out how CLONE is supposed to 
work, but can't find any examples. Looking at the source code:

     /* Call the ->CLONE method, if it exists, for each of the stashes
        identified by sv_dup() above.
     while(av_len(param->stashes) != -1) {
         HV* stash = (HV*) av_shift(param->stashes);
	GV* cloner = gv_fetchmethod_autoload(stash, "CLONE", 0);
	if (cloner && GvCV(cloner)) {
            XPUSHs(sv_2mortal(newSVpv(HvNAME(stash), 0)));
	    call_sv((SV*)GvCV(cloner), G_DISCARD);

it looks for the function CLONE in the current stash and if found calls it 
with that stash name as the only argument. Indeed, this test program when it 
doesn't fail with an error "A thread exited while 2 threads were running.", 
which happens randomly:

package main;
use threads;
sub CLONE {
     my $stash_name = shift;
     my %stash = %{"$stash_name\::"};
     print "CLONE was called,\n keys: ", join " ", keys %stash;
$obj = 1;
threads->new(sub { })->detach;

we get:

% perl-5.8.2-ithread /tmp/clone
CLONE was called,
keys: async  ARGV BEGIN AutoLoader:: @ _<universal.c STDIN vars:: Mac:: 0 
<none>:: obj VMS:: CORE:: ▒ ! ARNING_BITS warnings:: CLONE Internals:: ENV 
stdin / threads:: Scalar:: _<threads.c main:: _<xsutils.c _<perlio.c stderr 
SIG overload::   _<DynaLoader.c INC - Carp:: strict:: DB:: stdout Config:: + ] 
IO:: __ANON__ _<perlmain.c 1 attributes:: PerlIO:: " Regexp:: $ DynaLoader:: _ 

Now let's say I have a class Foo [2] which creates objects with C struct guts 
in it. Then the the user program becomes:

package main;
use threads;
$obj = Foo->new();
threads->new(sub { })->detach;

And here I'm lost. How does the user program know that it needs to call 
Foo::CLONE_OBJ (or whatever the cloning function is) on $obj? You don't expect 
users to write code to traverse the stash, look for the objects belonging to 
class Foo and call $obj->CLONE_OBJ on it. I suppose Foo could export function 
CLONE into the user namespace, so that function will traverse the stash and 
clone its own objects. But what happens if there are two classes used by the 
user program and both need cloning?

perlmod.pod has very little to say about it, but:

   If you want to CLONE all objects you will need to keep track of them per
   package. This is simply done using a hash and Scalar::Util::weaken().

Does it suggest that class Foo need to maintain an internal storage (e.g. 
hash) of all objects it creates, weakened so not to interfere with normal 
DESTROY, and have Foo::CLONE explicitly clone all objects in that storage. 
Like so:

package Foo;
use Scalar::Util;
my %objects = ();
sub new {
   # let's say that new_data() returns a C struct pointer
   my $data = new_data();
   my $self = bless $data, __PACKAGE__;
   my $copy = $self;
   $objects{"$data"} = $copy;
   return $self;

sub CLONE {
   while (my($k, $v) = each %objects) {
     # replace the guts of the object with new $data
     # this affects the object in the user program as well
     $$v = $v->clone_obj;

   my $self = shift;
   # some normal DESTROY code
   # ...
   # cleanup the storage
   my $data = $$self;
   delete $packages{"$data"}; # remove from storage

sub clone_obj {
   my $self = shift;
   my $data = $$self;
   return clone_data($data); # some XS function that does the cloning

does this seem the way to go? this code is untested.

Ideally it'd be nice to just have a CLONE method to be invoked on any cloned 
object. I suppose it wasn't added because of the huge overhead it'll add to an 
already deadly slow [1] perl_clone to check every scalar whether it can('CLONE')?


1) at the moment perl_clone seems to be an almost impractical thing in a real 
world if you have lots of modules loaded. On an unloaded CPU in the mod_perl 
2.0 test suite (About 150 small packages) it takes at least 5 seconds to run a 
single perl_clone. So to start threaded mpm Apache with 12 interpreters will 
take about a minute and longer. If you attempt start 120 interpreters you will 
have to wait at least 10 minutes. Heh.

2) here is an example of a trivial program which will either hang or segfault 
depending on your malloc implementation:

package main;
use threads;
use GTop;
$Foo::gtop = GTop->new;
threads->new(sub { print "thread started\n";})->detach for (1..3);

here all 3 threads get a copy $Foo::gtop, and then try to destroy it (4 
times). Only the first one does it successfully, as gtop returns an object 
which is a wrapper around a C struct. If you don't have GTop, replace it with 
some other class which returns a similar object and you will see what the 
problem is.

Oh, the joy of threads.

Stas Bekman            JAm_pH ------> Just Another mod_perl Hacker     mod_perl Guide ---> Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About