develooper Front page | perl.perl5.porters | Postings from December 2011

weak references and global destruction

Thread Next
Nicholas Clark
December 1, 2011 05:06
weak references and global destruction
Message ID:
Karl hit a problem when refactoring - t/re/regexp_qr_embed_thr.t
started failing with panic: del_backref during global destruction.

After a lot of digging, it turns out that this isn't a regular expression
problem. It's a general implementation problem with weak references.

I can reduce the problem somewhat - cut down versions of t/re/re_tests
and t/re/regexp.t attached. These "work for me" at commit c23e3ba527e40a56
on x86_64 Linux with threads, but I'm not certain how sensitive they are to
a lot of things, possibly down to number of entries in %ENV

The end result is hitting this croak():

Breakpoint 1, Perl_sv_del_backref (my_perl=0x81a010, tsv=0x96d0a0, sv=0xa0ca50)
    at sv.c:5615
5615                Perl_croak(aTHX_ "panic: del_backref");
(gdb) where
#0  Perl_sv_del_backref (my_perl=0x81a010, tsv=0x96d0a0, sv=0xa0ca50)
    at sv.c:5615
#1  0x0000000000510cf4 in Perl_sv_clear (my_perl=0x81a010, orig_sv=0xa0ca50)
    at sv.c:6110
#2  0x0000000000511d7c in Perl_sv_free2 (my_perl=0x81a010, sv=0xa0ca50)
    at sv.c:6445
#3  0x000000000050367b in do_clean_all (my_perl=0x81a010, sv=0xa0ca50)
    at sv.c:610
#4  0x0000000000502c72 in S_visit (my_perl=0x81a010, 
    f=0x5035e4 <do_clean_all>, flags=0, mask=0) at sv.c:420
#5  0x00000000005036af in Perl_sv_clean_all (my_perl=0x81a010) at sv.c:629
#6  0x000000000043e95d in perl_destruct (my_perl=0x81a010) at perl.c:1073
#7  0x000000000041c0a5 in main (argc=4, argv=0x7fffffffe748, 
    env=0x7fffffffe770) at perlmain.c:131

context being:

    if (SvTYPE(*svp) == SVt_PVAV) {
    else {
	/* optimisation: only a single backref, stored directly */
	if (*svp != sv)
	    Perl_croak(aTHX_ "panic: del_backref");
	*svp = NULL;


(gdb) call Perl_sv_dump(my_perl, tsv)
SV = PVHV(0x822e90) at 0x96d0a0
  REFCNT = 3
  ARRAY = 0xa380e0  (0:27, 1:27, 2:9, 4:1)
  hash quality = 108.5%
  KEYS = 49
  FILL = 37
  MAX = 63
  RITER = -1
  EITER = 0x0
  NAME = "Data::Dumper"
  ENAME = "Data::Dumper"
  BACKREFS = 0xa12a40
    SV = UNKNOWN(0xff) (0xa3f668) at 0xa12a40
      REFCNT = 0
      FLAGS = ()
  MRO_WHICH = "dfs" (0x5f38e0)
  CACHE_GEN = 0x2
  PKG_GEN = 0x2e
    SV = UNKNOWN(0xff) (0xa3f6e0) at 0xa73bd8
      REFCNT = 0
      FLAGS = ()
  ISA = 0xa06bf0
    SV = PVHV(0x975d00) at 0xa06bf0
      REFCNT = 1
      ARRAY = 0x941bc0  (0:5, 1:3)
      hash quality = 150.0%
      KEYS = 3
      FILL = 3
      MAX = 7
      RITER = -1
      EITER = 0x0
(gdb) call Perl_sv_dump(my_perl, sv)
SV = PVGV(0xae1b20) at 0xa0ca50
  REFCNT = 0
  NAME = "bootstrap"
  GvSTASH = 0x96d0a0    "Data::Dumper"
  GP = 0x0
(gdb) call Perl_sv_dump(my_perl, *svp)
SV = UNKNOWN(0xff) (0xa3f668) at 0xa12a40
  REFCNT = 0
  FLAGS = ()

The code expects *svp to equal sv. It's not. Hence the croak.

The three values are:

tsv:	the stash %Data::Dumper::
*svp:	its backref pointer
sv:	the glob *Data::Dumper::bootstrap

what's supposed happen is that *svp is supposed to be an AV, and one of the
elements of it is the backpointer to *Data::Dumper::bootstrap

what has actually happened is that that AV has already been freed, before
either the stash or the GV, hence the code above assumes that it's the
"optimised" case [avoid using an AV to store one thing], and the sanity
test fails.

How it happen is a bit more fun. At the start of global destruction,
the AV has 2 references. *Both* are from the stash. The intent is

/* A discussion about the backreferences array and its refcount:
 * The AV holding the backreferences is pointed to either as the mg_obj of
 * PERL_MAGIC_backref, or in the specific case of a HV, from the
 * xhv_backreferences field. The array is created with a refcount
 * of 2. This means that if during global destruction the array gets
 * picked on before its parent to have its refcount decremented by the
 * random zapper, it won't actually be freed, meaning it's still there for
 * when its parent gets freed.

However, %Data::Dumper:: has 3 (legitimate) references. One is from the symbol
table. 2 are from the pads for our variables.

Result - the zapper keeps dropping refcounts by 1 on each iteration, and
the AV gets zapped first. That's not supposed to happen.

Problem is, I think that it's always possible for this to happen as is.
I don't think that bumping the reference count for the backref AV is ever
going to work, as there are various ways that its owner SV can end up with
a higher reference count, and hence outlive it at global destruction, as
this example demonstrates.

I can think of 3 ways to resolve this. The list probably isn't exclusive

0: Eliminate the sanity cross checks at global destruction time.

   It's clear that they can't be relied on, and during the
   "grand closing down sale, everything must go" I don't think that it
   really matters, as everything is going to get freed anyway

1: Switch backreferences to use a private malloc()ed structure, instead of
   an AV.

   This would probably actually save memory, as the use case here doesn't
   require several features of AVs (reference counts, magic, shift off
   the front) which cost about 6 pointers-worth of overhead.
   [The same is true of (at least) pads]
   The downside seems to be that the optimisation of "avoid the array
   completely" is lost

2: Add a new step at the start of global destruction to release all weak

   Everything is getting freed anyway. Code can't rely on references to
   things continuing to exist during global destruction (already), so this
   change doesn't break existing invariants. (Or lack thereof)
   This might actually be an optimisation, as currently global destruction
   cleans up everything "nicely", which means that as it encounters things
   which are weak references, it frees them one by one, and that necessitates
   tidying up the backreference each time. That's an O(n) search for each,
   O(n²) in total for n them. Whereas linearly scanning each backref array
   and setting the weak references to &PL_sv_undef would be O(n)

However, I don't think that (2) is actually a complete solution, because
it strikes me that it's possible for destructors fired during global
destruction to actually create new weak references, which then in turn
get released out of order, such that the sanity cross checks are violated.

Hence I'm tempted to think that (0) and (2) together are the right solution

Background to the reference count bump:
Commit d99b02a1a9949639 (which applies the above patch)
Commit 0565a181ab920bd3, specifically the mg.c hunk:

diff --git a/mg.c b/mg.c
index 6d71b21..1423df0 100644
--- a/mg.c
+++ b/mg.c
@@ -2040,7 +2040,9 @@ Perl_magic_killbackrefs(pTHX_ SV *sv, MAGIC *mg)
     SV **svp = AvARRAY(av);
-    if (svp) {
+    /* Not sure why the av can get freed ahead of its sv, but somehow it does
+       in ext/B/t/bytecode.t test 15 (involving print <DATA>)  */
+    if (svp && !SvIS_FREED(av)) {
 	SV *const *const last = svp + AvFILLp(av);
 	while (svp <= last) {

Nicholas Clark

PS &ithreads::BOOT sets PL_destruct_level to 2 globally. Is this really
   correct for the parent interpreter?
Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About