develooper Front page | perl.perl5.porters | Postings from May 2010

Re: [BUG] Safe signals changes causing hangs with threads (was RE: [perl.git] branch blead, updated. v5.12.0-80-gda76b85)

Thread Previous | Thread Next
From:
Nicholas Clark
Date:
May 16, 2010 13:16
Subject:
Re: [BUG] Safe signals changes causing hangs with threads (was RE: [perl.git] branch blead, updated. v5.12.0-80-gda76b85)
Message ID:
20100516201609.GB2576@plum.flirble.org
On Wed, Apr 21, 2010 at 09:28:48PM +0100, Dave Mitchell wrote:
> On Wed, Apr 21, 2010 at 01:14:50PM -0400, Jerry D. Hedden wrote:
> > Attached is a self-contained script that reproduces the problem.
> 
> You're getting deadlock due to the lock() in the signal handler.
> 
> In more detail:
> 
> There are two locks involved: PL_sharedsv_lock, and the user lock
> associated with %SUSPEND.
> 
> The main thread calls threads::suspend, acquires the user lock on
> %SUSPEND, then messes with %SUSPEND, which causes shared.xs to try to
> acquire PL_sharedsv_lock on its behalf.
> 
> Conversely one of the worker threads calls delete on %CHECKER, which:
>     calls sharedsv_elem_mg_DELETE, and locks PL_sharedsv_lock;
>     calls sharedsv_elem_mg_FETCH which increments the lock count on
> 	    PL_sharedsv_lock;
> 	at the end of FETCH, it does a LEAVE, which:
> 	    decrements the count of PL_sharedsv_lock by one (but still
> 		locked because of DELETE),
> 	    as a final action, does PERL_ASYNC_CHECK, which notices
> 	    a pending signal, and calls the userland signal handler, which
> 	    tries to grab the user lock on %SUSPEND
> 
> 
> So you have two threads locking in reverse order:
> 
>     main: lock(%SUSPEND); lock(PL_sharedsv_lock)
>     worker: lock(PL_sharedsv_lock); lock(%SUSPEND)
> 
> and hence potential deadlock.
> 
> This is new because of the new PERL_ASYNC_CHECK in pop_scope(), which
> allows signal handlers to be called while PL_sharedsv_lock is still held.

Thanks for the analysis. I've been thinking about this. I keep wondering,
should PERL_ASYNC_CHECK (or at least, the thing that it calls) refuse to
dispatch signals if PL_sharedsv_lock is held? The justification being that
the interpreter isn't in a "safe" state if it's in the middle of doing
something that has locked shared space.

In that, yes, we have (currently) a change of behaviour that introduces a
deadlock. But if we remove this particular cause, we could still deadlock
if signals are dispatched via any subordinate runloop invoked while
PL_sharedsv_lock is held (in this case during the FETCH). Such as DESTROY
triggering or overloading on some value being read.

The downside to this would seem to be the potential for making Perl activity
uninterruptible whilst PL_sharedsv_lock is held. But is that a bad thing?
Currently, for cases where it's held, if an exception is thrown, does it
release the lock correctly as part of the scope unwinding?

Am I thinking too hard about edge cases on edge cases? And getting divorced
from the actual code?

Nicholas Clark



Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About