alexs @ ecoscentric . com
May 25, 2012 00:45
[perl #113070] threads not joinable issue
This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.10.1.

[Please describe your issue here]

I have an old perl application (pre perl 5.005) which emulated
thread support using fork that I moved over to use threads but
have encountered what I believe to be a major bug.  Basically
the issue can be summed up as:

  foreach my $thr (threads->list(threads::joinable)) {
    $thr->join();    # Linux strace shows hangs here in waitpid4
  }

The thread really believes it is joinable as well:

  foreach my $child (threads->list(threads::joinable)) {
    printf "reap_children: Waiting for thread %d to join\n", $child->tid();
    if ($child->is_joinable()) {
      printf "reap_children: I think %d is joinable\n", $child->tid();
      $child->join();    # hangs here
      printf "reap_children: Thread %d joined\n", $child->tid();
    } else {
      printf "reap_children: Thread %d was not joinable\n", $child->tid();
    }
  }

The output stops after the second line:

  reap_children: Waiting for thread 3 to join
  reap_children: I think 3 is joinable

    log_debug(1,'reap_children: Waiting for thread '.$child->tid().' to join');
    if ($child->is_joinable()) {
      $child->join();
      log_debug(1,'reap_children: Thread '.$child->tid().' joined');
    } else {
      log_debug(0,'reap_children: Child '.$child->tid().' was joinable but now is not');
    }

The trigger to this appears to be a pipe process, a bash script, started by my
main application. I have verified the script has finished ("echo FINISH > /tmp/foo"
as the last line of the script) but my main application is locked in a select()
which includes the pipe.  The main application select() does not return, and
neither does join(), yet both the thread and script have terminated.
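
One general point about the select() side: the read end of a pipe only reports end-of-file once every process holding the write end has closed it. The sketch below illustrates that behaviour only and is not a diagnosis of the application. The helper name select_before_and_after_writer_exit is hypothetical, and a sleeping child stands in for the bash script.

```perl
use strict;
use warnings;
use IO::Select;

# Hypothetical helper: fork a child that holds the write end of a pipe
# without writing anything, and check whether select() reports the read
# end as ready before and after that child is killed.
sub select_before_and_after_writer_exit {
    my ($timeout) = @_;
    pipe(my $rd, my $wr) or die "pipe: $!";

    my $pid = fork();
    die "fork: $!" unless defined $pid;
    if ($pid == 0) {
        close $rd;
        sleep 60;    # stand-in for a process that is still alive
        exit 0;
    }
    close $wr;       # the child now holds the only write end

    my $sel    = IO::Select->new($rd);
    my @before = $sel->can_read($timeout);    # no data and no EOF yet

    kill 'KILL', $pid;
    waitpid($pid, 0);
    my @after = $sel->can_read($timeout);     # EOF is visible now

    close $rd;
    return (scalar(@before), scalar(@after));
}

my ($before, $after) = select_before_and_after_writer_exit(0.5);
print "ready before kill: $before, ready after kill: $after\n";
```

If a script started through a pipe leaves a background process that inherited the pipe, the parent's select() would be expected to behave like the "before" case until that process goes away.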

The proof of my pudding is that if I kill the bash script, I get:
  reap_children: I think 3 is joinable
  reap_children: Thread 3 joined
and the select() call also returns.  However, when the pipe process is
closed, $? gives -1 and $! returns 'No child processes'.
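
A $? of -1 with 'No child processes' (ECHILD) after close() on a pipe means the waitpid() inside close() found no child left to wait for, i.e. something else had already reaped it. The sketch below reproduces that pair of values without any threads, assuming a Linux-like platform where ignoring SIGCHLD makes the kernel reap children automatically. The helper name is hypothetical, and this is only one way to get those values, not necessarily what happens in the application.

```perl
use strict;
use warnings;
use Errno qw(ECHILD);

# Hypothetical helper: read a command through a piped open while SIGCHLD
# is ignored, then report what close() leaves in $? and $!.
sub close_status_when_child_already_reaped {
    my (@cmd) = @_;
    local $SIG{CHLD} = 'IGNORE';    # children are reaped automatically
    open(my $fh, '-|', @cmd) or die "cannot start @cmd: $!";
    my @output = <$fh>;             # read until EOF
    my $ok = close($fh);            # waitpid() inside close() finds no child
    my ($status, $errno) = ($?, $! + 0);
    return ($ok ? 1 : 0, $status, $errno);
}

my ($ok, $status, $errno) =
    close_status_when_child_already_reaped('echo', 'FINISH');
printf "close ok=%d \$?=%d ECHILD=%s\n",
    $ok, $status, ($errno == ECHILD ? 'yes' : 'no');
```

A SIGCHLD handler that calls waitpid(-1, ...) can produce the same result if it reaps the pipe's child before close() gets to it.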

It seems to me that perl's thread support is getting in a muddle handling
the SIGCHLD signals resulting from the termination of the pipe and the
termination of the thread (though I thought perl 5.10 did not use fork()).
A Linux strace clearly shows perl in waitpid4() waiting on the shell
process ID, which I would have thought it should receive as my script
clearly has terminated (/tmp/foo exists and contains "FINISH" - see above).
I am using POSIX as well, which may be contributing to this confusion.

Unfortunately I have not been able to reproduce this issue with a simple
case. My application is a pretty complex automated build and test system
which runs tests on remote hardware and logs results in a MySQL database,
and this problem occurs every 4-8 hours after successfully running
several hundred thousand tests and many builds.  This application is
also well tested and has been in place for almost 10 years.

I have unfortunately had to revert back to my own thread emulation
system where I have a "fork_and_call" function which forks and calls
a given function with the pointer to the function and its arguments
passed to fork_and_call, pushing PIDs on a stack, and a signal handler to
reap SIGCHLD signals and verify when certain "threads" have finished
(ignoring SIGCHLD from terminating pipe processes).
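
A minimal sketch of that kind of emulation follows. It is not the application's actual code: only the name fork_and_call comes from the description above, while %worker, %exit_status, reap_children and wait_for_worker are assumptions made for illustration. The handler reaps every exited child and records its status. Children that were not started by fork_and_call (such as pipe processes) are ignored simply by never looking their PIDs up.

```perl
use strict;
use warnings;
use IO::Handle;
use POSIX qw(WNOHANG);
use Time::HiRes qw(time sleep);

my %worker;         # pid => 1 for every "thread" started by fork_and_call
my %exit_status;    # pid => raw wait status of every reaped child

# SIGCHLD handler: reap whatever has exited. Statuses of children that
# are not in %worker (e.g. pipe processes) are recorded but never used.
sub reap_children {
    local ($!, $?);
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        $exit_status{$pid} = $?;
    }
}
$SIG{CHLD} = \&reap_children;

# Fork, run $func->(@args) in the child, and return the child's PID.
sub fork_and_call {
    my ($func, @args) = @_;
    STDOUT->flush;
    STDERR->flush;    # keep buffered parent output out of the child
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        $SIG{CHLD} = 'DEFAULT';
        my $ok = eval { $func->(@args); 1 };
        STDOUT->flush;
        POSIX::_exit($ok ? 0 : 1);    # exit code 1 if the function died
    }
    $worker{$pid} = 1;
    return $pid;
}

# Wait up to $timeout seconds for one worker. Returns its raw wait
# status, or nothing if it has not finished or is not a known worker.
sub wait_for_worker {
    my ($pid, $timeout) = @_;
    return unless $worker{$pid};
    my $deadline = time() + $timeout;
    until (exists $exit_status{$pid}) {
        return if time() > $deadline;
        sleep 0.05;
    }
    delete $worker{$pid};
    return delete $exit_status{$pid};
}

my $pid = fork_and_call(sub { print "worker got: @_\n" }, 'build', 42);
my $status = wait_for_worker($pid, 10);
print "worker $pid finished\n" if defined $status;
```

Because the handler calls waitpid(-1, ...), it can also reap the child of a piped open before close() does, in which case close() on that pipe reports a $? of -1.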

FAOD, I am no NOOB and have been writing lightweight threaded
applications for almost 20 years, and have written over a dozen
multi-threaded perl apps. By design they have all used detached
threads, though, with threads::shared, Thread::Queue and
Thread::Semaphore to communicate and handle start/stop synchronisation.
This is however the first time I have attempted to use thread->join()
with disappointing results.

As I have a workaround, I am not in any rush for a fix, especially since
I am unable to provide a small test case.  You may wish to revisit the
thread->join() support though and check for any possibility of what I
have described.  There still could be an issue with my app, but things
do appear to point to SIGCHLD getting misappropriated somewhere.

