
[perl #113070] threads not joinable issue

From:
alexs @ ecoscentric . com
Date:
May 25, 2012 00:45
Subject:
[perl #113070] threads not joinable issue
Message ID:
rt-3.6.HEAD-7788-1337861546-1771.113070-75-0@perl.org
# New Ticket Created by  alexs@ecoscentric.com 
# Please include the string:  [perl #113070]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=113070 >



This is a bug report for perl from alexs@ecoscentric.com,
generated with the help of perlbug 1.39 running under perl 5.10.1.


-----------------------------------------------------------------
[Please describe your issue here]

I have an old perl application (pre perl 5.005) which emulated
thread support using fork that I moved over to use threads but
have encountered what I believe to be a major bug.  Basically
the issue can be summed up as:

  foreach my $thr (threads->list(threads::joinable)) {
    $thr->join();    # Linux strace shows it hanging here in wait4()
  }

The thread really believes it is joinable as well:
  foreach my $child (threads->list(threads::joinable)) {
    printf "reap_children: Waiting for thread %d to join\n",$child->tid();
    if ($child->is_joinable()) {
      printf "reap_children: I think %d is joinable\n",$child->tid();
      $child->join();
      printf "reap_children: Thread %d joined\n",$child->tid();
    } else {
      printf "reap_children: Thread %d was not joinable\n",$child->tid();
    }
  }

Gives:
  ...
  reap_children: Waiting for thread 3 to join
  reap_children: I think 3 is joinable
  [hangs]

The equivalent loop using log_debug() calls:
  foreach my $child (threads->list(threads::joinable)) {
    log_debug(1,'reap_children: Waiting for thread '.$child->tid().' to join');
    if ($child->is_joinable()) {
      $child->join();
      log_debug(1,'reap_children: Thread '.$child->tid().' joined');
    } else {
      log_debug(0,'reap_children: Child '.$child->tid().' was joinable but now is not');
    }
  }

The trigger for this appears to be a pipe process, a bash script, started by my
main application. I have verified the script has finished ("echo FINISH > /tmp/foo"
as the last line of the script) but my main application is locked in a select()
which includes the pipe.  The main application select() does not return, and
neither does join(), yet both the thread and script have terminated.

The proof of my pudding is that if I kill the bash script, I get:
  ...
  reap_children: I think 3 is joinable
  reap_children: Thread 3 joined
and the select() call also returns.  However, when the pipe process is
closed, $? gives -1 and $! returns 'No child processes'.
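
For reference, a $? of -1 with $! set to 'No child processes' (ECHILD)
after close() on a piped handle generally means the child had already been
reaped by something else before close() could wait for it. The sketch below
is not taken from the application. It reproduces only that symptom, by
deliberately setting SIGCHLD to IGNORE, which is just one assumed way for
the child's exit status to be lost. The command 'echo FINISH' and the name
close_pipe_with_sigchld_ignored are illustrative.

```perl
use strict;
use warnings;
use Errno qw(ECHILD);

# Hypothetical helper: open a pipe to a short-lived command while SIGCHLD
# is ignored, then report what close() leaves in $? and $!.
sub close_pipe_with_sigchld_ignored {
    # With SIGCHLD ignored, the kernel discards the child's exit status,
    # so the wait performed inside close() has nothing left to collect.
    local $SIG{CHLD} = 'IGNORE';

    open(my $pipe, '-|', 'echo FINISH') or die "cannot start pipe: $!";
    my @output = <$pipe>;             # read until the child closes its end

    my $ok = close($pipe);
    my ($status, $errno) = ($?, $! + 0);   # capture both immediately
    return ($ok ? 1 : 0, $status, $errno, @output);
}

my ($ok, $status, $errno, @output) = close_pipe_with_sigchld_ignored();
printf "close ok=%d  \$?=%d  ECHILD=%s  output=%s",
    $ok, $status, ($errno == ECHILD ? 'yes' : 'no'), join('', @output);
```

If the same values show up without SIGCHLD being ignored, the usual
suspects are a handler or another thread calling wait() or waitpid(-1, ...)
and collecting the pipe's child first.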

It seems to me that perl's thread support is getting in a muddle handling
the SIGCHLD signals resulting from the termination of the pipe and the
termination of the thread (though I thought perl 5.10 did not use fork()).
A Linux strace clearly shows perl in wait4() waiting on the shell
process ID, which I would have thought it should receive as my script
clearly has terminated (/tmp/foo exists and contains "FINISH" - see above).
I am using POSIX as well, which may be contributing to this confusion.

Unfortunately I have not been able to reproduce this issue with a simple
case. My application is a pretty complex automated build and test system
which runs tests on remote hardware and logs results in a MySQL database,
and this problem occurs every 4-8 hours after successfully running
several hundred thousand tests and many builds.  This application is
also well tested and has been in place for almost 10 years.

I have unfortunately had to revert back to my own thread emulation
system where I have a "fork_and_call" function which forks and calls
a given function with the pointer to the function and its arguments
passed to fork_and_call, pushing PIDs on a stack, and a signal handler to
reap SIGCHLD signals and verify when certain "threads" have finished
(ignoring SIGCHLD from terminating pipe processes).
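
A minimal sketch of that kind of emulation, for readers who want to see
the shape of it. This is not the application's code: the %workers table
and reap_children() are hypothetical names, and instead of a SIGCHLD
handler it uses a blocking waitpid() on each recorded PID, which is a
simpler stand-in that likewise leaves pipe children alone.

```perl
use strict;
use warnings;
use POSIX ();

my %workers;    # PIDs of forked "threads" that have not been reaped yet

# Fork, run $func->(@args) in the child, and use its return value
# (assumed to be a small integer) as the child's exit code.
sub fork_and_call {
    my ($func, @args) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        my $rc = eval { $func->(@args) };
        # _exit skips END blocks and buffer flushing inherited from the parent
        POSIX::_exit(defined $rc ? $rc : 255);
    }
    $workers{$pid} = 1;
    return $pid;
}

# Wait only for PIDs recorded by fork_and_call, so children belonging to
# piped opens are never collected here. Returns pid => exit code.
sub reap_children {
    my %exit_code;
    for my $pid (sort { $a <=> $b } keys %workers) {
        my $got = waitpid($pid, 0);
        $exit_code{$pid} = $? >> 8 if $got == $pid;
        delete $workers{$pid};
    }
    return %exit_code;
}

# Usage: a pipe child and a forked worker coexist without interfering.
open(my $pipe, '-|', 'echo FINISH') or die "cannot start pipe: $!";
my $worker = fork_and_call(sub { my ($n) = @_; return $n * 2 }, 21);
my %done = reap_children();
my @lines = <$pipe>;
close($pipe);
my $pipe_status = $?;   # the pipe's own status, untouched by reap_children
print "worker exit code $done{$worker}, pipe status $pipe_status\n";
```

Waiting on specific PIDs rather than on -1 is the design choice that keeps
the exit status of pipe processes available to close().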

FAOD, I am no NOOB and have been writing lightweight threaded
applications for almost 20 years, and have written over a dozen
multi-threaded perl apps. By design, though, they have all used
detached threads, with threads::shared, Thread::Queue and
Thread::Semaphore to communicate and handle start/stop synchronisation.
This is the first time I have attempted to use $thr->join(),
with disappointing results.

As I have a workaround, I am not in any rush for a fix, especially since
I am unable to provide a small test case.  You may wish to revisit the
$thr->join() support though and check for any possibility of what I
have described.  There still could be an issue with my app, but things
do appear to point to SIGCHLD getting misappropriated somewhere.

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=library
    severity=high
    module=threads
---
This perlbug was built using Perl 5.10.1 in the Fedora build system.
It is being executed now by Perl 5.10.1 - Sun Nov  6 00:37:43 GMT 2011.

Site configuration information for perl 5.10.1:

Configured by Red Hat, Inc. at Sun Nov  6 00:37:43 GMT 2011.

Summary of my perl5 (revision 5 version 10 subversion 1) configuration:
   
  Platform:
    osname=linux, osvers=2.6.32-44.2.el6.x86_64, archname=x86_64-linux-thread-multi
    uname='linux c6b5.bsys.dev.centos.org 2.6.32-44.2.el6.x86_64 #1 smp wed jul 21 12:48:32 edt 2010 x86_64 x86_64 x86_64 gnulinux '
    config_args='-des -Doptimize=-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic -DDEBUGGING=-g -Dversion=5.10.1 -Dmyhostname=localhost -Dperladmin=root@localhost -Dcc=gcc -Dcf_by=Red Hat, Inc. -Dprefix=/usr -Dvendorprefix=/usr -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl5 -Dsitearch=/usr/local/lib64/perl5 -Dprivlib=/usr/share/perl5 -Darchlib=/usr/lib64/perl5 -Dvendorlib=/usr/share/perl5/vendor_perl -Dvendorarch=/usr/lib64/perl5/vendor_perl -Dinc_version_list=5.10.0 -Darchname=x86_64-linux-thread-multi -Dlibpth=/usr/local/lib64 /lib64 /usr/lib64 -Duseshrplib -Dusethreads -Duseithreads -Duselargefiles -Dd_dosuid -Dd_semctl_semun -Di_db -Ui_ndbm -Di_gdbm -Di_shadow -Di_syslog -Dman3ext=3pm -Duseperlio -Dinstallusrbinperl=n -Ubincompat5005 -Uversiononly -Dpager=/usr/bin/less -isr -Dd_gethostent_r_proto -Ud_endhostent_r_proto -Ud_sethostent_r_proto -Ud_endprotoent_r_proto -Ud_setprotoent_r_proto -Ud_endservent_r_proto -Ud_setservent_r_proto -Dscriptdir=/usr/bin'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=define, usemultiplicity=define
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=define, use64bitall=define, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='gcc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.4.5 20110214 (Red Hat 4.4.5-6)', gccosandvers=''
    intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='gcc', ldflags =' -fstack-protector'
    libpth=/usr/local/lib64 /lib64 /usr/lib64
    libs=-lresolv -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lpthread -lc
    perllibs=-lresolv -lnsl -ldl -lm -lcrypt -lutil -lpthread -lc
    libc=, so=so, useshrplib=true, libperl=libperl.so
    gnulibc_version='2.12'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-Wl,-E -Wl,-rpath,/usr/lib64/perl5/CORE'
    cccdlflags='-fPIC', lddlflags='-shared -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic'

Locally applied patches:
    

---
@INC for perl 5.10.1:
    /usr/local/lib64/perl5
    /usr/local/share/perl5
    /usr/lib64/perl5/vendor_perl
    /usr/share/perl5/vendor_perl
    /usr/lib64/perl5
    /usr/share/perl5
    .

---
Environment for perl 5.10.1:
    HOME=/home/farm
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/farm/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash



