perl.perl5.porters | Postings from May 2010

[perl #74854] wait and waitpid can return -1 even when there are running child forks.

From: Charlie Strauss
Date: May 2, 2010 19:25
Subject: [perl #74854] wait and waitpid can return -1 even when there are running child forks.
Message ID: rt-3.6.HEAD-10623-1272827298-964.74854-75-0@perl.org
# New Ticket Created by  Charlie Strauss 
# Please include the string:  [perl #74854]
# in the subject line of all future correspondence about this issue. 
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=74854 >


This is a bug report for perl from cems@lanl.gov,
generated with the help of perlbug 1.39 running under perl v5.8.9.

FROM: charlie strauss, cems@lanl.gov

perlbug:  wait and waitpid can return -1 even when there are running child forks.  This happens
whenever the parent process sends a signal (via kill) to the process ID of a child that has already been reaped.

tested on platforms:  

1) mac osx 10.5.10  Darwin Kernel Version 9.8.0 perl  v5.8.9 built for darwin-2level

2) Linux 2.6.31-20-generic #58-Ubuntu SMP 

Expected behaviour:

If a parent waits on children then I expect that
1) the wait() command should block until a child exits (or SIGCHLD arrives).
2) the wait() command should never return -1 if there are unreaped or running children.
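
A minimal sketch of this expected behaviour when no signals are involved (the variable names are illustrative only): wait() blocks until the child exits and returns its pid, and returns -1 only once no children remain.

```perl
use strict;
use warnings;

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    exit 0;               # child: exit immediately
}

my $first  = wait();      # blocks until the child exits; returns its pid
my $second = wait();      # no children are left, so this returns -1

print "child pid $pid, first wait() = $first, second wait() = $second\n";
```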

Observed behaviour:

The above is normally true, but you can make wait() violate both conditions very easily.
This happens any time the parent process sends a signal (via kill) to the process ID of a
previously reaped child.

For the parent to be able to send a signal while it is blocked in wait(), the kill has to happen
from a signal handler that is triggered externally.

Steps to reproduce:

see enclosed perl script demo.

The basic flow is this:

1)  fork off two or more children whose jobs have different lengths: e.g. sleep 1 and sleep 300
2)  let the short-running job be our $pid1 and the long-running job be our $pid2.
3)  set a %SIG handler for some signal, say SIGQUIT, that sends a kill to our $pid1.
4)  wait() for $pid1 to finish and get reaped by the parent.
5)  the parent then waits for $pid2 to finish using $r=wait();
6)  send the parent a SIGQUIT from another terminal or process.
7)  result: wait() unblocks and you find that $r is -1 but $pid2 is still running.

(The demo script below swaps the roles: there $pid1 is the long job and $pid2 the short one,
and the signal used is SIGCONT.)

WHY THIS IS A PROBLEM:  
most perl man pages suggest a reaper that looks something like this:

do {} while ( waitpid(-1,WNOHANG) > 0 );

Similarly, waits in parent processes generally look like:

do {} while ( wait() > -1 );
exit;

or

do { sleep 1 } while ( waitpid(-1,WNOHANG) > -1 );
exit;

However, this fails because of the bug above: waitpid and wait can return -1 even when jobs are
still running.
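
For context, the reaper idiom is usually installed as a SIGCHLD handler, roughly as in the sketch below. It follows the style of the REAPER example in perlipc; the %status hash and the exit code 3 are made up for illustration and are not part of the demo script.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

my %status;   # exit codes of reaped children, keyed by pid (illustrative)

$SIG{CHLD} = sub {
    local ($!, $?);   # do not clobber the main program's $! and $?
    # Reap every child that has already exited, without blocking.
    while ((my $kid = waitpid(-1, WNOHANG)) > 0) {
        $status{$kid} = $? >> 8;
    }
};

my $pid = fork();
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {
    exit 3;           # child: exit with a recognisable code
}

# sleep is interrupted by SIGCHLD; loop until the handler has recorded the child.
sleep 1 until exists $status{$pid};

print "child $pid exited with status $status{$pid}\n";
```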


WORKAROUND:
The workaround is to check twice in a row whenever wait() returns -1:

do { wait() } while  (waitpid(-1,WNOHANG) >  -1 );



This could perhaps also fail if several kill signals arrive too quickly for the loop to trap them.
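
The same double-check idea can be written out more explicitly. The following is only a sketch: reap_all_children is a hypothetical helper name, and the two short-lived children exist purely to make the example self-contained. A -1 from wait() is treated as final only when a non-blocking waitpid(-1, WNOHANG) also returns -1, so the caveat about rapid signals still applies.

```perl
use strict;
use warnings;
use POSIX qw(WNOHANG);

# Hypothetical helper: block until every child has been reaped,
# double-checking each result of wait() with a non-blocking waitpid.
sub reap_all_children {
    my @reaped;
    while (1) {
        my $pid = wait();                      # may return a spurious -1
        push @reaped, $pid if $pid > 0;

        my $probe = waitpid(-1, WNOHANG);      # 0: children still running
        push @reaped, $probe if $probe > 0;    # >0: reaped another child
        last if $probe == -1;                  # -1: no children remain
    }
    return @reaped;
}

my %started;
for my $secs (0, 1) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        sleep $secs;                           # child: simulate a job
        exit 0;
    }
    $started{$pid} = 1;
}

my @reaped = reap_all_children();
print "reaped: @reaped\n";
```

Unlike the one-liner, this version also keeps the pids of children that happen to be reaped by the probing waitpid call.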



DEMO SCRIPT

#!/usr/bin/perl -w
use strict; 

my @command1 = ("sleep 360");
my @command2 = ("sleep 2");


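my $pid1 = fork();
die "fork failed: $!" unless defined $pid1;

unless ($pid1) {
    # child: exec only returns if it fails
    exec @command1 or die "exec '@command1' failed: $!";
}

my $pid2 = fork();
die "fork failed: $!" unless defined $pid2;

unless ($pid2) {
    # child: exec only returns if it fails
    exec @command2 or die "exec '@command2' failed: $!";
}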

# THIS IS WHAT TRIGGERS THE BUG
$SIG{CONT} = sub{ my $sig = $_[0]; kill $sig, $pid2;   warn " dead process $pid2 was sent $sig\n"};

# NEITHER OF THESE WILL TRIGGER THE BUG.  thus the bug is due to sending a sigCONT to the dead process.
#$SIG{CONT} = sub{ my $sig = $_[0]; kill $sig, $pid1;   warn "live process $pid1 was sent $sig\n"};
#$SIG{CONT} = sub{ my $sig = $_[0];   warn "did nothing with $sig\n"};


warn "processes running:\n";
warn `ps au | grep $pid2 | grep -v grep` ;
warn `ps au | grep $pid1 | grep -v grep` ;
        
wait(); # reap process #2

warn "\n\ncommand @command2 should have been reaped now\n";
warn "\nprocesses running:\n";
my $r = `ps au | grep $pid2 | grep -v grep` ;
warn $r if $r;
warn `ps au | grep $pid1 | grep -v grep` ;

warn "\n\nHUMAN SLAVE: To induce the bug, pull up another terminal and type:
    kill -CONT $$
 then
    kill -CONT $$
 then
    kill -CONT $$
 then
    kill -CONT $$
 then
    kill -QUIT $pid1

(NOTICE the last one was $pid1 and not $$)\n";

 
warn "\n\nParent will first use wait()\n";    
if ( wait() == -1 )  {warn "perl thinks the child process ended, but did it?\n";};

if ( kill 0,$pid1) {
    warn "\nPerl bug detected!!  the process $pid1 is still running!!\n";
    warn "you can see this because waitpid is now zero:", waitpid(-1,1),"\n";
    warn "and you can also see it in the process list as active:";
    warn `ps au | grep $pid1 | grep -v grep`;
    }



warn "\n\nParent will next use waitpid()\n";
if ( waitpid(-1,0) == -1 )  {warn "perl thinks the child process ended, but did it?\n";};


if ( kill 0,$pid1) {
    warn "\nPerl bug detected!!  the process $pid1 is still running!!\n";
    warn "you can see this because waitpid is now zero:", waitpid(-1,1),"\n";
    warn "and you can also see it in the process list as active:";
    warn `ps au | grep $pid1 | grep -v grep`;
    }
    
# here is the only way I know of to work around this bug
# (the flag value 1 passed to waitpid is WNOHANG on the platforms tested):

warn "\n\nParent is working around wait() bug by trapping it in a double waitpid wrapper\n";
while ( waitpid(-1,1) >  -1 )  { wait(); warn "perl wait() thinks the child process ended, but we trapped this bug\n";};


if ( kill 0,$pid1) {
    warn "\nPerl bug detected!!  the process $pid1 is still running!!\n";
    warn "you can see this because waitpid is now zero:", waitpid(-1,1),"\n";
    warn "and you can also see it in the process list as active:";
    warn `ps au | grep $pid1 | grep -v grep`;
    }
else {
warn "\n\nThis time the process $pid1 really is ended for real!!!!\n";
warn "you can see this because waitpid is now -1:", waitpid(-1,1),"\n";
warn "and you can also see it is NOT in the process list:";
warn `ps au | grep $pid1 | grep -v grep`;
}



---
Flags:
    category=core
    severity=high
---
Site configuration information for perl v5.8.9:

Configured by cems at Wed Sep 16 16:20:28 MDT 2009.

Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
  Platform:
    osname=darwin, osvers=9.7.0, archname=darwin-2level
    uname='darwin ocho.lanl.gov 9.7.0 darwin kernel version 9.7.0: tue mar 31 22:52:17 pdt 2009; root:xnu-1228.12.14~1release_i386 i386 '
    config_args='-des -D prefix=/opt/local -D scriptdir=/opt/local/bin -D cppflags=-I/opt/local/include -D ldflags=-L/opt/local/lib -D vendorprefix=/opt/local -D man1ext=1pm -D man3ext=3pm -D cc=/usr/bin/gcc-4.0 -D ld=/usr/bin/gcc-4.0 -D man1dir=/opt/local/share/man/man1p -D man3dir=/opt/local/share/man/man3p -D siteman1dir=/opt/local/share/man/man1 -D siteman3dir=/opt/local/share/man/man3 -D vendorman1dir=/opt/local/share/man/man1 -D vendorman3dir=/opt/local/share/man/man3 -D inc_version_list=5.8.8 5.8.8/darwin-2level -U i_bind -U i_gdbm -U i_db'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='/usr/bin/gcc-4.0', ccflags ='-fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include',
    optimize='-O3',
    cppflags='-I/opt/local/include -no-cpp-precomp -fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include'
    ccversion='', gccversion='4.0.1 (Apple Inc. build 5493)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=8, prototype=define
  Linker and Libraries:
    ld='env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.0', ldflags ='-L/opt/local/lib -L/usr/local/lib'
    libpth=/usr/local/lib /opt/local/lib /usr/lib
    libs=-ldbm -ldl -lm -lutil -lc
    perllibs=-ldl -lm -lutil -lc
    libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
    cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib'

Locally applied patches:
    

---
@INC for perl v5.8.9:
    /opt/local/lib/perl5/site_perl/5.8.9/darwin-2level
    /opt/local/lib/perl5/site_perl/5.8.9
    /opt/local/lib/perl5/site_perl
    /opt/local/lib/perl5/vendor_perl/5.8.9/darwin-2level
    /opt/local/lib/perl5/vendor_perl/5.8.9
    /opt/local/lib/perl5/vendor_perl
    /opt/local/lib/perl5/5.8.9/darwin-2level
    /opt/local/lib/perl5/5.8.9
    .

---
Environment for perl v5.8.9:
    DYLD_LIBRARY_PATH (unset)
    HOME=/Users/cems
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/Library/Frameworks/Python.framework/Versions/Current/bin:/opt/local/bin:/opt/local/sbin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash



