perl.perl5.porters
Postings from May 2010
[perl #74854] wait and waitpid can return -1 even when there are running child forks.
From: Charlie Strauss
Date: May 2, 2010 19:25
Subject: [perl #74854] wait and waitpid can return -1 even when there are running child forks.
Message ID: rt-3.6.HEAD-10623-1272827298-964.74854-75-0@perl.org
# New Ticket Created by Charlie Strauss
# Please include the string: [perl #74854]
# in the subject line of all future correspondence about this issue.
# <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=74854 >
This is a bug report for perl from cems@lanl.gov,
generated with the help of perlbug 1.39 running under perl v5.8.9.
FROM: Charlie Strauss, cems@lanl.gov
perlbug: wait and waitpid can return -1 even when there are running child forks. This happens any
time the parent process, while it is waiting, sends a kill signal to the ID of a child process it has already reaped.
Tested on platforms:
1) Mac OS X 10.5.10, Darwin Kernel Version 9.8.0, perl v5.8.9 built for darwin-2level
2) Linux 2.6.31-20-generic #58-Ubuntu SMP
Expected behaviour:
If a parent waits on its children, I expect that:
1) wait() blocks until a child exits (or SIGCHLD).
2) wait() never returns -1 while there are unreaped or running children.
Observed behaviour:
The above is normally true, but it is very easy to make wait() violate both of those conditions.
This happens any time a kill signal is sent to a previously reaped child process ID from inside
the parent process.
In order for the parent to be able to send a kill signal while it is waiting, this has to happen
from a signal handler that is triggered externally.
Steps to reproduce:
See the enclosed Perl demo script (note that it swaps the roles: there $pid1 is the long job, $pid2 the short one, and the signal is SIGCONT).
The basic flow is:
1) Fork off two or more children whose jobs have different lengths, e.g. sleep 1 and sleep 300.
2) Let the short-running job be $pid1 and the long-running job be $pid2.
3) Set a %SIG handler for some signal, say SIGQUIT, that sends a kill to $pid1.
4) wait() for $pid1 to finish and be reaped by the parent.
5) The parent waits again, for $pid2 to finish, using $r = wait();
6) Send the parent a SIGQUIT from another terminal or process.
7) Result: wait() unblocks and $r is -1, but $pid2 is still running.
WHY THIS IS A PROBLEM:
Most Perl man pages suggest a reaper that looks something like this:
do {} while ( waitpid(-1,WNOHANG) > 0 );
Similarly, waits in parent processes generally look like:
do {} while ( wait() > -1 );
exit;
or
do { sleep 1 } while ( waitpid(-1,WNOHANG) > -1 );
exit;
However, these idioms fail because of the bug above: waitpid and wait can return -1 even when jobs
are still running.
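The idioms above treat any -1 as "no children left". POSIX wait() fails with ECHILD when the caller has no children and with EINTR when a signal interrupts it, so a loop can inspect $! before giving up. A minimal sketch, assuming that treating every non-ECHILD failure as transient is acceptable; reap_all is a hypothetical name, not something from the report, and it would still stop early if a handler happened to leave $! set to ECHILD.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(ECHILD);

# Hypothetical helper: reap every child of this process, return their PIDs.
# Only ECHILD ("no child processes") ends the loop; any other -1 from
# wait() (EINTR, or an errno changed by a signal handler) is retried.
sub reap_all {
    my @reaped;
    while (1) {
        my $pid = wait();
        if ($pid > 0) {
            push @reaped, $pid;       # a child was reaped; keep going
        }
        elsif ($! == ECHILD) {
            last;                     # no children left: really done
        }
        # Any other failure: assume wait() was interrupted and wait again.
    }
    return @reaped;
}
```

In a parent, `my @done = reap_all(); exit;` would stand in for the `do {} while ( wait() > -1 ); exit;` idiom.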
WORKAROUND:
The workaround is to double-check whenever wait() returns -1: follow each wait() with a non-blocking waitpid(), and keep waiting while that does not return -1.
do { wait() } while (waitpid(-1,WNOHANG) > -1 );
This could perhaps also fail if several kill signals arrive too quickly for the loop to catch them.
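One plausible reading of the observation that killing a live PID, or doing nothing, does not trigger the problem while killing a reaped PID does: the failed kill() inside the handler overwrites errno with ESRCH, so the interrupted wait() no longer sees EINTR and gives up instead of retrying. This is an assumption, not something the report establishes. Under that assumption, a handler that localizes $! should sidestep the problem; perlipc's REAPER example localizes $! and $? for a similar reason. A sketch, with make_forwarding_handler as a hypothetical helper:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a %SIG handler that forwards the signal it
# receives to $target without changing the caller's $! (errno).
# 'local $!' saves errno on entry and restores it when the handler
# returns, so a kill() that fails with ESRCH on an already-reaped PID
# cannot replace the EINTR left by an interrupted wait()/waitpid().
sub make_forwarding_handler {
    my ($target) = @_;
    return sub {
        my ($sig) = @_;               # %SIG handlers get the signal name
        local $!;                     # restored when this sub exits
        kill $sig, $target
            or warn "could not send $sig to $target: $!\n";
    };
}
```

In the demo script, the trigger line would then read `$SIG{CONT} = make_forwarding_handler($pid2);`.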
DEMO SCRIPT
#!/usr/bin/perl -w
use strict;
my @command1 = ("sleep 360");
my @command2 = ("sleep 2");
my $pid1 = fork();
unless ($pid1) {
exec @command1;
};
my $pid2 = fork();
unless ($pid2) {
exec @command2;
};
# THIS IS WHAT TRIGGERS THE BUG
$SIG{CONT} = sub{ my $sig = $_[0]; kill $sig, $pid2; warn " dead process $pid2 was sent $sig\n"};
# NEITHER OF THESE WILL TRIGGER THE BUG. thus the bug is due to sending a sigCONT to the dead process.
#$SIG{CONT} = sub{ my $sig = $_[0]; kill $sig, $pid1; warn "live process $pid1 was sent $sig\n"};
#$SIG{CONT} = sub{ my $sig = $_[0]; warn "did nothing with $sig\n"};
warn "processes running:\n";
warn `ps au | grep $pid2 | grep -v grep` ;
warn `ps au | grep $pid1 | grep -v grep` ;
wait(); # reap process #2
warn "\n\ncommand @command2 should have been reaped now\n";
warn "\nprocesses running:\n";
my $r = `ps au | grep $pid2 | grep -v grep` ;
warn $r if $r;
warn `ps au | grep $pid1 | grep -v grep` ;
warn "\n\nHUMAN SLAVE: To induce the bug, pull up another terminal and type:
kill -CONT $$
then
kill -CONT $$
then
kill -CONT $$
then
kill -CONT $$
then
kill -QUIT $pid1
(NOTICE the last one was $pid1 and not $$)\n";
warn "\n\nParent will first use wait()\n";
if ( wait() == -1 ) {warn "perl thinks the child process ended, but did it?\n";};
if ( kill 0,$pid1) {
warn "\nPerl bug detected!! the process $pid1 is still running!!\n";
warn "you can see this because waitpid is now zero:", waitpid(-1,1),"\n";
warn "and you can also see it in the process list as active:";
warn `ps au | grep $pid1 | grep -v grep`;
}
warn "\n\nParent will next use waitpid()\n";
if ( waitpid(-1,0) == -1 ) {warn "perl thinks the child process ended, but did it?\n";};
if ( kill 0,$pid1) {
warn "\nPerl bug detected!! the process $pid1 is still running!!\n";
warn "you can see this because waitpid is now zero:", waitpid(-1,1),"\n";
warn "and you can also see it in the process list as active:";
warn `ps au | grep $pid1 | grep -v grep`;
}
# here is the only way I know of to work around this bug:
warn "\n\nParent is working around wait() bug by trapping it in a double waitpid wrapper\n";
while ( waitpid(-1,1) > -1 ) { wait(); warn "perl wait() thinks the child process ended, but we trapped this bug\n";};
if ( kill 0,$pid1) {
warn "\nPerl bug detected!! the process $pid1 is still running!!\n";
warn "you can see this because waitpid is now zero:", waitpid(-1,1),"\n";
warn "and you can also see it in the process list as active:";
warn `ps au | grep $pid1 | grep -v grep`;
}
else {
warn "\n\nThis time the process $pid1 really is ended for real!!!!\n";
warn "you can see this because waitpid is now -1:", waitpid(-1,1),"\n";
warn "and you can also see it is NOT in the process list:";
warn `ps au | grep $pid1 | grep -v grep`;
}
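The demo needs a person at a second terminal. A non-interactive variant can have a helper child send SIGCONT to the parent while it waits. This sketch assumes a Unix-like system, waits on the long job with waitpid() rather than wait() so that the helper's own exit is not what gets reaped, and introduces hypothetical helpers spawn() and reproduce() that are not part of the original script. It avoids `//` so it should also load on perl 5.8.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Errno qw(EINTR);
use POSIX ();

# Hypothetical helper: fork a child that runs $code, then exits; return its PID.
sub spawn {
    my ($code) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) { $code->(); POSIX::_exit(0) }
    return $pid;
}

# Returns (waitpid result, long job's PID, $! text if -1, long job alive?).
sub reproduce {
    my $short = spawn(sub { });               # short job: exits at once
    waitpid($short, 0);                       # reap it; $short is now stale
    my $long = spawn(sub { sleep 3 });        # long job, running below

    # The trigger from the report: signal the already-reaped PID.
    local $SIG{CONT} = sub { kill 'CONT', $short };

    my $parent = $$;
    my $helper = spawn(sub { sleep 1; kill 'CONT', $parent });

    my $r     = waitpid($long, 0);
    my $errno = $r == -1 ? "$!" : '';
    my $alive = kill(0, $long) ? 1 : 0;       # is the long job still running?

    # Clean up the remaining children, retrying if interrupted.
    for my $pid ($long, $helper) {
        1 while waitpid($pid, 0) == -1 && $! == EINTR;
    }
    return ($r, $long, $errno, $alive);
}
```

If reproduce() returns -1 while the long job is still alive, the perl running it shows the behaviour described in this report; if it returns the long job's PID, waitpid() carried on waiting after the handler ran.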
---
Flags:
category=core
severity=high
---
Site configuration information for perl v5.8.9:
Configured by cems at Wed Sep 16 16:20:28 MDT 2009.
Summary of my perl5 (revision 5 version 8 subversion 9) configuration:
Platform:
osname=darwin, osvers=9.7.0, archname=darwin-2level
uname='darwin ocho.lanl.gov 9.7.0 darwin kernel version 9.7.0: tue mar 31 22:52:17 pdt 2009; root:xnu-1228.12.14~1release_i386 i386 '
config_args='-des -D prefix=/opt/local -D scriptdir=/opt/local/bin -D cppflags=-I/opt/local/include -D ldflags=-L/opt/local/lib -D vendorprefix=/opt/local -D man1ext=1pm -D man3ext=3pm -D cc=/usr/bin/gcc-4.0 -D ld=/usr/bin/gcc-4.0 -D man1dir=/opt/local/share/man/man1p -D man3dir=/opt/local/share/man/man3p -D siteman1dir=/opt/local/share/man/man1 -D siteman3dir=/opt/local/share/man/man3 -D vendorman1dir=/opt/local/share/man/man1 -D vendorman3dir=/opt/local/share/man/man3 -D inc_version_list=5.8.8 5.8.8/darwin-2level -U i_bind -U i_gdbm -U i_db'
hint=recommended, useposix=true, d_sigaction=define
usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='/usr/bin/gcc-4.0', ccflags ='-fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include',
optimize='-O3',
cppflags='-I/opt/local/include -no-cpp-precomp -fno-common -DPERL_DARWIN -I/opt/local/include -no-cpp-precomp -fno-strict-aliasing -pipe -I/usr/local/include -I/opt/local/include'
ccversion='', gccversion='4.0.1 (Apple Inc. build 5493)', gccosandvers=''
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 /usr/bin/gcc-4.0', ldflags ='-L/opt/local/lib -L/usr/local/lib'
libpth=/usr/local/lib /opt/local/lib /usr/lib
libs=-ldbm -ldl -lm -lutil -lc
perllibs=-ldl -lm -lutil -lc
libc=/usr/lib/libc.dylib, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags='-L/opt/local/lib -bundle -undefined dynamic_lookup -L/usr/local/lib'
Locally applied patches:
---
@INC for perl v5.8.9:
/opt/local/lib/perl5/site_perl/5.8.9/darwin-2level
/opt/local/lib/perl5/site_perl/5.8.9
/opt/local/lib/perl5/site_perl
/opt/local/lib/perl5/vendor_perl/5.8.9/darwin-2level
/opt/local/lib/perl5/vendor_perl/5.8.9
/opt/local/lib/perl5/vendor_perl
/opt/local/lib/perl5/5.8.9/darwin-2level
/opt/local/lib/perl5/5.8.9
.
---
Environment for perl v5.8.9:
DYLD_LIBRARY_PATH (unset)
HOME=/Users/cems
LANG (unset)
LANGUAGE (unset)
LD_LIBRARY_PATH (unset)
LOGDIR (unset)
PATH=/Library/Frameworks/Python.framework/Versions/Current/bin:/opt/local/bin:/opt/local/sbin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/Library/Frameworks/Python.framework/Versions/Current/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/X11/bin
PERL_BADLANG (unset)
SHELL=/bin/bash