
[ID 20010629.002] perl segfaults unpredictably with valid code (Cookbook: p570,571 cmd3sel); presumably concerns a race condition between 'waitpid' and 'open' and signal handling

From: Wengatz Herbert
Date: June 29, 2001 08:39
Subject: [ID 20010629.002] perl segfaults unpredictably with valid code (Cookbook: p570,571 cmd3sel); presumably concerns a race condition between 'waitpid' and 'open' and signal handling
Message ID: 3B3CA108.4ABAE99F@mchr2.siemens.de

Hi there!

We have a rather nasty problem here, which we could reproduce with different
perl versions (5.6.1 and 5.00503) and on different Unix and Unix-like
operating systems (HP-UX 11, Solaris 2.6 (worst), SunOS 4.1.3 and Linux
with kernel 2.2.14).

We are developing administration tools for huge
Unix networks, so it matters a great deal to us that our programs run stably
and predictably (the faintest buglet may end up in expensive havoc).

We want to create a routine that can run external commands in
a safe way and communicate with them. As a basis we took
the example from the Perl Cookbook (ORA), chapter 16.9, on pages 570 and
571 (cmd3sel).

We extended the example a little and at first noticed strange
behaviour: sometimes it wrote back the information about the dead
child (which it should) and sometimes it didn't. Since we have to rely
on what we receive there, we investigated further and ended up with an
example script which mostly works the way it should, but sometimes everything
breaks with segmentation violations and even some other unexpected error
messages from inside perl (see below or try it on your own).

We also found out that the errors occur more often when the system load is
higher.

Here is our code (we tried to reduce it as much as we could, and you may
quite well recognize the code from cmd3sel):

----------->8--- cut here ----8<--------
#!/usr/local/bin/perl -T -w

use IO::Select;
use IPC::Open3;

delete @ENV{qw{IFS CDPATH ENV BASH_ENV PATH}};

# repeat 500 times to really show the effect

for($i = 0 ; $i < 500 ; $i++)
{       @io_channel = ();

	# since we called this script 'bug', the line below
	# will produce output on both, STDOUT and STDERR ('xxx' doesn't
	# exist).

        &system_redirect(\@io_channel,"/bin/ls -l bug xxx");

        print "STDOUT was: ",$io_channel[1],"\n";
        print "STDERR was: ",$io_channel[2],"\n";
}

###############################################################################
# system_redirect()
###############################################################################
sub system_redirect()
{       my($ra_io_channel,@cmd) = @_;

        local $exitstatus = '?';

        my $pid = open3(*CMD_IN,*CMD_OUT,*CMD_ERR,@cmd);

        $SIG{CHLD} = sub
        {       if(waitpid($pid,0) > 0)
                {       printf("exitstatus of child: %d\n.",$?);
                }
                $exitstatus = $?;
        };

        if(defined $ra_io_channel->[0])
        {       print CMD_IN $ra_io_channel->[0];
        }
        close(CMD_IN);

        my $selector = IO::Select->new();
        $selector->add(*CMD_ERR,*CMD_OUT);

        while(@ready = $selector->can_read)
        {       foreach $filehandle (@ready)
                {       if(fileno($filehandle) == fileno(CMD_ERR))
                        {       $ra_io_channel->[2] .= <CMD_ERR>;
                        }
                        else
                        {       $ra_io_channel->[1] .= <CMD_OUT>;
                        }

                        if(eof($filehandle))
                        {       $selector->remove($filehandle);
                        }
                }
        }

        close(CMD_OUT);
        close(CMD_ERR);

        return($exitstatus);
}

__END__
# 
# The code above, when run in a loop on the commandline (bourne-shell or bash)
# like this (remember, the script was called 'bug' here):

i=0 ; while [ $i -lt 50 ] ; do ./bug | grep STDERR | wc ; i=`expr $i + 1` ; done

#
# produces, for example, the following output:
#
    500    4500   26000
     99     891    5148
Segmentation fault
    500    4500   26000
    500    4500   26000
    500    4500   26000
    500    4500   26000
    500    4500   26000
    500    4500   26000
    211    1899   10972
Segmentation fault
    434    3906   22568
     26     234    1352
Segmentation fault
     48     432    2496
Segmentation fault
Use of uninitialized value in scalar assignment at ./bug line 30.
Use of uninitialized value in scalar assignment at ./bug line 30.
Unable to create sub named "" at ./bug line 30.
    274    2466   14248
    500    4500   26000
Attempt to free unreferenced scalar at ./bug line 30.
    289    2601   15028
Segmentation fault
    500    4500   26000
----------->8--- cut here ----8<--------
The example output was generated with perl 5.6.1 under Linux 2.2.14, but
this was the *best* combination we have found so far. The machine
is a Pentium III 650 MHz with 128 MB RAM and otherwise runs without
any errors. Besides, we got almost the same behaviour on all systems we
tested (see above). So this can be neither a CPU-, machine-, nor
OS-dependent bug, but must be something that lurks somewhere deep in perl.

We can only guess that the open3 child dies before the anonymous sub can
catch the signal. And, to mention it again, the error rate
increases dramatically when the system load is increased. (This
points towards a race condition somewhere between "open" and "waitpid".)
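
One way to test that guess would be a variant of the routine that installs
no $SIG{CHLD} handler at all and reaps the child with a blocking waitpid()
once both pipes are drained. What follows is only a minimal sketch under that
assumption: the name run_and_reap and its return convention are hypothetical
and not part of the Cookbook example, and whether it avoids the crashes on
the systems listed above is untested.

```perl
use strict;
use warnings;
use IO::Select;
use IPC::Open3;
use Symbol qw(gensym);

# Hypothetical variant of system_redirect(): no $SIG{CHLD} handler is
# installed. The child is reaped by a blocking waitpid() only after
# both of its output pipes have reached end-of-file.
sub run_and_reap {
    my ($stdin_data, @cmd) = @_;

    # open3 needs a pre-made glob for the error handle; with a false
    # value there, the child's STDERR would be merged into STDOUT.
    my ($in, $out, $err) = (undef, undef, gensym());
    my $pid = open3($in, $out, $err, @cmd);

    print {$in} $stdin_data if defined $stdin_data;
    close $in;

    my ($out_fd, $err_fd) = (fileno($out), fileno($err));
    my %buf = ($out_fd => '', $err_fd => '');
    my $selector = IO::Select->new($out, $err);

    while ($selector->count) {
        for my $fh ($selector->can_read) {
            # sysread bypasses Perl's buffering, which does not mix well with select()
            my $n = sysread($fh, my $chunk, 4096);
            if ($n) {
                $buf{ fileno $fh } .= $chunk;
            }
            else {
                # 0 is end-of-file, undef is a read error: stop watching either way
                $selector->remove($fh);
            }
        }
    }

    close $out;
    close $err;

    # Reap synchronously; $? now holds the child's wait status.
    waitpid($pid, 0);
    return ($?, $buf{$out_fd}, $buf{$err_fd});
}

# Same command as in the script above, passed as a list so no shell is involved.
my ($status, $stdout, $stderr) = run_and_reap(undef, 'ls', '-l', 'bug', 'xxx');
printf "exit code: %d\n", $status >> 8;
```

Because the wait status is collected in the main flow of the program, no Perl
code runs asynchronously from a signal handler, which is the part the guess
above suspects.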

The script above may run quite fine on the above-mentioned PIII system,
but as soon as I move the mouse or open another xterm, the rate of
segfaults rises dramatically. Just try it yourself.

We would be very sad if we couldn't use IPC::Open3 because of this, but it is
currently absolutely unreliable and thus unacceptable for sysadmin
tasks. We would also have severe trouble finding something more
reliable, since all we could do is re-implement IPC::Open3.

I guess Tom and Nathan will have a high interest in fixing this, because
the base for it is published in their Cookbook (which is otherwise
excellent!) and everybody and his uncle may run into the same problem
we did.

Do you know of this already? Is this something that can be found in
an FAQ (I guess not, otherwise you wouldn't have published the basic
code for this in the Cookbook...)?

Please inform us when you plan to fix it (if at all), and I hope
you will inform us when you have fixed it. BTW we are willing to serve
as beta-testers for this.

Best regards,

	Herbert

PS: Please send special greetings to Tom Christiansen, whom I had the
luck to meet in person during a perl training he held a couple of years
ago in Munich (about 1996). :) I'm the guy who worked for TSR here in
Germany, too. (I guess he can't remember, but that's no blame for him... :) )

-- 
Herbert Wengatz                  Phone MchP: +49 (0)89  / 636 - 47677
I&S IT PS 8                      Phone MchH: +49 (0)89  / 722 - 49296
Siemens AG                       Mobile    : +49 (0)160 / 8 85 16 85
Otto Hahn Ring 6                 Fax   MchP: +49 (0)89  / 636 - 47586
81738 Muenchen                   mailto:herbert.wengatz@mchr2.siemens.de
                   http://www.mvn-services.com
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=critical
---
Site configuration information for perl v5.6.1:

Configured by hwe at Thu Jun 21 10:08:59 MEST 2001.

Summary of my perl5 (revision 5.0 version 6 subversion 1) configuration:
  Platform:
    osname=linux, osvers=2.2.14, archname=i686-linux
    uname='linux elrond 2.2.14 #3 mon jan 29 13:47:05 cet 2001 i686 unknown '
    config_args='-de'
    hint=recommended, useposix=true, d_sigaction=define
    usethreads=undef use5005threads=undef useithreads=undef usemultiplicity=undef
    useperlio=undef d_sfio=undef uselargefiles=define usesocks=undef
    use64bitint=undef use64bitall=undef uselongdouble=undef
  Compiler:
    cc='cc', ccflags ='-fno-strict-aliasing -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-fno-strict-aliasing -I/usr/local/include'
    ccversion='', gccversion='2.95.2 19991024 (release)', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, usemymalloc=n, prototype=define
  Linker and Libraries:
    ld='cc', ldflags =' -L/usr/local/lib'
    libpth=/usr/local/lib /lib /usr/lib
    libs=-lnsl -lndbm -lgdbm -ldbm -ldb -ldl -lm -lc -lposix -lcrypt -lutil
    perllibs=-lnsl -ldl -lm -lc -lposix -lcrypt -lutil
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-rdynamic'
    cccdlflags='-fpic', lddlflags='-shared -L/usr/local/lib'

Locally applied patches:


---
@INC for perl v5.6.1:
    /usr/local/lib/perl5/5.6.1/i686-linux
    /usr/local/lib/perl5/5.6.1
    /usr/local/lib/perl5/site_perl/5.6.1/i686-linux
    /usr/local/lib/perl5/site_perl/5.6.1
    /usr/local/lib/perl5/site_perl
    .

---
Environment for perl v5.6.1:
    HOME=/home/hwe
    LANG=de_DE
    LANGUAGE (unset)
    LC_COLLATE=POSIX
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
   
PATH=/home/hwe/bin:/usr/local/bin:/usr/bin/mh:/opt/kde/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash



