Front page | perl.perl5.porters |
Postings from July 2009
Re: Fixes for thread safety issues in PerlIO (5.8.6)
Thread Previous
|
Thread Next
From:
Adam Skwersky
Date:
July 6, 2009 08:13
Subject:
Re: Fixes for thread safety issues in PerlIO (5.8.6)
Message ID:
OF5973A57D.0A80A307-ON852575EB.00533064-852575EB.005395A0@us.ibm.com
Hi,
I did not receive any feedback regarding these proposed changes.
Its not that I need feedback, but if this is the wrong Perl list to post
this kind of content, can you please direct me to the correct list?
Thanks,
Adam Skwersky
Adam Skwersky/Lexington/IBM@IBMUS wrote on 06/30/2009 11:35:30 AM:
>
> To whom it may concern:
>
> Rational ClearQuest (CQ) and ClearCase (CC) products ship with a
> version of Perl that is packaged as 'ratlperl' and is also embedded
> in the ClearQuest product for running Perl 'hooks' ? small scripts
> that are run at certain times. In CQ/CC 7.0 and 7.0.1 and 7.1, we
> used the 5.8.6 Perl source to build ratlperl.
>
> I have been a CQ Core developer since 2001 and have worked on many
> projects and issues related to our embedding Perl in ClearQuest. One
> improvement I made was having our CQ Perl hooks run on cloned
> interpreters (using perl_clone) rather than rebuilding the entire
> hook environment every time. This saved a tremendous amount of time
> and memory, which was crucial for our server-based solutions that
> come under heavy load.
>
> The perl_clone solution has worked very well for us but we have seen
> 2 thread-safety issues with using perl_clone.
>
> A) Thread-safety of PerlIO cloning on Solaris
>
> The first issue might be Solaris specific as it is the only platform
> on which I have seen it, and the relevant PerlIO code has some
> Solaris peculiarities. The issue is related to the way Perl clones
> and eventually cleans up the PerlIO layers. In the PerlIOStdio
> layer the perl_destruct call tries to clean up the FILE structure
> (see PerlIOStdio_close) without closing the underlying file descriptor
(FD).
>
> On most platforms, the PerlIOStdio_invalidate_fileno call will set
> the fileno on the FILE to be -1 or the equivalent. On Solaris
> however, this cannot be done because the FILE->_file member is
> unsigned char. Instead this method returns 0 to indicate that the
> FILE could not be invalidated normally. It reverts to the dup/dup2
> 'hack' which has the comment ?which isn't thread safe?.
>
> The sequence for the dup/dup2 hack goes like this:
>
> 1 int fd = fileno(stdio);
> 2 dupfd = PerlLIO_dup(fd);
> 3 result = PerlSIO_fclose(stdio);
> 4 PerlLIO_dup2(dupfd,fd);
> 5 PerlLIO_close(dupfd);
>
> The file descriptor stored in fd is closed by line 3 leaving it
> available for another thread to use before it can be used in the
> dup2 call on line 4.
>
> I can reproduce this issue readily and it eventually leads to a
> crash. Even if our Perl 'hook' script does not open any files, the
> stdin/stdout/stderr PerlIO layers are always cloned. So one of the
> standard FDs (0, 1 or 2) will get closed and used for other purposes
> (i.e. opening and reading a file). Since other parts of the program
> still have FILE structures referencing these FDs (every existing
> Perl interpreter, in fact), when they get used or cleaned up, the
> result is unpredictable (usually garbage being read or written
> from/to a file). Eventually another perl_clone call is made and it
> tries to clone the PerlIOs corresponding to the 0-2 FDs. When it
> tries to clone the PerlIO layer with the underlying FD that was
> already closed, the fileno() call will trigger a segfault (crash).
>
> Here is a typical stack trace from this crash:
>
> =>[1] _fileno(0x0, 0xfb5f841c, 0x0, 0xfb5f89a0, 0x1, 0x0), at 0xfc4a805c
> [2] PerlIOStdio_dup(0x2611fb8, 0x13299bc, 0x34e7ac, 0xfb5f89a0,
> 0x1, 0x151323c), at 0xfdf885fc
> [3] PerlIO_fdupopen(0x2611fb8, 0x34e7ac, 0xfb5f89a0, 0x1, 0x0,
> 0x2c), at 0xfdf7fa08
> [4] Perl_fp_dup(0x2611fb8, 0x34e7ac, 0x0, 0xfb5f89a0, 0xff000000,
> 0x17b32c4), at 0xfdec81c4
> [5] PerlIO_clone(0x2611fb8, 0x6b17f8, 0xfb5f89a0, 0x26124bc, 0x0,
> 0x26129f0), at 0xfdf80378
> [6] perl_clone(0x6b17f8, 0x0, 0x4, 0x137d4, 0x4, 0x13a3358), at
0xfded00a0
> .... rest of stack is not in Perl code.
>
> I realize a fix was made in a later version of Perl to add a MUTEX
> here (fix # unknown and I can no longer find it) but that only
> protected file operations inside Perl code. It did not protect
> against code outside of Perl. I see no easy fix for this issue, but
> most of our users do not open/close their own files before the clone
> operation. The only PerlIOs that get cloned in typical usage are
> the PerlSIO_stdin, PerlSIO_Stderr, PerlSIO_Stdout. To allow most of
> our users to continue seeing the benefits of perl_clone, I modified
> the PerlIO cloning to skip stdin, stderr, stdout. In PerlIOStdio_dup:
>
> Instead of doing this:
>
> stdio = PerlSIO_fdopen(fd, PerlIO_modestr(o,mode));
>
> do this:
> /* At this point we know we are doing a clone operation */
> if (fd >= 0 && fd <= 2) {
> /* Do not clone stdin/stdout/stderr, instead we
> use the existing PerlSIO_*
> This is to allow perl_clone to work on Solaris without
> accidentally
> closing stdin, stdout, stderr when doing the dup/dup2 hack
in
> PerlIOStdio_close.
> */
> switch (fd) {
> case 0:
> stdio = PerlSIO_stdin;
> break;
> case 1:
> stdio = PerlSIO_stdout;
> break;
> case 2:
> stdio = PerlSIO_stderr;
> break;
> }
> } else {
> stdio = PerlSIO_fdopen(fd, PerlIO_modestr(o,mode));
> }
>
> Instead of cloning the PerlSIO_Stdin, PerlSIO_Stdout,
> PerlSIO_Stderr, the Perl interpreters created with perl_clone will
> share them. This is exactly what any Perl interpreter created with
> perl_construct already does. I have tested this change and confirmed
> that it prevents the crashes in my test case. This does not make the
> cleanup of cloned PerlIOs thread-safe, but it does allow nearly all
> of our users to continue using perl_clone. I believe this fix is
> worthwhile to others. Anyone who uses cloned Perl interpreters on
> multiple threads would benefit, as long as they don't already
> open/close files before the clone operation.
>
>
> B) Thread-safety of PerlIO_fd_refcnt
>
> The other issue I have seen is in the thread-safety of code that
> uses PerlIO_fd_refcnt. This is an array of reference counts of FDs
> that Perl is using. The perlio.c already has mutex code to protect
> this array but it is only enabled when USE_THREADS is defined. It
> also needs to be enabled when USE_ITHREADS is defined. If
> usemultiplicity, usethreads and useithreads are all true during
> configure, then in the C code, USE_ITHREADS is defined, but the
> USE_THREADS is not. I am not sure why USE_THREADS is undefined here,
> given 'usethreads' was 'true' during configure. If this is not a
> defect, then the MUTEX also needs to be enabled if USE_ITHREADS is
defined.
>
> This issue is extremely hard to reproduce, since the amount of code
> that needs to be protected is so small (an increment or decrement of
> an integer), the likelihood of multiple threads causing an issue is
> extremely remote. The only readily reproducible test case I had used
> an extremely powerful Solaris box with 128 virtual processors. It
> still took 3 hours of nearly continuous Perl usage in 5 threads to
> cause this issue. Essentially what happens is one of the 'shared'
> FDs reference count will erroneously drop to 0 during the
> perl_destruct call. This eventually lead to similar symtoms as (A)
> (garbage being read from files and an eventual crash). It would
> crash in perl_destruct as it tries to close an FD that had already
> been closed (normally closing a file twice does not cause a crash
> but it does sometimes). Here is a stack trace taken from the dump:
>
> C [libc.so.1+0x5725c]
> C [libc.so.1+0x57204] free+0x2c
> C [libc.so.1+0xaa6c4] fclose+0x104
> C [libratlperl.so+0x1368d8] PerlIOStdio_close+0x108
> C [libratlperl.so+0x133510] PerlIO__close+0x48
> C [libratlperl.so+0x13357c] Perl_PerlIO_close+0x20
> C [libratlperl.so+0x13138c] PerlIO_cleantable+0x40
> C [libratlperl.so+0x135940] PerlIO_cleanup+0x88
> C [libratlperl.so+0x27cc0] perl_destruct+0xe68
>
> Logs from dtrace and truss confirmed an FD was being closed twice,
> triggering the segfault. I also confirmed that problems started
> after one of the FDs 0-2 were closed in a perl_destruct/cleantable
> call. Apparently the there is a race condition in the PerlIO_cleanup
> call where it adjusts the reference counts of FD 0-2 before and
> after cleaning up. Before the interpreters PL_perlio table is
> cleaned, the FDs 0-2 have their refcounts bumped up by 1 to prevent
> closure. After the table is cleaned, the refcounts are brought back
> down to their correct value.
>
> What likely happens is --PerlIO_fd_refcnt[fd] and
> PerlIO_fd_refcnt[fd]++ occur simultaneously (on different
> processors) but only one has an effect (probably the one that
> finished last). If the former finishes last, then the refcount may
> have dropped to 0 and the cleanup code will close the FD in error.
>
> The PerlIO_mutex in perlio.c would have protected these operations
> if they were enabled by USE_ITHREADS. So I rebuilt with all the
>
> #ifdef USE_ITHREADS
>
> changed to
>
> #if defined (USE_ITHREADS) || defined (USE_THREADS)
>
> in perlio.c. With the new Perl library I ran 5 tests and could no
> longer reproduce the issue. The tests ran 12 hours each. With the
> original Perl library, I was able to reproduce the issue within 3
> hours on average. I also re-ran the test with the original perl
> library to reconfirm it crashed again. It did. I then re-ran the
> test with the original perl library again, but this time with the
> the process restricted to a single virtual processor. We were unable
> to reproduce the issue after several 12 hour test runs. This
> corroborates the root-cause analysis above.
>
> This change should be useful to the Perl community because it fixes
> a thread-safety issue that could potentially occur on any multi-
> processor platform. We have only seen this on a highly-parallel
> Solaris box, but there is nothing Solaris specific in the code.
> Theoretically it could happen on any platform Perl is built on that
> allows a process to run across multiple CPUs.
>
>
> Here is the patched perlio.c:
>
>
>
>
> Appendix: Here is the -V output for the version of perl we ship with
> our product:
>
> /opt/rational/common/bin/ratlperl -V
> Summary of my perl5 (revision 5 version 8 subversion 6) configuration:
> Platform:
> osname=solaris, osvers=2.8, archname=sun4-solaris-multi
> uname='sunos radium 5.8 generic_117350-26 sun4u sparc sunw,sun-
> blade-1500 '
> config_args='-d -e -O -D cc=cc -D prefix=/opt/rational/common -D
> perl=ratlperl -D startperl=: -D perlpath=ratlperl -D usemultiplicity
> -U use5005threads -D usedl -D useshrplib -U usemymalloc -D cf_by=ibm
> -D cf_email=sw_support@us.ibm.com -D perladmin=sw_support@us.ibm.com
> -D uselargefiles -D usethreads -D useithreads -D use64bitint -D
> make=clearmake -OvV -D ldcc=CC -D optimize=-xO4 -fns -fsingle -
> fsimple=2 -ftrap=%none -xmemalign=8s -xtarget=generic -xbuiltin=%all
> -xdepend -xlibmil -xlibmopt -xunroll=3 -D
> libpth=/vobs/sys/SOLARIS/sun5.8/usr/lib -D
> locincpth=/vobs/sys/SOLARIS/sun5.8/usr/include -U loclibpth= -D
> lddlflags=-G -L$(PERL_INC) -lratlperl -mt -L/vobs/sys/SOLARIS/sun5.
> 8/usr/lib -norunpath -R/opt/rational/common/shlib -i -xmemalign=8s -
> Bdynamic -D ldflags=-mt -L/vobs/sys/SOLARIS/sun5.8/usr/lib -
> norunpath -R/opt/rational/common/shlib -i -xmemalign=8s -Bdynamic -D
> ccdlflags=-Bdynamic -D dlsrc=dl_dlopen.xs -D
> ldlibpthname=LD_LIBRARY_PATH -D cccdlflags=-xcode=pic32 -D ccflags=-
> mt -DPERL_IMPLICIT_CONTEXT -DPERL_USE_SAFE_PUTENV -Xa -xtransition -
> errfmt -xstrconst -mr -Qn -v -erroff=E_UNRECOGNIZED_PRAGMA_IGNORED,
> E_STATEMENT_NOT_REACHED,E_SLASH_STAR_IN_CMNT -D libperl=libratlperl.
> so -D so=so'
> hint=recommended, useposix=true, d_sigaction=define
> usethreads=define use5005threads=undef useithreads=define
> usemultiplicity=define
> useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
> use64bitint=define use64bitall=undef uselongdouble=undef
> usemymalloc=n, bincompat5005=undef
> Compiler:
> cc='cc', ccflags ='-D_REENTRANT -mt -DPERL_IMPLICIT_CONTEXT -
> DPERL_USE_SAFE_PUTENV -Xa -xtransition -errfmt -xstrconst -mr -Qn -v
> -erroff=E_UNRECOGNIZED_PRAGMA_IGNORED,E_STATEMENT_NOT_REACHED,
> E_SLASH_STAR_IN_CMNT -I/vobs/sys/SOLARIS/sun5.8/usr/include -
> D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
> optimize='-xO4 -fns -fsingle -fsimple=2 -ftrap=%none -
> xmemalign=8s -xtarget=generic -xbuiltin=%all -xdepend -xlibmil -
> xlibmopt -xunroll=3',
> cppflags='-D_REENTRANT -mt -DPERL_IMPLICIT_CONTEXT -
> DPERL_USE_SAFE_PUTENV -Xa -xtransition -errfmt -xstrconst -mr -Qn -v
> -erroff=E_UNRECOGNIZED_PRAGMA_IGNORED,E_STATEMENT_NOT_REACHED,
> E_SLASH_STAR_IN_CMNT -I/vobs/sys/SOLARIS/sun5.8/usr/include'
> ccversion='Sun C 5.6 Patch 117551-04 2005/02/15', gccversion='',
> gccosandvers=''
> intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=87654321
> d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
> ivtype='long long', ivsize=8, nvtype='double', nvsize=8,
> Off_t='off_t', lseeksize=8
> alignbytes=8, prototype=define
> Linker and Libraries:
> ld='cc', ldflags ='-mt -mt -L/vobs/sys/SOLARIS/sun5.8/usr/lib -
> norunpath -R/opt/rational/common/shlib -i -xmemalign=8s -Bdynamic '
> libpth=/vobs/sys/SOLARIS/sun5.8/usr/lib
> libs=-lsocket -lnsl -ldl -lm -lpthread -lc
> perllibs=-lsocket -lnsl -ldl -lm -lpthread -lc
> libc=/lib/libc.so, so=so, useshrplib=true, libperl=libratlperl.so
> gnulibc_version=''
> Dynamic Linking:
> dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-
> Bdynamic -R
/opt/rational/common/lib/perl5/5.8.6/sun4-solaris-multi/CORE'
> cccdlflags='-xcode=pic32', lddlflags='-mt -G -L$(PERL_INC) -
> lratlperl -mt -L/vobs/sys/SOLARIS/sun5.8/usr/lib -norunpath -
> R/opt/rational/common/shlib -i -xmemalign=8s -Bdynamic'
>
>
> Characteristics of this binary (from libperl):
> Compile-time options: MULTIPLICITY USE_ITHREADS USE_64_BIT_INT
> USE_LARGE_FILES PERL_IMPLICIT_CONTEXT
> Built under solaris
> Compiled at Mar 20 2006 11:02:04
> @INC:
> /opt/rational/common/lib/perl5/5.8.6/sun4-solaris-multi
> /opt/rational/common/lib/perl5/5.8.6
> /opt/rational/common/lib/perl5/site_perl/5.8.6/sun4-solaris-multi
> /opt/rational/common/lib/perl5/site_perl/5.8.6
> /opt/rational/common/lib/perl5/site_perl
>
>
>
>
> Adam Skwersky
> Advisory Software Engineer
> Rational Software
> IBM Software Group
> tel: 781 372 7817
> tty: 781 676 7574
> [attachment "ratlperl5.8.6.patch" deleted by Adam
Skwersky/Lexington/IBM]
Thread Previous
|
Thread Next