develooper Front page | perl.perl5.porters | Postings from July 2009

Re: Fixes for thread safety issues in PerlIO (5.8.6)

Thread Previous | Thread Next
Adam Skwersky
July 6, 2009 08:13
Re: Fixes for thread safety issues in PerlIO (5.8.6)
Message ID:

I did not receive any feedback regarding these proposed changes. 

Its not that I need feedback, but if this is the wrong Perl list to post 
this kind of content, can you please direct me to the correct list?

Adam Skwersky

Adam Skwersky/Lexington/IBM@IBMUS wrote on 06/30/2009 11:35:30 AM:

> To whom it may concern: 
> Rational ClearQuest (CQ) and ClearCase (CC) products ship with a 
> version of Perl that is packaged as 'ratlperl' and is also embedded 
> in the ClearQuest product for running Perl 'hooks' ? small scripts 
> that are run at certain times. In CQ/CC 7.0 and 7.0.1 and 7.1, we 
> used the 5.8.6  Perl source to build ratlperl. 
> I have been a CQ Core developer since 2001 and have worked on many 
> projects and issues related to our embedding Perl in ClearQuest. One
> improvement I made was having our CQ Perl hooks run on cloned 
> interpreters (using perl_clone) rather than rebuilding the entire 
> hook environment every time. This saved a tremendous amount of time 
> and memory, which was crucial for our server-based solutions that 
> come under heavy load. 
> The perl_clone solution has worked very well for us but we have seen
> 2 thread-safety issues with using perl_clone. 
> A) Thread-safety of PerlIO cloning on Solaris 
> The first issue might be Solaris specific as it is the only platform
> on which I have seen it, and the relevant PerlIO code has some 
> Solaris peculiarities. The issue is related to the way Perl clones 
> and eventually cleans up the PerlIO layers.  In the PerlIOStdio 
> layer the perl_destruct call tries to clean up the FILE structure 
> (see PerlIOStdio_close) without closing the underlying file descriptor 
> On most platforms, the PerlIOStdio_invalidate_fileno call will set 
> the fileno on the FILE to be -1 or the equivalent. On Solaris 
> however, this cannot be done because the FILE->_file member is 
> unsigned char. Instead this method returns 0 to indicate that the 
> FILE could not be invalidated normally. It reverts to the dup/dup2 
> 'hack' which has the comment ?which isn't thread safe?. 
> The sequence for the dup/dup2 hack goes like this: 
> 1     int fd = fileno(stdio); 
> 2     dupfd = PerlLIO_dup(fd); 
> 3     result = PerlSIO_fclose(stdio); 
> 4     PerlLIO_dup2(dupfd,fd); 
> 5     PerlLIO_close(dupfd); 
> The file descriptor stored in fd is closed by line 3 leaving it 
> available for another thread to use before it can be used in the 
> dup2 call on line 4. 
> I can reproduce this issue readily and it eventually leads to a 
> crash. Even if our Perl 'hook' script does not open any files, the 
> stdin/stdout/stderr PerlIO layers are always cloned. So one of the 
> standard FDs (0, 1 or 2) will get closed and used for other purposes
> (i.e. opening and reading a file). Since other parts of the program 
> still have FILE structures referencing these FDs (every existing 
> Perl interpreter, in fact), when they get used or cleaned up, the 
> result is unpredictable (usually garbage being read or written 
> from/to a file). Eventually another perl_clone call is made and it 
> tries to clone the PerlIOs corresponding to the 0-2 FDs. When it 
> tries to clone the PerlIO layer with the underlying FD that was 
> already closed, the fileno() call will trigger a segfault (crash). 
> Here is a typical stack trace from this crash: 
> =>[1] _fileno(0x0, 0xfb5f841c, 0x0, 0xfb5f89a0, 0x1, 0x0), at 0xfc4a805c 

>   [2] PerlIOStdio_dup(0x2611fb8, 0x13299bc, 0x34e7ac, 0xfb5f89a0, 
> 0x1, 0x151323c), at 0xfdf885fc 
>   [3] PerlIO_fdupopen(0x2611fb8, 0x34e7ac, 0xfb5f89a0, 0x1, 0x0, 
> 0x2c), at 0xfdf7fa08 
>   [4] Perl_fp_dup(0x2611fb8, 0x34e7ac, 0x0, 0xfb5f89a0, 0xff000000, 
> 0x17b32c4), at 0xfdec81c4 
>   [5] PerlIO_clone(0x2611fb8, 0x6b17f8, 0xfb5f89a0, 0x26124bc, 0x0, 
> 0x26129f0), at 0xfdf80378 
>   [6] perl_clone(0x6b17f8, 0x0, 0x4, 0x137d4, 0x4, 0x13a3358), at 
> .... rest of stack is not in Perl code. 
> I realize a fix was made in a later version of Perl to add a MUTEX 
> here (fix # unknown and I can no longer find it) but that only 
> protected file operations inside Perl code. It did not protect 
> against code outside of Perl. I see no easy fix for this issue, but 
> most of our users do not open/close their own files before the clone
> operation. The only PerlIOs that  get cloned in typical usage are 
> the PerlSIO_stdin, PerlSIO_Stderr, PerlSIO_Stdout.  To allow most of
> our users to continue seeing the benefits of perl_clone, I modified 
> the PerlIO cloning to skip stdin, stderr, stdout. In PerlIOStdio_dup: 
> Instead of doing this: 
>             stdio = PerlSIO_fdopen(fd, PerlIO_modestr(o,mode)); 
> do this: 
>         /* At this point we know we are doing a clone operation */ 
>       if (fd >= 0 && fd <= 2) { 
>                     /* Do not clone stdin/stdout/stderr, instead we 
> use the existing PerlSIO_* 
>               This is to allow perl_clone to work on Solaris without
> accidentally 
>               closing stdin, stdout, stderr when doing the dup/dup2 hack 
>               PerlIOStdio_close. 
>             */ 
>             switch (fd) { 
>                 case 0: 
>                     stdio = PerlSIO_stdin; 
>                     break; 
>                 case 1: 
>                     stdio = PerlSIO_stdout; 
>                     break; 
>                 case 2: 
>                     stdio = PerlSIO_stderr; 
>                     break; 
>                 } 
>       } else { 
>                   stdio = PerlSIO_fdopen(fd, PerlIO_modestr(o,mode)); 
>       } 
> Instead of cloning the PerlSIO_Stdin, PerlSIO_Stdout, 
> PerlSIO_Stderr, the Perl interpreters created with perl_clone will 
> share them. This is exactly what any Perl interpreter created with 
> perl_construct already does. I have tested this change and confirmed
> that it prevents the crashes in my test case. This does not make the
> cleanup of cloned PerlIOs thread-safe, but it does allow nearly all 
> of our users to continue using perl_clone. I believe this fix is 
> worthwhile to others. Anyone who uses cloned Perl interpreters on 
> multiple threads would benefit, as long as they don't already 
> open/close files before the clone operation. 
> B)  Thread-safety of PerlIO_fd_refcnt 
> The other issue I have seen is in the thread-safety of code that 
> uses PerlIO_fd_refcnt. This is an array of reference counts of FDs 
> that Perl is using. The perlio.c already has mutex code to protect 
> this array but it is only enabled when USE_THREADS is defined. It 
> also needs to be enabled when USE_ITHREADS is defined. If 
> usemultiplicity, usethreads and useithreads are all true during 
> configure, then in the C code,  USE_ITHREADS is defined, but the 
> USE_THREADS is not. I am not sure why USE_THREADS is undefined here,
> given 'usethreads' was 'true' during configure. If this is not a 
> defect, then the MUTEX also needs to be enabled if USE_ITHREADS is 
> This issue is extremely hard to reproduce, since the amount of code 
> that needs to be protected is so small (an increment or decrement of
> an integer), the likelihood of multiple threads causing an issue is 
> extremely remote. The only readily reproducible test case I had used
> an extremely powerful Solaris box with 128 virtual processors. It 
> still took 3 hours of nearly continuous Perl usage in 5 threads to 
> cause this issue. Essentially what happens is one of the 'shared' 
> FDs reference count will erroneously drop to 0 during the 
> perl_destruct call. This eventually lead to similar symtoms as (A) 
> (garbage being read from files and an eventual crash). It would 
> crash in perl_destruct as it tries to close an FD that had already 
> been closed (normally closing a file twice does not cause a crash 
> but it does sometimes). Here is a stack trace taken from the dump: 
> C  [] 
> C  []  free+0x2c 
> C  []  fclose+0x104 
> C  []  PerlIOStdio_close+0x108 
> C  []  PerlIO__close+0x48 
> C  []  Perl_PerlIO_close+0x20 
> C  []  PerlIO_cleantable+0x40 
> C  []  PerlIO_cleanup+0x88 
> C  []  perl_destruct+0xe68 
> Logs from dtrace and truss confirmed an FD was being closed twice, 
> triggering the segfault. I also confirmed that problems started 
> after one of the FDs 0-2 were closed in a perl_destruct/cleantable 
> call. Apparently the there is a race condition in the PerlIO_cleanup
> call where it adjusts the reference counts of FD 0-2 before and 
> after cleaning up.  Before the interpreters PL_perlio table is 
> cleaned, the FDs 0-2 have their refcounts bumped up by 1 to prevent 
> closure. After the table is cleaned, the refcounts are brought back 
> down to their correct value. 
> What likely happens is  --PerlIO_fd_refcnt[fd] and 
> PerlIO_fd_refcnt[fd]++ occur simultaneously (on different 
> processors) but only one has an effect (probably the one that 
> finished last). If the former finishes last, then the refcount may 
> have dropped to 0 and the cleanup code will close the FD in error. 
> The PerlIO_mutex in perlio.c would have protected these operations 
> if they were enabled by USE_ITHREADS. So I rebuilt with all the 
> #ifdef USE_ITHREADS 
> changed  to 
> #if defined (USE_ITHREADS) || defined (USE_THREADS) 
> in perlio.c. With the new Perl library I ran 5 tests  and could no 
> longer reproduce the issue. The tests ran 12 hours each. With the 
> original Perl library,  I was able to reproduce the issue within 3 
> hours on average.  I also re-ran the test with the original perl 
> library to reconfirm it crashed again. It did. I then re-ran the 
> test with the original perl library again, but this time with the 
> the process restricted to a single virtual processor. We were unable
> to reproduce the issue after several 12 hour test runs. This 
> corroborates the root-cause analysis above. 
> This change should be useful to the Perl community because it fixes 
> a thread-safety issue that could potentially occur on any multi-
> processor platform. We have only seen this on a highly-parallel 
> Solaris box, but there is nothing Solaris specific in the code. 
> Theoretically it could happen on any platform Perl is built on that 
> allows a process to run across multiple CPUs. 
> Here is the patched perlio.c: 
> Appendix: Here is the -V output for the version of perl we ship with
> our product: 
> /opt/rational/common/bin/ratlperl -V 
> Summary of my perl5 (revision 5 version 8 subversion 6) configuration: 
>   Platform: 
>     osname=solaris, osvers=2.8, archname=sun4-solaris-multi 
>     uname='sunos radium 5.8 generic_117350-26 sun4u sparc sunw,sun-
> blade-1500 ' 
>     config_args='-d -e -O -D cc=cc -D prefix=/opt/rational/common -D
> perl=ratlperl -D startperl=: -D perlpath=ratlperl -D usemultiplicity
> -U use5005threads -D usedl -D useshrplib -U usemymalloc -D cf_by=ibm
> -D -D
> -D uselargefiles -D usethreads -D useithreads -D use64bitint -D 
> make=clearmake -OvV -D ldcc=CC -D optimize=-xO4 -fns -fsingle -
> fsimple=2 -ftrap=%none -xmemalign=8s -xtarget=generic -xbuiltin=%all
> -xdepend -xlibmil -xlibmopt -xunroll=3 -D 
> libpth=/vobs/sys/SOLARIS/sun5.8/usr/lib -D 
> locincpth=/vobs/sys/SOLARIS/sun5.8/usr/include -U loclibpth= -D 
> lddlflags=-G -L$(PERL_INC) -lratlperl -mt -L/vobs/sys/SOLARIS/sun5.
> 8/usr/lib -norunpath -R/opt/rational/common/shlib -i -xmemalign=8s -
> Bdynamic -D ldflags=-mt -L/vobs/sys/SOLARIS/sun5.8/usr/lib -
> norunpath -R/opt/rational/common/shlib -i -xmemalign=8s -Bdynamic -D
> ccdlflags=-Bdynamic -D dlsrc=dl_dlopen.xs -D 
> ldlibpthname=LD_LIBRARY_PATH -D cccdlflags=-xcode=pic32 -D ccflags=-
> errfmt -xstrconst -mr -Qn -v -erroff=E_UNRECOGNIZED_PRAGMA_IGNORED,
> so -D so=so' 
>     hint=recommended, useposix=true, d_sigaction=define 
>     usethreads=define use5005threads=undef useithreads=define 
> usemultiplicity=define 
>     useperlio=define d_sfio=undef uselargefiles=define usesocks=undef 
>     use64bitint=define use64bitall=undef uselongdouble=undef 
>     usemymalloc=n, bincompat5005=undef 
>   Compiler: 
>     cc='cc', ccflags ='-D_REENTRANT -mt -DPERL_IMPLICIT_CONTEXT -
> DPERL_USE_SAFE_PUTENV -Xa -xtransition -errfmt -xstrconst -mr -Qn -v
> E_SLASH_STAR_IN_CMNT -I/vobs/sys/SOLARIS/sun5.8/usr/include -
>     optimize='-xO4 -fns -fsingle -fsimple=2 -ftrap=%none -
> xmemalign=8s -xtarget=generic -xbuiltin=%all -xdepend -xlibmil -
> xlibmopt -xunroll=3', 
> DPERL_USE_SAFE_PUTENV -Xa -xtransition -errfmt -xstrconst -mr -Qn -v
> E_SLASH_STAR_IN_CMNT -I/vobs/sys/SOLARIS/sun5.8/usr/include' 
>     ccversion='Sun C 5.6 Patch 117551-04 2005/02/15', gccversion='',
> gccosandvers='' 
>     intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=87654321 
>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16 
>     ivtype='long long', ivsize=8, nvtype='double', nvsize=8, 
> Off_t='off_t', lseeksize=8 
>     alignbytes=8, prototype=define 
>   Linker and Libraries: 
>     ld='cc', ldflags ='-mt -mt -L/vobs/sys/SOLARIS/sun5.8/usr/lib -
> norunpath -R/opt/rational/common/shlib -i -xmemalign=8s -Bdynamic ' 
>     libpth=/vobs/sys/SOLARIS/sun5.8/usr/lib 
>     libs=-lsocket -lnsl -ldl -lm -lpthread -lc 
>     perllibs=-lsocket -lnsl -ldl -lm -lpthread -lc 
>     libc=/lib/, so=so, useshrplib=true, 
>     gnulibc_version='' 
>   Dynamic Linking: 
>     dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags='-
> Bdynamic -R 
>     cccdlflags='-xcode=pic32', lddlflags='-mt -G -L$(PERL_INC) -
> lratlperl -mt -L/vobs/sys/SOLARIS/sun5.8/usr/lib -norunpath -
> R/opt/rational/common/shlib -i -xmemalign=8s -Bdynamic' 
> Characteristics of this binary (from libperl): 
>   Compile-time options: MULTIPLICITY USE_ITHREADS USE_64_BIT_INT 
>   Built under solaris 
>   Compiled at Mar 20 2006 11:02:04 
>   @INC: 
>     /opt/rational/common/lib/perl5/5.8.6/sun4-solaris-multi 
>     /opt/rational/common/lib/perl5/5.8.6 
>     /opt/rational/common/lib/perl5/site_perl/5.8.6/sun4-solaris-multi 
>     /opt/rational/common/lib/perl5/site_perl/5.8.6 
>     /opt/rational/common/lib/perl5/site_perl 
> Adam Skwersky
> Advisory Software Engineer
> Rational Software
> IBM Software Group
> tel: 781 372 7817
> tty: 781 676 7574
> [attachment "ratlperl5.8.6.patch" deleted by Adam 
Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About