On Thu, May 24, 2012 at 2:12 PM, alexs@ecoscentric.com <perlbug-followup@perl.org> wrote:
> The trigger to this appears to be a pipe process, a bash script, started by my
> main application. I have verified the script has finished ("echo FINISH > /tmp/foo"
> as the last line of the script) but my main application is locked in a select()
> which includes the pipe. The main application select() does not return, and
> neither does join(), yet both the thread and script have terminated.

In general, system(3) is not particularly thread-safe (because of both signal-handling and async-safety issues), though for most purposes it's OK. I'm not sure that's the real issue here, but it's worth pointing out.

> Unfortunately I have not been able to reproduce this issue with a simple
> case. My application is a pretty complex automated build and test system
> which runs tests on remote hardware and logs results in a MySQL database,
> and this problem occurs every 4-8 hours after successfully running
> several hundred thousand tests and many builds. This application is
> also well tested and has been in place for almost 10 years.

That sounds like some weird race condition. Judging by the code of threads.pm, I can imagine how this could happen: a thread is marked joinable right before it is actually destroyed, so if the create/destroy mutex is corrupted (someone locked it but never unlocked it), the thread's death will hang. I'm not sure what causes that, though.

> I have unfortunately had to revert back to my own thread emulation
> system where I have a "fork_and_call" function which forks and calls
> a given function with the pointer to the function and its arguments
> passed to fork_and_call, pushing PIDs on a stack, and a signal handler to
> reap SIGCHLD signals and verify when certain "threads" have finished
> (ignoring SIGCHLD from terminating pipe processes).

Have you tried upgrading your version of threads.pm?
It may or may not fix this issue, but it's worth a try.

Leon