On Mon, 28 Sep 2009, George Greer wrote:

> On Mon, 28 Sep 2009, Steve Hay wrote:
>>
>> I had a lot of trouble in the past with tests hanging in my smokers, and
>> Jerry Hedden kindly came up with a solution: the Test::watchdog()
>> function.
>>
>> That function is called from several test scripts that are liable to
>> hang (a couple in each of: IO, threads, and threads/shared), and should
>> kill them if they hang around too long.
>>
>> Perhaps we need to add watchdog() to a few more test scripts?
>
> If it happens again, I'll keep track of the tests that are active at the
> time. Doesn't happen often though. Only one I remember from this time is
> the one I canceled.

Ok, maybe I under-estimated "not often"... it did it again. Looks like
the same three suspects from last time:

  ./perl -I.. -MTestInit io/openpid.t
  ./perl -I.. -MTestInit io/perlio.t
  ./perl -I.. -MTestInit io/perlio_leaks.t

All three in a 'Wait:Executive' state. Last lines in the log:

  io/fflush.t ....................................................... ok
  io/fs.t ........................................................... ok
  io/inplace.t ...................................................... ok
  io/iprefix.t ...................................................... ok
  io/layers.t ....................................................... ok
  io/nargv.t ........................................................ ok
  io/open.t ......................................................... ok

Same flags as last time: -Dusedevel -Duseithreads -DDEBUGGING

Only activity that Process Monitor shows for the three is a "Thread
Create" followed immediately by "Thread Exit", and it has been 5 minutes
since each test did that their one time.

Oddly, Process Explorer shows "Process | <Non-existent Process>(916)" in
addition to the three live (stalled) tests. Perhaps the job driver hung
trying to reap a child? I wouldn't expect the other three tests to be
still alive then, though.

I'll try killing "io/perlio_leaks.t" this time instead of "io/openpid.t"
like last time. Hrm, that didn't seem to do anything; maybe it does
matter which I kill. Ok, "io/perlio.t" next then. Still not budging...
guess it is "io/openpid.t".

The "Process" type now has three non-existent processes: 916 and the two
I killed 520, 500.

Now to kill 424, io/openpid.t...

The driver reaped the children and started running again. I think we
have a suspect.

-- 
George Greer