
NWCLARK TPF grant report #39

From: Nicholas Clark
Date: June 14, 2012 03:10
Subject: NWCLARK TPF grant report #39
Message ID: 20120614101001.GT37285@plum.flirble.org

[Hours]		[Activity]
2012/05/31	Thursday
 0.50		?->
 0.50		IO::Socket::IP
 2.75		reading/responding to list mail
 0.25		smoke-me branches
 1.00		t/porting/checkcase.t
 0.25		the todo list
=====
 5.25

2012/06/01	Friday
 0.25		Locale::Codes
 2.50		reading/responding to list mail
 2.25		t/op/getppid.t
=====
 5.00

2012/06/02	Saturday
 0.75		HP-UX -Duse64bitall
 0.25		Locale::Codes
 0.25		RT #113094
 0.25		RT #113464
 1.25		VMS
 1.25		lib/File/stat.t
 0.75		make_ext.pl
 0.25		mktables memory usage
 2.25		reading/responding to list mail
 0.25		smoke-me branches
 0.25		t/op/getppid.t
=====
 7.75

2012/06/03	Sunday
 0.50		File::DosGlob
 2.00		Porting tests with -Dmksymlinks
 0.25		RT #113472
 2.75		VMS
 2.25		lib/File/stat.t
=====
 7.75

Which I calculate is 25.75 hours

So, after being the ill parent, I then get to be the emergency babysitting
service because the other parent gets ill. Fortunately everyone (else) was
well enough to travel to Vienna midweek, which gives me the house to myself.
(Muhahahaha. I can take it all over and make a big mess in the name of
"tidying", without needing to leave any part of it toddler safe.)

So in the time available to me, I mostly tidied up lots of small things that
were half done, or easy to finish.

While investigating the problem described last week, of testing as root but
the file tree owned by a different user, I discovered a different problem
with File::stat's tests when running as root. This had prompted me to look at
its regression test, to get a feel for why this wasn't spotted before. At
which point the scale of the job escalated, because the test didn't look
that robust or as complete as it could be. So this week I started on
refactoring it, to increase my confidence in it actually being able to
spot bugs. In the process I discovered that, just like t/op/filetest.t, it had
accumulated quite a lot of layers of cruft, each attempting to solve one of
the various problems that had emerged over the years, but none looking
closely at what the *intent* of the other workarounds was, and whether the
combination could be simplified.

In this case, again the problems related to "which is our victim file?"
Initially the test used t/TEST. But (again), that file might be a symlink
(if built with -Dmksymlinks) so the setup code has to check for that, and if
so, chase the symlink back to the original file. All good so far. But, it
gets worse: the test checks the output of stat, which includes the last
access time, and with parallel testing, other tests can read that file. So the
script was changed to use itself as the victim. However, the original code
made the "golden result" stat call in a BEGIN block, which can still go
wrong because BEGIN blocks run during compilation, and it may be that the
interpreter hasn't finished reading the entire file yet, in which case the
atime will get updated.

The simple solution is to use a tempfile.

This also has the (future) benefit of being able to change the file's mode,
and hence test many more permutations of the various file operators.
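
As a rough illustration of the tempfile approach (a minimal sketch, not the
code that went into lib/File/stat.t; the variable names are made up for this
example), the "golden result" can be taken at run time from a scratch file
that no other test will touch, and then compared field by field with what
File::stat returns:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use File::stat ();    # import nothing, so the core stat() is left alone

# A private scratch file: nothing else in a parallel test run reads it,
# so its atime cannot change between the two stat calls below.
my ($fh, $victim) = tempfile(UNLINK => 1);
print {$fh} "victim\n";
close $fh or die "close $victim: $!";

# The "golden result" from the core builtin, taken at run time rather
# than in a BEGIN block.
my @golden = stat $victim or die "stat $victim: $!";

# The object interface under test.
my $st = File::stat::stat($victim) or die "File::stat::stat $victim: $!";

# Core stat() returns its 13 fields in this order; File::stat has an
# accessor method of the same name for each.
my @fields = qw(dev ino mode nlink uid gid rdev size
                atime mtime ctime blksize blocks);

my @mismatch = grep {
    my $method = $fields[$_];
    ($golden[$_] // '') ne ($st->$method // '')
} 0 .. $#fields;

print @mismatch
    ? "mismatch in: @fields[@mismatch]\n"
    : "all fields agree\n";

# Because the file is ours, we are free to change its mode and look again.
chmod 0400, $victim or die "chmod $victim: $!";
my $mode = File::stat::stat($victim)->mode & 07777;
printf "mode after chmod: %04o\n", $mode;
```

The final chmod is only there to show the kind of permutation that becomes
possible once the victim file belongs to the test. The permission bits it
reports assume a Unix-like filesystem.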


I also fixed a very long-standing race condition in t/op/getppid.t.
To achieve the desired process exit order between child and grandchild
process, the code as originally contributed simply used a C<sleep 1> and
a C<sleep 2>. This is remarkably effective, except when it isn't. Very
occasionally it would fail under heavy load, which is completely
undesirable when one has lots of automated testing, because inevitably it
fails every so often, and the false positives have to be investigated to
be identified as such. Just for a change, on this one it turned out that
I wasn't fighting unshaved Yaks - here the problem turned out to be Zombies!
Yes, the original test didn't worry about reaping its child processes,
because the parent exited soon enough, and the odd short term undead didn't
actually get in the way of what it needed to do. But my refactored version
*did* get confused for a while (as did I) - why hasn't this process gone
away? Why does C<kill 0> still think that it's valid? The answer turned out
to be "lack of a $SIG{CHLD} handler", and adding one to ignore children made
the process go away. [Please note, this approach does not work on real
children. For that you need the bribe of a trip to visit Oma. :-)]
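
The zombie effect can be reproduced in isolation. This is a sketch under
assumptions, not the code from t/op/getppid.t: the helper names
spawn_and_wait_for_exit and signalable are invented for the example, "a
handler to ignore children" is taken to mean C<$SIG{CHLD} = 'IGNORE'>, and
the behaviour shown is that of typical Unix-like systems. A pipe, rather
than a sleep, tells the parent when the child has finished.

```perl
use strict;
use warnings;
use POSIX ();

# Fork a child that exits at once. Return its pid only after the child's
# copy of the pipe's write end has closed, i.e. the child is exiting or
# already dead. No sleep is needed to get the ordering right.
sub spawn_and_wait_for_exit {
    pipe(my $r, my $w) or die "pipe: $!";
    my $pid = fork;
    die "fork: $!" unless defined $pid;
    if (!$pid) {
        close $r;
        POSIX::_exit(0);    # child: exit without running END blocks
    }
    close $w;
    my $eof = <$r>;         # blocks until the child's write end closes
    close $r;
    return $pid;
}

# Poll with kill 0 up to $tries times, 0.05s apart. Returns false as soon
# as the pid can no longer be signalled, true if it is still there at the end.
sub signalable {
    my ($pid, $tries) = @_;
    for (1 .. $tries) {
        return 0 unless kill 0, $pid;
        select undef, undef, undef, 0.05;
    }
    return 1;
}

# 1. Default SIGCHLD handling: the dead child lingers as a zombie, and
#    kill 0 keeps reporting it as valid until the parent reaps it.
$SIG{CHLD} = 'DEFAULT';
my $zombie      = spawn_and_wait_for_exit();
my $before_reap = signalable($zombie, 5);
waitpid $zombie, 0;
my $after_reap  = signalable($zombie, 1);

# 2. SIGCHLD ignored: the kernel discards the child on exit, so the pid
#    goes away by itself without any waitpid.
$SIG{CHLD} = 'IGNORE';
my $ignored      = spawn_and_wait_for_exit();
my $after_ignore = signalable($ignored, 100);

print "unreaped child still signalable: ", ($before_reap  ? "yes" : "no"), "\n";
print "after waitpid:                   ", ($after_reap   ? "yes" : "no"), "\n";
print "with SIGCHLD ignored:            ", ($after_ignore ? "yes" : "no"), "\n";
```

The polling loop in the second case allows for the short gap between the
pipe closing and the kernel finishing with the process.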

Nicholas Clark


