NWCLARK TPF grant report #101

From: Nicholas Clark
Date: October 3, 2013 13:43
Subject: NWCLARK TPF grant report #101
Message ID: 20131003134323.GI4940@plum.flirble.org

[Hours]		[Activity]
2013/08/05	Monday
 0.50		/[#$x]/
 6.75		NULL PL_curcop
 1.00		RT #119089
 0.50		RT #119155
 1.00		reading/responding to list mail
 0.25		threads::shared leaks
=====
10.00

2013/08/06	Tuesday
 3.25		AIX t/op/fork.t
 0.50		ID 20020301.005 (RT #8732)
 2.50		NULL PL_curcop
 0.25		RT #92446
 1.50		Test::PerlRun on VMS
=====
 8.00

2013/08/07	Wednesday
 3.25		AIX t/op/fork.t
 2.00		AIX t/op/fork.t (VMS runperl)
 0.75		RT #119195, RT #119197
 0.25		bisect.pl
 2.50		runperl
=====
 8.75

2013/08/08	Thursday
 1.75		AIX t/op/fork.t (VMS runperl)
 0.50		HP-UX 64bit
 0.25		NULL PL_curcop
 0.25		RT #107816
 0.25		RT #119097
 0.25		RT #119181
 0.25		VMS
 0.50		lib/perl5db.t
=====
 4.00

2013/08/09	Friday
 4.00		reading/responding to list mail
=====
 4.00

Which I calculate is 34.75 hours

A fair chunk of this week was consumed with figuring out why an
innocent-looking test had started failing on AIX, but not on other Unix
systems. Specifically, t/op/fork.t, which had not been modified, was now
failing in two places where previously it passed. This implies that some
unrelated change had unfortunate side effects, but what? And why?

Fortunately this is just the sort of job that git bisect excels at, and now
that most of it is automated it only takes a small amount of setup effort to
get the computer busy figuring out the answer. The culprit turned out to be a
perfectly innocent-looking commit to a harness function that t/op/fork.t
uses, a commit which fixed a frustrating bug:

    commit 684b0ecaad281760b04dd2da317ee0459cafebf6
    Author: Tony Cook <tony@develop-help.com>
    Date:   Tue Jul 16 14:57:20 2013 +1000
    
        [perl #116190] feed an empty stdin to run_multiple_progs() programs
    
        Two tests for -a were attempting to read stdin and blocking with the -a
        implies -n change.


So why does this cause a problem? And why only on AIX?

run_multiple_progs() runs its programs using backticks. (Backticks turn out
to be the only sufficiently portable way to easily capture output.) The code
it calls already had a feature to feed things to STDIN, which for various
reasons is done by using a pipeline from another perl process. So what that
commit does is change the generated backtick command from

    `./perl -Ilib -e ...`

to

    `./perl -e "print qq()" | ./perl -Ilib -e ...`

so that stdin is at EOF, and the thing we're testing can't end up hanging if
it inadvertently reads STDIN.

It turns out that this causes fun on AIX with two tests, because on AIX
/bin/sh is actually ksh, and ksh does pipes differently (with one fewer
process). With sh, for the latter command line the sh process forks two
children, which use exec to start the two perl processes. The parent shell
process persists for the duration of the pipeline, and the second perl
process starts with no children. With ksh (and zsh), the shell saves a
process by forking a child for just the first perl process, and execing
itself to start the second. This means that the second perl process starts
with one child which it didn't create. This breaks the tests, which assume
that wait (or waitpid) will only return information about processes started
within the test. One can replicate this on Linux:

    $ sh -c 'pstree -p $$ | cat'
    sh(13261)-+-cat(13263)
              `-pstree(13262)
    $ ksh -c 'pstree -p $$ | cat'
    cat(13349)---pstree(13350)
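
The same effect can be sketched without ksh or pstree, assuming only a POSIX
sh. A shell that forks a background child and then execs another program
hands that child over to the new program, because exec keeps the PID. The
name inherited_child_demo and the output labels are made up for illustration,
and the one second sleep is only there to keep the parent alive until the
child has started.

```shell
# Illustrative sketch (inherited_child_demo is a hypothetical name).
# The outer sh forks a background child and then replaces itself via exec,
# much as ksh does for the last stage of a pipeline.
inherited_child_demo() {
    sh -c '
        # Child: report the PID of the process that forked it.
        sh -c "echo child-parent=\$PPID" &
        # exec keeps the PID, so the program started here becomes the
        # parent of a child that it never created.
        exec sh -c "echo exec-pid=\$\$; sleep 1"
    '
}

# Prints two lines, in either order, carrying the same PID.
inherited_child_demo
```

A program started this way that calls wait can be handed the status of that
extra child, which is the situation the tests in t/op/fork.t did not expect.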

Aside from who gets the child process, the behaviour is identical, because
both structures end up with the exit status of the pipeline being that of
the last command - in the ksh case because the last command *is* the parent,
in the /bin/sh case because the sh process ensures that it calls exit() with
the exit status of the child that it reaped.
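
The claim about exit statuses is easy to check on a Unix shell. This is a
small sketch, not part of the test harness: pipeline_status is a hypothetical
helper, and it runs the pipeline inside a separate sh -c so that options set
in the calling shell (such as bash's pipefail) cannot change the result.

```shell
# Print the exit status that sh assigns to a two-stage pipeline whose
# stages exit with the codes given as $1 and $2.
pipeline_status() {
    status=0
    sh -c 'sh -c "exit $1" | sh -c "exit $2"' sh "$1" "$2" || status=$?
    echo "$status"
}

pipeline_status 2 0   # prints 0: the last stage decides
pipeline_status 0 2   # prints 2
```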

The problem is that the tests in t/op/fork.t *are* sensitive to who gets the
child process, because they are counting how many children get reaped. I
thought about fixing the tests to make them immune to unexpected extra child
processes, but then realised that it was probably easier to change the
generated backtick command to be

    `./perl </dev/null -Ilib -e ...`

given that all that mattered was that reads on STDIN avoid hanging, not that
they return anything specific (ie eof instead of error).
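
A quick sketch of why the redirection is enough on a Unix shell, using wc in
place of the perl under test (the helper names are hypothetical). Both an
empty pipe and /dev/null leave the reader at EOF straight away, but only the
pipe needs a second process. A redirection may also sit anywhere among the
words of a command, which is why it can come before -Ilib.

```shell
# Bytes seen on stdin when it is fed by a process that prints nothing.
bytes_from_empty_pipe() {
    printf '' | wc -c | tr -d ' '
}

# Bytes seen on stdin when it is redirected from /dev/null.
# The redirection deliberately precedes the -c option.
bytes_from_dev_null() {
    wc </dev/null -c | tr -d ' '
}

bytes_from_empty_pipe   # prints 0
bytes_from_dev_null     # prints 0
```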

I wasn't sure whether this approach would work on VMS, so I tested it there.
To be sure that only things that I expected to change changed, I first ran
the test suite unmodified. This revealed that t/lib/croak.t, lib/charnames.t
and lib/warnings.t failed. These failures are relatively new, and all look
like this:

    1..37
    # From PTAC$DKA0:[NCLARK.I.perl-7d9633e5aba8.t.lib.croak]mg.
    PROG:
    # mg.c
    $SIG{_HUNGRY} = \&mmm_pie;
    warn "Mmm, pie";
    EXPECTED:
    No such hook: _HUNGRY at - line 2.
    EXIT STATUS: != 0
    GOT:
    No such hook: _HUNGRY at - line 2.
    EXIT STATUS: 0
    not ok 1 - Perl_magic_setsig
    t/[.lib]croak ... FAILED at test 1
    Failed 1 test out of 1, 0.00% okay.
            [.lib]croak.t


ie the test runs a program, and whilst it is getting the output it expects, it
no longer gets the exit status it expects. Craig Berry explained why:

    The root cause of the problem on VMS is that command-line redirection
    is done by Perl and not by the shell.  Tony's addition of the stdin
    parameter to runperl gives us the equivalent of:
    
    $ perl -e "exit 2;" | perl -e "exit 0;"
    %NONAME-E-NOMSG, Message number 00000002
    $ show symbol $status
      $STATUS == "%X00000002"
    
    The Perl process with an exit value of 0 is a child of the one that
    has an exit value of 2 so the final status we see in runperl is the
    exit value of the parent, not of the child.  But the child is actually
    the test program whose exit value we're interested in and we don't get
    it this way.

Craig had already observed these failures resulting from Tony's (quite
reasonable) change and had come up with a fix involving another pipe to
capture the child's exit status, but because of the ugly complexity that
entailed he'd been "sitting on it for a bit hoping to think of something
better".

So it turned out that the solution of setting up STDIN as EOF actually got
us two fixes for the price of one. Strictly, we haven't "solved" the
incompatibility of the VMS pipe implementation when it comes to exit
statuses, because instead we've avoided the tests relying on it. But as the
VMS behaviour has been that way for over a decade now, it seems unlikely
that it's going to change any time soon.

I need to thank Zefram for helping diagnose the problem.

Nicholas Clark


