perl.perl5.porters | Postings from May 2009
Re: Defining stability and testing it
From: Nicholas Clark
May 28, 2009 11:22
Message ID: 20090528182221.GQ55267@plum.flirble.org
On Wed, May 27, 2009 at 10:25:12PM -0400, David Golden wrote:
> On Wed, May 27, 2009 at 5:51 PM, Nicholas Clark <email@example.com> wrote:
> > That's not "all tests pass" as we may need to mark tests as todo for
> > particular platforms, but we need to know which tests, and which platforms,
> > and why. (I'm thinking for "it's a bug not yet fixed", which is much the same
> > as "it's a new test", not, "it's an existing test, so this is a regression we
> > want to ship")
> I mean "all tests pass" in the sense of the harness reporting success,
> which would ignore TODO failures.
True. Both statements are accurate. But I'd regard marking tests as TODO
just before a release as a form of cheating.
The comment of someone I used to know seems to apply here. The story is:
We were at the Bluebell, using the improvised-to-be-portable pillar drill
to drill holes in rail to make signals.*
The thermal cut-out on the on/off switch kept tripping - we were trying to
draw too much power. This was probably because we had a single phase motor,
when really it should have been three phase. (We still don't. See part 2 of
http://use.perl.org/~nicholas/journal/16169 for some background.)
Alf, in charge of our group, held in the "on" button to stop it cutting out.
"That's cheating!" I exclaimed.
"Life is knowing when to cheat", he replied.
> > The level of knowledge drills down:
> > 1: Which modules changed state?
> > [Can certainly be automated]
> > 2: What is the nature of the test failure?
> > [ie what perl -Mblib t/what_goes_here.t do I want to run to see it?]
> > 3: Which core change caused this?
> > [distill the above to a script that can be fed to git bisect]
> > 4: Which other modules fail in the same way?
> > [hopefully as simple as "group newly failing modules by step 3]
> > at which point it's possible to start to triage this cluster of failures.
> > I'm not sure how automatable it is, but it's certainly distributable
> > (and it was, to a degree, but Slaven and Andreas need not be the only ones
> > doing it. Berlin 2, rest of world 0)
> > And then the tougher parts
> > 5: What caused the change?
> > 6: "Can you fix it?"
> The diagnostic might be automatable, but sounds more like the work to
> resolve breakage after it's detected. Or is there something in all
> that analysis that would lead you to make a determination of whether
> the breakage can be ignored? (Aside from VMS, which I assume is often
> the case.) So, yes, I can see how this helps you, but is it really
> more than is necessary to get the green/red light?
I only felt comfortable once I had reached step 5 on all reported regressions.
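For what it's worth, the bookkeeping in step 4 looks entirely mechanical once
steps 1-3 have produced (module, first-bad-commit) pairs. A rough sketch, in
Python purely for illustration (the module names and commit ids here are
invented, not real failures):

```python
# Sketch of step 4: bucket newly failing modules by the commit that
# git bisect blamed for each one.  All data below is invented.
from collections import defaultdict

# (module, first-bad-commit) pairs, as produced by step 3
bisect_results = [
    ("Foo::Bar",   "deadbee"),
    ("Foo::Baz",   "deadbee"),
    ("Acme::Quux", "cafef00"),
]

by_commit = defaultdict(list)
for module, commit in bisect_results:
    by_commit[commit].append(module)

# Each bucket is one cluster of failures to triage together
for commit, modules in sorted(by_commit.items()):
    print(f"{commit}: {', '.join(sorted(modules))}")
```

which is exactly the "group newly failing modules by step 3" clustering, and
trivially distributable: each bisect can run on a different machine.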
> > For anything other than a .0 release - are we binary compatible with the
> > previous release(s)?
> > Roughly
> > 1: Are all symbols exported by the previous release still exported, and still
> > of the same type?
> > 2: Are all public structures where size matters the same?
> > 3: Do all other public structures start with the same members?
> OK. I'll assume that someone with decent C skills can code up some tests.
I don't think it's as important as what follows: it's perfectly possible to
spot any change to C structures by vetting header files as they are merged,
and possible to spot most changes to the exports by diffing embed.fnc.
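That said, check 1 looks automatable with nothing more than nm and a small
parser. A hedged sketch (Python for brevity; the symbol names and nm output
lines are made up, and real libperl output would need more careful parsing):

```python
# Sketch of binary-compat check 1: did any exported symbol vanish, or
# change type, between two builds?  Input is in the format produced by
# `nm -D --defined-only libperl.so`; the sample data is invented.

def parse_nm(output):
    """Map symbol name -> nm type letter."""
    syms = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 3:            # "address type name"
            syms[parts[2]] = parts[1]
    return syms

old = parse_nm("0000a0 T Perl_sv_setsv\n0000b0 D PL_no_mem\n0000c0 T Perl_croak\n")
new = parse_nm("0000a0 T Perl_sv_setsv\n0000d0 T PL_no_mem\n")

missing = sorted(set(old) - set(new))                    # gone entirely
retyped = sorted(n for n in old.keys() & new.keys() if old[n] != new[n])
print("missing:", missing)     # missing: ['Perl_croak']
print("retyped:", retyped)     # retyped: ['PL_no_mem']
```

Checks 2 and 3 (structure sizes and leading members) would need actual C code
compiled against both sets of headers, so they don't fall out this easily.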
> > 4: Does the candidate module *built with the previous release* pass its tests
> > when run with the pending release?
> Hmm. That's going to be a little trickier than what we can do easily
> by adapting normal CPAN testing setups, but not impossible.
It's also the one that can't be spotted by diff inspection, which is how
regressions such as http://rt.perl.org/rt3/Ticket/Display.html?id=63886
slip through.
If you're up for the challenge, and have the time, it's actually the one
that I think would be most interesting to have automated testing for first.
(Although scaling up whatever Andreas and Slaven did to simply identify
"1: Which modules changed state?" right at the top, and no more, would also
be valuable.)
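Mechanically, check 4 seems to amount to pointing the pending perl's -Mblib
at a blib/ that the previous perl built. A sketch of the shape of it (Python
scaffolding only; the paths are invented, and a real harness would want TAP
parsing rather than bare exit codes):

```python
# Sketch of check 4: run a module's tests, built against the PREVIOUS
# perl, under the PENDING perl.  The paths below are invented examples.
import glob
import os
import subprocess

def failing_tests(perl, build_dir):
    """Run each t/*.t with `perl` against the blib/ already present in
    build_dir (built by the old perl); return the tests that fail."""
    failures = []
    for t in sorted(glob.glob(os.path.join(build_dir, "t", "*.t"))):
        r = subprocess.run([perl, "-Mblib", os.path.basename(t)],
                           cwd=build_dir, capture_output=True)
        if r.returncode != 0:
            failures.append(os.path.basename(t))
    return failures

# Only meaningful on a machine that actually has a pending perl installed:
pending = "/opt/perl-pending/bin/perl"
if os.path.exists(pending):
    print(failing_tests(pending, "/tmp/Foo-Bar-built-with-old-perl"))
```

The interesting part isn't the loop, of course; it's keeping a farm of
modules pre-built against the previous release so the loop has something
to run.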
* It should be noted that all of the signals that we built for the S&T
department in the period 1995-2005 still exist, are in use, and may well
outlive me. I believe that all of the software I was paid to write in that
period has disappeared without trace, with the exception of open source that
escaped. There is a certain satisfaction in steelwork.