develooper Front page | perl.perl5.porters | Postings from May 2009

Re: Defining stability and testing it

From: Nicholas Clark
Date: May 28, 2009 11:22
Subject: Re: Defining stability and testing it
Message ID: 20090528182221.GQ55267@plum.flirble.org

On Wed, May 27, 2009 at 10:25:12PM -0400, David Golden wrote:
> On Wed, May 27, 2009 at 5:51 PM, Nicholas Clark <nick@ccl4.org> wrote:
> > That's not "all tests pass" as we may need to mark tests as todo for
> > particular platforms, but we need to know which tests, and which platforms,
> > and why. (I'm thinking for "it's a bug not yet fixed", which is much the same
> > as "it's a new test", not, "it's an existing test, so this is a regression we
> > want to ship")
> 
> I mean "all tests pass" in the sense of the harness reporting success,
> which would ignore TODO failures.

True. Both statements are accurate. But I'd regard marking tests as TODO
just before a release as a form of cheating.

The comment of someone I used to know seems to apply here. The story is:

  We were at the Bluebell, using the improvised-to-be-portable pillar drill
  to drill holes in rail to make signals.*
  The thermal cut out on the on/off switch kept cutting it out - we were trying
  to use too much power. This was probably because we had a single phase motor,
  when really it should have been three phase. (We still don't. See part 2 of
  http://use.perl.org/~nicholas/journal/16169 for some background.)
  Alf, in charge of our group, held in the "on" button to stop it cutting out.
  "That's cheating!" I exclaimed.
  "Life is knowing when to cheat", he replied.


> > The level of knowledge drills down:
> >
> > 1: Which modules changed state?
> >   [Can certainly be automated]
> >
> > 2: What is the nature of the test failure?
> >   [ie what perl -Mblib t/what_goes_here.t do I want to run to see it?]
> >
> > 3: Which core change caused this?
> >   [distill the above to a script that can be fed to git bisect]
> >
> > 4: Which other modules fail in the same way?
> >   [hopefully as simple as "group newly failing modules by step 3"]
> >
> > at which point it's possible to start to triage this cluster of failures.
> > I'm not sure how automatable it is, but it's certainly distributable
> > (and it was, to a degree, but Slaven and Andreas need not be the only ones
> > doing it. Berlin 2, rest of world 0)
> >
> > And then the tougher parts
> >
> > 5: What caused the change?
> >
> > 6: "Can you fix it?"
> 
> The diagnostic might be automatable, but sounds more like the work to
> resolve breakage after it's detected.  Or is there something in all
> that analysis that would lead you to make a determination of whether
> the breakage can be ignored?  (Aside from VMS, which I assume is often
> the case.)  So, yes, I can see how this helps you, but is it really
> more than is necessary to get the green/red light?

I only felt comfortable once I had reached step 5 on all reported regressions.
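
As a rough illustration of steps 1, 3 and 4 in the quoted list, here is a
minimal sketch in Python (chosen purely for illustration). All names are
hypothetical: the result dictionaries stand in for whatever a smoke run
records, and the "culprit" mapping stands in for the commit that git bisect
blamed for each module. The only outside fact relied on is git bisect's
documented convention for `git bisect run`: exit 0 means good, 125 means
"skip this commit", and other codes from 1 to 127 mean bad.

```python
from collections import defaultdict


def changed_state(before, after):
    """Step 1: return {module: (old, new)} for modules whose result differs.

    `before` and `after` map module name -> result string (e.g. "pass").
    A module present in only one run is reported with "missing".
    """
    changes = {}
    for module in sorted(set(before) | set(after)):
        old = before.get(module, "missing")
        new = after.get(module, "missing")
        if old != new:
            changes[module] = (old, new)
    return changes


def bisect_exit_code(build_ok, test_ok):
    """Step 3: exit status a `git bisect run` script should return.

    125 tells git bisect to skip a commit that could not be built,
    0 marks the commit good, 1 marks it bad.
    """
    if not build_ok:
        return 125
    return 0 if test_ok else 1


def group_by_culprit(culprits):
    """Step 4: group newly failing modules by the commit bisect blamed.

    `culprits` maps module name -> commit id (hypothetical data).
    """
    groups = defaultdict(list)
    for module, commit in sorted(culprits.items()):
        groups[commit].append(module)
    return dict(groups)
```

Each cluster returned by `group_by_culprit` is then one unit of triage for
steps 5 and 6, which still need a human.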

> > For anything other than a .0 release - are we binary compatible with the
> > previous release(s)?
> >
> > Roughly
> >
> > 1: Are all symbols exported by the previous release still exported, and still
> >   of the same type?
> > 2: Are all public structures where size matters the same?
> > 3: Do all other public structures start with the same members?
> 
> OK.  I'll assume that someone with decent C skills can code up some tests.

I don't think it's as important as what follows, as it's perfectly possible to
spot any change to C structures by vetting header files as they are merged,
and it's possible to spot most of the exports by diffing embed.fnc.
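
A plain diff of embed.fnc can be noisy, so a slightly more structured
comparison might help. The sketch below (Python, for illustration only)
assumes entries are pipe-separated as `flags|return type|name|arguments...`,
that lines starting with `:` or `#` are comments or preprocessor lines, and
that a trailing backslash continues an entry on the next line. It does not
interpret the flags, so it reports every removed or changed entry rather than
only the exported ones. The function names are hypothetical.

```python
def parse_embed(text):
    """Map function name -> (flags, return type, argument tuple)."""
    entries = {}
    pending = ""
    for raw in text.splitlines():
        line = pending + raw.strip()
        if line.endswith("\\"):
            # Entry continues on the next line: drop the backslash and keep going.
            pending = line[:-1]
            continue
        pending = ""
        if not line or line[0] in ":#":
            # Blank line, comment, or preprocessor conditional.
            continue
        fields = [f.strip() for f in line.split("|")]
        if len(fields) < 3:
            continue
        flags, rettype, name = fields[:3]
        entries[name] = (flags, rettype, tuple(fields[3:]))
    return entries


def compare_exports(old_text, new_text):
    """Return (removed names, names whose flags or signature changed)."""
    old, new = parse_embed(old_text), parse_embed(new_text)
    removed = sorted(set(old) - set(new))
    changed = sorted(n for n in old if n in new and old[n] != new[n])
    return removed, changed
```

Anything in either list would need a human to decide whether it matters for
binary compatibility; additions are deliberately not reported.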

> > 4: Does the candidate module *built with the previous release* pass its tests
> >   when run with the pending release?
> 
> Hmm.  That's going to be a little trickier than what we can do easily
> by adapting normal CPAN testing setups, but not impossible.

It's also the one that can't be spotted by diff inspection, and hence why
regressions such as http://rt.perl.org/rt3/Ticket/Display.html?id=63886
slipped through.

If you're up for the challenge, and have the time, it's actually the one
that I think would be most interesting to have automated testing for first.
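
To make the shape of that test concrete, here is a minimal sketch (Python,
for illustration only) that just produces the command sequence rather than
running it. It assumes the module builds with a Makefile.PL and `make`, and
that running each test with `-Mblib` under the pending perl picks up the
blib/ directory built by the previous release. The perl paths and the test
file name are hypothetical.

```python
import shlex


def cross_release_plan(old_perl, new_perl, test_files):
    """Commands to build with old_perl, then run each test with new_perl."""
    build = [
        [old_perl, "Makefile.PL"],  # configure against the previous release
        ["make"],                   # compile into blib/ using that configuration
    ]
    # Run the tests with the pending release against the already-built blib/.
    tests = [[new_perl, "-Mblib", t] for t in test_files]
    return build + tests


if __name__ == "__main__":
    plan = cross_release_plan(
        "/opt/perl-previous/bin/perl",  # hypothetical install of the previous release
        "/opt/perl-pending/bin/perl",   # hypothetical install of the pending release
        ["t/basic.t"],
    )
    for cmd in plan:
        print(shlex.join(cmd))
```

The important property is that nothing is rebuilt between the `make` step and
the test runs, so any compiled code under test was compiled against the
previous release's headers.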

(Although scaling up whatever Andreas and Slaven did to simply identify
"1: Which modules changed state?" right at the top, and no more, would be
useful.)

Nicholas Clark

* It should be noted that all of the signals that we built for the S&T
  department in the period 1995-2005 still exist, are in use, and may well
  outlive me. I believe that all of the software I was paid to write in that
  period has disappeared without trace, with the exception of open source that
  escaped. There is a certain satisfaction in steelwork.
