
CPANTS - CPAN Testing Service [synopsis]

From: Michael G Schwern
Date: August 19, 2000 19:33
Subject: CPANTS - CPAN Testing Service [synopsis]
Message ID: 20000819223335.A2160@athens.aocn.com
=pod

=head1 NAME

CPANTS - Comprehensive Perl Archive Network Testing Service


=head1 SYNOPSIS

CPANTS's primary goal is to automate the process of testing and
quality assurance of CPAN modules.  Its secondary goal is to provide a
centralized place for reviews and feedback.  Another goal is the
building of karma for CPAN authors indicating the overall kwalitee of
their modules.  The final, and possibly most important, goal is to
continue the fine tradition of bad puns in the Perl community.

For its primary purpose, CPANTS will run a series of automated
kwalitee tests (kwalitee is explained below) over each module in CPAN
to determine how fit each module is.  If a module is determined to be
of low kwalitee it is turned over to a human tester to be handled
manually.  More details on this later.
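
In rough strokes, the top-level flow might look like the sketch below.
It's a minimal sketch only; every name in it is illustrative, nothing
is designed yet.

    # Modules that flunk the automated checks get queued for a human,
    # per the 80% rule described later.
    my $threshold = 0.8;    # made-up cutoff on a 0..1 kwalitee scale
    for my $module (all_cpan_modules()) {
        my $kwalitee = run_kwalitee_tests($module);
        queue_for_human_review($module, $kwalitee)
            if $kwalitee < $threshold;
    }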

For the secondary goal of review and feedback, a short series of
simple quizzes will be asked of module users about their experiences
with the module, likes, dislikes, gripes, etc...

The karmatic option is perhaps the trickiest one.  It's a heuristic
which attempts to indicate the given author's continued production of
good modules.  It can be used to predict the kwalitee of new modules
and to sanity check CPANTS's tests (i.e. if a new module from an author
with high karma suddenly flops its kwalitee checks, it may indicate a
poorly written CPANTS test).  It can also be used to weight the
author's reviews on the assumption that good module authors will know
a good module when they see it.  Finally, trusted authors (those with
high karma) can produce lists of CPAN modules they trust which others
can peruse.

CPANTS.  CPANTS run.  Run, PANTS, run.


=head1 GOALS

=head2 Automated kwalitee tests

CPANTS's primary purpose for existence is to run a series of kwalitee
tests over CPAN using all sane combinations of perl versions, perl
configurations, OS's, libraries, utilities, etc...

The tests themselves will be broken into tiers, each representing
increasing levels of kwalitee.  What exactly is done with this
information is up to the CPAN cabal.

The list of tests might look something like the one below.  I'm going
to be making a lot of use of 'N' to indicate a number which will be
filled in later.  Some of the numbers may be calibrated continuously,
some may be relative to a reference module, some may be completely
arbitrary.  However, it's too early to get into specifics.

    Basic Competency
    ----------------
    Files
    - Needs README and/or INSTALL, MANIFEST and Makefile.PL
    
    Tests
    - Must have some sort of tests (test.pl or a t/ directory)
    - Should pass its own tests.
    - Every .pm file should compile (i.e. perl -MModule::Name -e 1 works)
    
    Configurations
    - Tests run on all stable, popular perls (currently 5.004_04, _05,
      5.005_02, _03 and 5.6.0) with differing popular configurations
      (perl's malloc, threading, 64 bit, unicode, etc...)  Death by
      explicit require is okay, but it should be in the Makefile.PL.
    - Tested on all "popular and sane" OS/hardware/library/perl version
      combos.  Linux/x86, Solaris/Sparc, libc5, glibc2, ActivePerl, etc...
    
    Plays Well With Others
    - Look for Makefile.PL security evilness (i.e. system("rm -rf /"))
    
    
    Red Kwalitee Flags
    ------------------
    Tests
    - Do the tests cover at least NN% of the code?
    - Complexity rating lower than N on B::Fathom or other automated
      complexity check.
    
    Docs
    - NN% of the total code should be documentation.
    - All .pm files should contain some sort of docs, even if it's just
      NAME and AUTHOR.
    
    Plays Well With Others
    - Does it use naughty regex globals?  ($& and friends)
    - Does it poke around in other namespaces a lot?
    - Does it modify @ISA a lot?  (blowing out the method cache)
    
    Portability
    - Does it use "\r\n" instead of "\015\012"?
    
    
    Style Tests
    -----------
    Code
    - Average subroutine length should not be more than N lines long.
    - No subroutine should be more than N lines.
    - No continuous block of comments should be more than N lines long.
      (The assumption is that such a large block is either A)
      documentation and thus should be POD, or B) the code being
      commented is way too complicated if it requires that much
      explanation.)
    - Do $_ and @_ make up more than NN% of all variables?
    
    Features
    - Does it use non-portable features?  (fork(), %SIG, alarm()...)
    - Does it use any questionable/experimental language features?
    
    Performance
    - Module should load in < NN ms.  This is an excellent candidate for
      selecting a reference module to calibrate this test.
    
    Files
    - Is there a Changes log and is it up to date?
    
    
    Brutal Tests
    ------------
    Tests
    - Tested on as many OS/hardware/library/perl combos as humanly possible.
    - Tested on all possible configurations of Perl.
    
    Docs
    - All .pm files should contain certain basic sections (NAME, SYNOPSIS,
      DESCRIPTION, AUTHOR, etc...)

    Plays Well With Others
    - Does it localize all magical globals?  ($/ and friends)

To make all this actually work, all the principles outlined below will
have to be brought to bear.
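
For a taste of what such a check might look like in code, here's a
minimal sketch of the "Basic Competency" file tests, assuming the
distribution has already been unpacked into a directory.  It's
illustrative only, not a real CPANTS interface.

    use strict;
    use File::Spec;

    # Returns a hashref of pass/fail flags for the basic file checks.
    sub basic_competency {
        my ($dist_dir) = @_;
        my %ok;

        # Needs README and/or INSTALL, MANIFEST and Makefile.PL
        $ok{readme}   = -e File::Spec->catfile($dist_dir, 'README')
                     || -e File::Spec->catfile($dist_dir, 'INSTALL');
        $ok{manifest} = -e File::Spec->catfile($dist_dir, 'MANIFEST');
        $ok{makefile} = -e File::Spec->catfile($dist_dir, 'Makefile.PL');

        # Must have some sort of tests (test.pl or a t/ directory)
        $ok{tests} = -e File::Spec->catfile($dist_dir, 'test.pl')
                  || -d File::Spec->catdir($dist_dir, 't');

        return \%ok;
    }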


=head2 Karma

Karma is an attempt to get a rough idea of the overall wellness of an
author's work.  It works like this: you pass a CPANTS test, you get
some karma.  You fail one, you lose some.  Someone likes your module,
you get some karma.  Doesn't like it, you lose some.  If you answer
your nags (see below) you get karma.  Let the nag list grow, you lose
some.

Karma is also transferable.  If someone with high karma likes your module,
you'll gain more karma than if a person with low karma likes it.

Karma should have a half-life.  It will decay over time if you don't keep
working at it.  This balances the rating between established authors and new
authors.
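
A half-life is simple enough to express.  Something like this sketch,
where the half-life constant is completely made up:

    use constant HALF_LIFE_DAYS => 180;    # made-up number, to be tuned

    # Karma earned $days_old days ago is worth this much today.
    sub decayed_karma {
        my ($karma, $days_old) = @_;
        return $karma * 0.5 ** ($days_old / HALF_LIFE_DAYS);
    }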

Pieces of CPANTS can have karma.  If someone declares "I meant to do
that" on a test, that test will lose karma.  If too many people do
that, the test's karma will fall below a certain threshold and the
test will be reviewed for possible revision or removal.

Karma, like kwalitee, is a simple way of indicating the possible
presence of true quality.  It's a simple, automated, non-competitive
rating system with the virtue of not being based on voting or some
other form of popularity contest.  To a certain extent it also
indicates trust.  This can be used to build lists of trusted modules
(more below).


=head2 Trusted module lists

A common complaint about CPAN is that there's only one CPAN.  Being
Comprehensive and all, that makes perfect sense, but there's a real
desire for multiple views of CPAN.  Karma provides an interesting
method of producing such views.

Obviously, a list of the top N high-karma modules can be produced.
Lists can also be generated based on which tests were passed or
failed, or based on the modules of high-karma authors.

Also, CPANTS authors and users can produce their own lists of modules they
find useful.  Other users who trust a particular person can use modules off
that list.  Such lists can be merged in various ways to produce even more
trust.
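
One obvious merge is to weight each entry by the karma of the person
whose list it came from.  A sketch, with a data layout invented purely
for illustration:

    # Each list: { karma => $owner_karma, modules => [...] }
    # Returns module names, most trusted first.
    sub merge_trust_lists {
        my @lists = @_;
        my %trust;
        for my $list (@lists) {
            $trust{$_} += $list->{karma} for @{ $list->{modules} };
        }
        return sort { $trust{$b} <=> $trust{$a} } keys %trust;
    }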

Using CPAN::Site or something similar, we can provide a subclass of the CPAN
shell which provides transparent access to a user's favorite module lists.
(Perhaps we can call it myPANTS)

=head2 Nag lists

Currently there is no centralized bug tracking for CPAN modules, and
there probably never will be, owing to the free-form nature of the
project.  However, the problem remains that authors sometimes forget
about bugs and requests from users, lost in a sea of email and work.

So as a palatable alternative to bug tracking, CPANTS will keep a nag
list on modules.  Users can post their bug reports and feature
requests to CPANTS, which will forward them on to the module author.
CPANTS will remember the nags and keep a public list.  The nags will
have karmatic weight, and users can tack a "me, too!" onto a request
to add to its karma.  Nags which hang around unanswered steadily drain
the author's karma.  Nags which are answered quickly improve it.  A
module with a long list of unanswered nags will hopefully shame an
author into being more responsive.  Alternatively, it may inspire
someone else to fork off an alternative version of the module.
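
The karmatic weight of a nag might be computed along these lines, with
"me, too!" votes adding weight and age draining more karma.  The
formula and constants are invented for illustration:

    # How much karma an open nag drains from the author right now.
    sub nag_karma_drain {
        my ($nag) = @_;   # { me_too => $n, opened => $epoch, answered => 0/1 }
        return 0 if $nag->{answered};
        my $age_days = (time() - $nag->{opened}) / 86_400;
        return (1 + $nag->{me_too}) * $age_days / 30;    # ramps up monthly
    }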

A long-unanswered nag list for a module which hasn't had a new version
in some time will indicate a delinquent author and kick the machinery
of module takeover into motion.

Answering a nag does not necessarily mean fulfilling it.  It simply
means it's been considered and either applied or rejected.  CPANTS
will provide space for the author to air their reasoning.

Of course, CPANTS will provide a "shut your pie hole" function for
authors to tell CPANTS exactly where it can put its nagging.  CPANTS
will then no longer email nags off to the author, but it will continue
to collect them and make the list public.


=head2 Kudos List

The opposite of a nag list, the kudos list contains users' praise for
the module and for specific features they like.  This will serve as a
positive feedback mechanism (too often authors only hear about their
module when there's a problem) and let them know what people like, as
guidance for future modules.


=head2 Reviews

These aren't like book reviews, and they're not slashdot discussions
about modules.  A review is a list of mostly yes-or-no questions about
the module for users to answer.  It might look something like this...

    Docs
    - Are the docs complete?  Readable?
    - Are there good tutorial docs (for the casual user)?
    - Are there good reference docs (for the expert user)?
    - Are there good maintainer docs (about extending/subclassing the module)?
    - Is there a book?
    - Are the docs up to date with the code?
    
    Interface
    - Is it simple?  Does it make easy things easy and hard things possible?
    - Is it overdone (CGI.pm)?
    - Is it gratuitously OO (File::Spec)?
    - Is backwards compatibility addressed?
    
    Performance
    - Is it overall fast enough?
    - What parts are fast?
    - What parts are slow?  How can they be sped up?
    
    Tests
    - Does it all work?
    - What's broken?
    - What tests are missing?
    
    Features
    - What do you like most?
    - What do you like least?
    - How would you improve this module?
    - What's quirky?  What almost but doesn't quite work?
    - What's missing?
    - How responsive is the author to changes?

The results of these tests (weighted by the karma of the user) can be matched
up against the automated kwalitee tests to provide a check.  If they produce
contradictory results, something is wrong and the mess will be passed off to a
human for review.
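
That cross-check might look something like this sketch, with both
scores on a 0 to 1 scale and the "contradictory" threshold pulled out
of thin air:

    # True if the karma-weighted review average and the automated
    # kwalitee score disagree badly enough to call in a human.
    sub needs_human_review {
        my ($kwalitee, @reviews) = @_;  # review: { karma => $k, score => $s }
        my ($sum, $weight) = (0, 0);
        for my $r (@reviews) {
            $sum    += $r->{karma} * $r->{score};
            $weight += $r->{karma};
        }
        return 0 unless $weight;        # no reviews, nothing to check
        return abs($sum / $weight - $kwalitee) > 0.5;
    }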


=head2 Good Foo

An often popular talk at conferences is an author simply talking about their
code.  What they did, how they did it, what they were thinking and what
problems they overcame (or didn't) along the way.

CPANTS will provide a collecting point for such good foo.


=head2 CPANTS API or How To Get Into My PANTS

CPANTS will be collecting a metric fuckton of information and will not
be able to display it all as well as others can.  Accordingly, an API
will be provided for anyone to get a hold of the information.  Things
like the CPAN shell and search.cpan.org could then make use of the
information as they see fit.
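
Nothing about the API is designed yet, but purely to fix the shape of
the idea, using it might feel something like this (CPANTS::Query and
all of its methods are hypothetical):

    use CPANTS::Query;    # hypothetical module

    my $cpants = CPANTS::Query->new;
    my $report = $cpants->report('Foo::Bar');

    printf "kwalitee: %.2f  author karma: %d  open nags: %d\n",
        $report->kwalitee, $report->author_karma,
        scalar @{ $report->open_nags };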


=head2 Testing Perl

Very often, new versions of Perl come out and our favorite CPAN
modules break.  There's always some lag time while CPAN authors
scramble to update their code.  This seriously hurts the credibility
of new perl versions and delays their adoption.

Accordingly, CPANTS can be run against development versions of Perl,
both to find bugs in perl itself and to find incompatibilities in CPAN
modules.  Prior to the release of a new stable perl, release
candidates can be run through CPANTS to see how much of CPAN will
break, giving authors early warning to be ready for the new release.

It's possible that "how much of CPAN did we break" might become a
consideration in whether a new version of Perl is ready for release.
A list of important modules could be compiled (LWP, DBI, libnet,
etc...) and verified to work before a release is considered stable.
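
The tally itself is trivial once the test harness exists.  A sketch,
where test_dist_under() is a hypothetical function that builds and
tests a distribution under the given perl binary:

    my $rc_perl   = '/usr/local/bin/perl-RC';    # illustrative path
    my @important = qw(LWP DBI libnet);

    my $broken = 0;
    for my $dist (@important) {
        $broken++ unless test_dist_under($rc_perl, $dist);
    }
    printf "%d of %d important dists break under the RC\n",
        $broken, scalar @important;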


=head1 PRINCIPLES

These are not the shores to which we are sailing, but the stars by
which we will navigate there.  They're the ideals we'll keep in mind
while trying to make this mess work.


=head2 'Automate! ' x 3

CPAN is huge and getting huger.  Completely and continuously reviewing
such a large and constantly growing body of code would swamp most any
organization.  Top that off with the thankless nature of code review
and you can see why it's such a tough job.  Fortunately, we have
computers and we have a healthy abundance of laziness.

CPANTS will be written as a totally automated beastie.  Configure it,
flip a switch and it'll chug along calling in the humans as
infrequently as possible.  This ensures that the QA of CPAN will not
be subject to the schedules of a handful of people (no
disrespect meant to the current testing crew), but will have the
unblinking reliability of a computer behind it [insert grain of salt
here].

Of course, there will have to be some human interaction.  We'll follow the 80%
rule on that.  If it can handle 80% of the cases on its own, CPANTS works.

This also turns what would normally be a drab and unending QA project
into an interesting automation and limited AI project.  How do you
write a program to accurately review code submitted from all over the
world, running the whole range of computing problems, and written in
one of the most free-form languages yet invented?  (Don't worry, I
have some ideas to start us off.)


=head2 80%

There's no way in hell we can write a reviewing program that's always
going to get the concept of what is quality code right.  Not only is
there the difficulty of coding up the logic of what is "good code",
there's even the problem of nailing down what is "good code"!

To keep us from bashing our brains out against this particular rock,
the goal will be 80% accuracy.  Eight out of ten tries, CPANTS has to
get it right.  The other two can be handled by humans.  It's an old
Skunk Works idea.  Perfection is expensive.  Pretty good is cheap and
fast.

80% means we can employ lots of heuristics, statistics and other icks
instead of trying to actually nail down exactly what is good code.
This should make the automation problem significantly easier.  Anything
that slips through will be handed off to a group of volunteer human
reviewers (possibly the existing cpan-testers group).

80% also means we can do the first 80% which takes 20% of the time and
junk the last 20% that takes 80% of the time.

As a result, CPANTS will err on the side of false negatives.  It would
rather fail a module and ship it off to a human than let it
accidentally pass.


=head2 Kwalitee

This is not Quality Assurance.  "Quality" is too nebulous a thing to
nail down and we won't bother.  We'll instead target something that
looks like quality, sounds like quality, but isn't really quality.
It's "kwalitee": signs and heuristics which indicate the presence of
quality code without 100% confirming it.

The best analogy is cigarettes, ashtrays and cancer.  Ashtrays do not
cause cancer, cigarettes do.  However, if you glance around and see
a lot of used ashtrays you can bet there's some cancer around, too.

This retargeting of CPANTS from the real target we can't really
define to one we can should make our lives a lot easier while still
keeping us inside the 80% rule.  Hit what you can see.


=head2 Distribute the workload

We're going to have to review millions of lines of code on dozens of
different platforms with dozens of different OS configurations with
dozens of different Perl configurations with several different
versions of Perl.  That's a lot of shit to do.  There's no way in hell
we can amass the hardware and horsepower necessary to do all this, so
we won't.  Instead we'll run CPANTS in a distributed mode.

The CPANTS client will be run on a remote machine in a little sandbox
environment and be no more difficult to install than Perl itself.
It'll run its tests on CPAN and report its results back.  The details
of this I leave deliberately vague, as it's probably the most
challenging part of the whole mess.

Not only will this make it easier to run CPANTS over the multitude of
configurations out there, it also means that parties which have a
vested interest in making sure CPAN modules run on a particular
platform can easily volunteer the necessary hardware to run CPANTS.
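
To fix the shape of the idea (the details are left vague on purpose),
the client's main loop need be no more than this; the hard parts hide
behind the hypothetical functions:

    # $server is wherever CPANTS central lives.
    while (my $job = fetch_assignment($server)) {
        my $results = run_in_sandbox($job);    # unpack, build, test
        report_results($server, $job, $results);
    }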


=head2 Modularity and spin-offs

This is ultimately going to be a rather huge project.  In order both
to get something working soon and to preserve the continued sanity of
the developers, CPANTS must be modular in nature: each piece roped off
and developed in a semi-independent manner.  Once a given piece is
brought up to some sort of working speed, it may be spun off into an
independent project.  For instance, the code for the CPANTS
distributed client can be written as a general framework.  That
distribution framework, the hard part, can then be dealt with as a
wholly separate problem, with CPANTS merely one of its users.

This should serve to keep the gestalt size of the project low, meaning
how hard it is to take the whole thing in.  It will also allow
specialists to home in on their particular specialties and work on
them without the distraction of unrelated noise.  Finally, the "some
program on my machine" syndrome will be averted; that is, the tendency
for specialized programs to never quite be cleaned up for general
consumption.


=head2 Wear a human face.

Code reviews represent a tricky diplomatic problem.  It's a difficult
task to point out the failings in someone's work without pissing them
off.  And if the thing doing the reviewing is a computer... well,
nobody likes a machine telling them what to do.

So in communicating with authors, CPANTS will make use of human
liaisons to break the bad news.  A message from CPANTS will be sent to
a volunteer, who will then put it into their own words and communicate
with the module author.

It would make sense if the same person talks to the same authors.  In
a sense, each author will have a case-worker: the same face to speak
to about their module.


=head2 "Take chances, make mistakes, get messy!"

The immortal words of Ms. Frizzle (maybe not everyone watches "The
Magic School Bus") to her students every time they wing off on another
adventure.  What it means to us is this: if you have a good idea,
don't just talk about it; write some code, submit some patches!  Even
if the code sucks, even if the idea sucks, it's better than endlessly
exchanging emails on the subject.  Nothing solidifies a discussion
like some action, even if the action is completely stupid.

This is not an invitation to start leaping before looking, just that
if you find the threads on the CPANTS mailing list getting a bit too
long, it's time to take a chance and slap down some code.  If it
doesn't work, you're no worse off than a pair of virtual skinned
knees.  No biggie.


=head2 There's an exception to every rule.

I'm sure everyone read the list of tests and noted a few that they violate
regularly with relish.  That's fine, they're just heuristics.  In such a case,
an author can declare "I meant to do that" and it will no longer be considered
a failure.  The test can also be switched off for all further versions of that
module, or even for all code by that author.  If enough people choose this
option (invoking the 80% rule) the test should be rethunk.

This also means words like 'all', 'none', 'never' and 'always' aren't quite
the hard and fast rules they seem to be.  Somebody's going to have a reason
why something doesn't apply to them.  Rather than attempting to explicitly
enumerate all these special cases, we'll just wrap them up in one meta-case
and note the specifics as they come.


=head2 Independent auditors

CPANTS will be its own thing.  While it will obviously work with CPAN
and the Perl developers, it will be beholden to neither, acting as a
wholly independent auditing service.  Accordingly, it will have no
direct effect on CPAN or Perl, only whatever influence is granted to
it.

This will hopefully keep CPANTS honest as well as avoid the problem of
appearing to dictate style to authors.


-- 

Michael G Schwern      http://www.pobox.com/~schwern/      schwern@pobox.com
Just Another Stupid Consultant                      Perl6 Kwalitee Ashuranse
BOFH excuse #65:

system needs to be rebooted


