
Update on the CPAN regression smoke

From: Steffen Mueller
Date: September 12, 2011 12:49
Subject: Update on the CPAN regression smoke
Message ID: 4E6E622D.1000805@cpan.org

Hi Chip, hi all,

here's a quick update on the current state of the restarted CPAN 
regression smoke for Chip's magicflags1 branch.

For those new to the party: I'm smoking a full copy of the CPAN for two 
branches of perl: Chip's "magicflags1" branch and the blead perl commit 
that is at the base of that branch (uncreatively called 
"beforemagicflags1").

I've been running between two and five worker processes per perl. The 
workload is divided among the workers by dividing a static input list of 
distributions into $N equal chunks. I've had to kill the smokers once or 
twice and restart them. I'll get to the significance of that below.
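
For illustration, the static split can be sketched as follows. This is 
not the code from the smoker repository; the function name and the 
distribution names are made up, and the sketch assumes "equal" means 
chunk sizes that differ by at most one.

```python
def split_into_chunks(items, n):
    """Split items into n contiguous chunks whose sizes differ by at most one."""
    if n < 1:
        raise ValueError("n must be at least 1")
    base, extra = divmod(len(items), n)
    chunks = []
    start = 0
    for i in range(n):
        # The first `extra` chunks take one additional item each.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical input list; the real list is a static dump of CPAN distributions.
dists = ["A-1.0", "B-2.0", "C-0.1", "D-3.2", "E-1.1"]
chunks = split_into_chunks(dists, 2)
```

Because the split is static, a worker that finishes early does not pick 
up work from a slower one.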

The distributions for which both sets of smokers have delivered a result 
and for which the results differ are shown in the comparison report at

   http://steffen-mueller.net/tmp/out.
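
The selection rule for that report can be sketched roughly like this. 
The result labels and the dictionary layout are assumptions made for the 
example, not the format the smokers actually write.

```python
def differing_results(before, after):
    """Return distributions tested on both perls whose results differ.

    `before` and `after` map a distribution name to a result label.
    Distributions missing from either side are skipped.
    """
    common = before.keys() & after.keys()
    return {
        dist: (before[dist], after[dist])
        for dist in sorted(common)
        if before[dist] != after[dist]
    }


# Hypothetical results for the two perls.
before = {"Foo-1.0": "pass", "Bar-0.2": "pass", "Baz-3.1": "fail"}
after = {"Foo-1.0": "pass", "Bar-0.2": "fail", "Qux-0.9": "pass"}
report = differing_results(before, after)
```

Here only Bar-0.2 ends up in the report: Foo-1.0 agrees on both sides, 
and Baz-3.1 and Qux-0.9 each lack a result from one of the smokers.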

As you can see, a good chunk of CPAN is done already, with another 
couple of thousand distributions to churn through. The seriously hacky 
code that implements the smokers can be found at

   http://github.com/tsee/cpan_perl_branch_smoke.

Specifically, for those who would like to give it a shot, there are 
step-by-step instructions at

   https://raw.github.com/tsee/cpan_perl_branch_smoke/master/README

Now, there are still several issues. The most glaring problem right now 
is that when running smokers in parallel, they are going to have to 
duplicate a lot of the work. This is because the underlying smoker code 
rightfully never installs into the target perl. Instead, it keeps the 
build directories for all distributions around. These builds can then be 
used for the @INC of the distribution which is currently being tested. 
If you divide the list of input distributions into $N equal parts and 
run $N independent smokers, they will rebuild commonly depended-on 
distributions up to $N times. I currently can't see a very good way of working 
around this without mucking with how CPANPLUS works a lot. Maybe Chris 
has an idea? This is a serious performance issue.
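
To make the cost concrete, here is a toy model of the duplication. It 
assumes each worker builds every distribution in its chunk plus all of 
its transitive dependencies exactly once, and that workers share 
nothing. The dependency map and names are invented; this does not touch 
CPANPLUS at all.

```python
def count_builds(chunks, deps):
    """Return (builds with independent workers, builds with one shared cache).

    `chunks` is a list of lists of distribution names, one list per worker.
    `deps` maps a distribution name to the names it depends on.
    """
    def collect(dist, seen):
        # Depth-first walk over the dependency graph, recording each
        # distribution once per worker.
        if dist in seen:
            return
        seen.add(dist)
        for dep in deps.get(dist, ()):
            collect(dep, seen)

    per_worker = []
    for chunk in chunks:
        seen = set()
        for dist in chunk:
            collect(dist, seen)
        per_worker.append(seen)

    independent = sum(len(seen) for seen in per_worker)
    shared = len(set().union(*per_worker))
    return independent, shared


# Hypothetical example: two workers, both needing the same dependency.
deps = {"App-A": ["Common"], "App-B": ["Common"], "Common": []}
independent, shared = count_builds([["App-A"], ["App-B"]], deps)
```

In this example the two independent workers perform four builds between 
them, where a shared build cache would need three.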

This has other fallout. Since the script sets up a clean smoking 
environment for each smoker (tempdir with HOME=tempdir, .cpanplus/etc 
config dirs copied to the tempdir), restarting the smokers will make 
them rebuild all the commonly depended-on distributions. This could 
probably be worked around by allowing explicit specification of the work 
directory.
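
A minimal sketch of that idea, assuming the only thing that matters is 
which directory HOME points at: if the caller names a work directory it 
is reused across restarts, otherwise a fresh temporary one is created as 
is done now. The function name is hypothetical, and copying the 
.cpanplus config into the directory is left out.

```python
import os
import tempfile


def smoker_environment(work_dir=None):
    """Return (work_dir, env) for one smoker process.

    With no work_dir, a fresh temporary directory is created. With an
    explicit work_dir, the directory is created if missing and reused
    otherwise, so earlier build directories survive a restart.
    """
    if work_dir is None:
        work_dir = tempfile.mkdtemp(prefix="smoker-")
    else:
        os.makedirs(work_dir, exist_ok=True)
    env = dict(os.environ)
    env["HOME"] = work_dir  # the smoker sees the work directory as its home
    return work_dir, env
```

The returned env could then be handed to whatever launches the smoker 
process, for instance via the env argument of subprocess.run.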

Furthermore, since CPANPLUS keeps all those build directories around, 
the smokers tend to require a lot of disk space and that scales with the 
number of smoker processes.

Finally, the whole process still requires manual punching of several 
longish (but simple) commands into a shell. No big deal, but considering 
the enthusiasm for this kind of check, that's probably not going to be 
sufficient. I can't think of a technical reason why one couldn't 
automate this to the point of just specifying two SHA1s and waiting for 
an email that points you at the resulting regression report. I'd be more 
than happy to be beaten to doing this.

Best regards,
Steffen

PS: I'll send a final update when the smokers have completely finished. 
Two chunks out of five of one smoker have already stopped. I hope the 
rest will follow "soon".


