
Re: [perl.git] branch blead, updated. v5.19.0-497-g1ebabb4

Dave Mitchell
July 14, 2013 19:31
On Mon, Jun 10, 2013 at 06:23:27PM +0100, Dave Mitchell wrote:
> On Mon, Jun 10, 2013 at 12:33:30PM -0400, George Greer wrote:
> > At least for my Linux smokers, Benchmark.t test 15 fails not because
> > the clocks are inconsistent but because the load average can reach
> > over 100 at times depending on how many different smokes are
> > running, especially since they almost all do "make -j" and
> > TEST_JOBS=8. (The AddressSanitizer smokers don't use "make -j"
> > because I kernel panic'd my machine once doing that.)  My machine
> > has 4 physical, 8 virtual cores so it doesn't blow the timings too
> > frequently but it does happen.
> > 
> > A check for load average being "too high" after a failure to silence
> > it might help at least in my cases.
> Well, the test is failing fundamentally for one of two reasons: either
> the amount of CPU burned (as reported by the OS as (times)[0]) for a given
> amount of work is not constant, or there's a bug in Benchmark somewhere.
> If the former, whether it's caused by high load or some other reason, I'm
> hoping my new change will detect this.

(TL;DR: I'm proposing to remove the timing-sensitive test 15 (lately tests
7,16) from Benchmark.t, unless anyone can suggest otherwise.)

Ok, after a month of smokes with the new calibration test in, I'm
satisfied that it isn't an issue with Benchmark.

The temporary calibration test does roughly the following (without making
use of any Benchmark facilities):

1. Calculate N: a rough estimate of how many times a loop must be run to
burn 1 sec of CPU.
2. Run the loop N times, noting the before/after user CPU using times(),
and thus calculate exactly the CPU seconds required to loop N times, and
hence the loops per CPU second.
3. Run the loop 3*N times, again noting the before/after user CPU
and again calculating loops per CPU second.
4. If the loops-per-sec determined in steps 2 and 3 differ by more
than 28%, fail the temporary calibration test (test 7), and skip test 16.
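
As an illustration only, the four steps above can be sketched as follows.
This is not the code from Benchmark.t: it is written in Python rather
than Perl, with os.times()[0] standing in for Perl's (times)[0]. The
function names, the doubling strategy used to estimate N, and the choice
to measure the 28% difference relative to the smaller of the two rates
are all assumptions made for the sketch.

```python
import os

def burn(n):
    """Busy loop: n iterations of trivial arithmetic."""
    x = 0
    for i in range(n):
        x += i * i
    return x

def user_cpu():
    """User CPU seconds consumed so far by this process."""
    return os.times()[0]

def estimate_n(target_secs=1.0):
    """Step 1: rough estimate of the loop count that burns target_secs."""
    n = 1000
    while True:
        start = user_cpu()
        burn(n)
        elapsed = user_cpu() - start
        # Only extrapolate once the run is long enough to measure.
        if elapsed >= target_secs / 4:
            return int(n * target_secs / elapsed)
        n *= 2

def loops_per_cpu_sec(n):
    """Steps 2 and 3: run n loops and return loops per user CPU second."""
    start = user_cpu()
    burn(n)
    elapsed = user_cpu() - start
    if elapsed <= 0:
        raise RuntimeError("CPU clock did not advance; n is too small")
    return n / elapsed

def rates_agree(rate_a, rate_b, tolerance=0.28):
    """Step 4: True if the rates differ by at most `tolerance`.

    Assumption: the difference is taken relative to the smaller rate.
    """
    return abs(rate_a - rate_b) <= tolerance * min(rate_a, rate_b)

def calibrate(target_secs=1.0, tolerance=0.28):
    """Run steps 1 to 4 and return (rate for N, rate for 3*N, passed)."""
    n = estimate_n(target_secs)
    rate_1x = loops_per_cpu_sec(n)
    rate_3x = loops_per_cpu_sec(3 * n)
    return rate_1x, rate_3x, rates_agree(rate_1x, rate_3x, tolerance)
```

Calling calibrate() with the defaults burns roughly 4 to 5 CPU seconds;
passing tolerance=0.40 gives the looser margin of the original test.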

The original Benchmark test (usually test 15, recently test 16) does
conceptually the same thing, but using library routines in Benchmark.
This time the margin for error is 40% (more tolerant than the calibration
test's 28%).

In the last month or so, smokes have failed the calibration test approx
160 times, while the Benchmark test has failed 100 times.

From this I conclude that it's not an issue with Benchmark, but an issue
with the test file expecting that the OS will return a reasonably
consistent CPU-seconds burn for a given task.

Having said that, I've been completely unable to reproduce this at home,
even on a server I ramped up to a load average of 300+. I tried having the
high load only for the first half of the calibration loop, only on the
second, or on both. Over many runs, I never saw more than a few %
deviation between the two loops-per-sec values.
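
For anyone wanting to try a similar experiment, here is a rough sketch of
the "load only during the first measurement" variant. It is Python, not
the Perl actually used, and everything in it (the helper names, spawning
busy-looping child interpreters as the load generator, the loop body) is
an assumption for illustration rather than a description of how the
experiment above was run.

```python
import contextlib
import os
import subprocess
import sys

@contextlib.contextmanager
def background_load(workers):
    """Run `workers` busy-looping child processes while the block runs."""
    procs = [
        subprocess.Popen([sys.executable, "-c", "while True: pass"])
        for _ in range(workers)
    ]
    try:
        yield procs
    finally:
        # Always stop and reap the children, even if the block raised.
        for p in procs:
            p.kill()
        for p in procs:
            p.wait()

def loops_per_cpu_sec(n):
    """Loops per user CPU second, or None if the clock did not advance."""
    start = os.times()[0]
    x = 0
    for i in range(n):
        x += i * i
    elapsed = os.times()[0] - start
    return n / elapsed if elapsed > 0 else None

def compare_under_load(n, workers):
    """Measure n loops under load, then 3*n loops with the load removed."""
    with background_load(workers):
        loaded = loops_per_cpu_sec(n)
    idle = loops_per_cpu_sec(3 * n)
    return loaded, idle
```

Swapping which measurement sits inside the with block gives the "second
half only" case, and wrapping both gives the "both" case.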

But given that the smokes have failed on machines run by at least 3
different volunteers, across multiple platforms (linux, darwin, win32, aix
etc.), I now propose to remove the infamous test 15 from Benchmark.t.

After that I'll look at failures 128,129 and see if the same thing applies
to them.

You never really learn to swear until you learn to drive.
