perl.perl5.porters | Postings from October 2014

proposal for performance testing infrastructure

Dave Mitchell
October 14, 2014 20:00
I want to discuss some modest proposals for better testing of performance
and optimisations. The first four suggestions are simple, concrete, and (I
hope) non-controversial. The fifth is more woolly and up for discussion.

First, I propose adding a new test subdirectory, t/perf/ say,
specifically to hold tests related to performance and optimisations.

There's already a directory t/benchmark/, holding a single test file
rt26188-speed-up-keys-on-empty-hash.t. I would move this file to t/perf/.
I don't want to use the name t/benchmark since what I'm proposing covers
more things than just benchmarking.

Secondly, add a new test file, t/perf/speed.t say, which does something
similar to my recently added t/re/speed.t file: it will run chunks of
(non-regex) code that are known to be dramatically slower in the absence
of a particular optimisation. For example, the following:

    $x = "x" x 1_000_000;
    $y = $x for 1..1_000_000;

takes under a second normally, but many minutes if COW (copy-on-write) is disabled.
The new test file (like t/re/speed.t) will run with a watchdog timeout.
This rather crude approach gives us a chance of a test file actually
failing if we accidentally break an important optimisation. Even if it
doesn't time out, it might be noticed that it's suddenly taking a lot
longer to run.
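
A minimal sketch of what such a file might contain, assuming it is written as a standalone script with Test::More and a plain alarm() watchdog (inside the core tree it would more likely use t/test.pl and its watchdog() helper). The 60-second limit is an arbitrary assumption:

```perl
#!/usr/bin/perl
# Hypothetical sketch of a t/perf/speed.t-style test: run code that is only
# fast while a particular optimisation works, under a watchdog timeout.
use strict;
use warnings;
use Test::More;

# Watchdog: an assumed 60-second limit for the whole file, far above the
# normal run time. If it fires, record a failure and exit.
my $timeout = 60;
$SIG{ALRM} = sub {
    fail("watchdog: still running after ${timeout}s");
    done_testing();
    exit 1;
};
alarm $timeout;

# COW: assigning a 1MB string a million times is cheap only when each
# assignment shares the string buffer instead of copying it.
my $x = "x" x 1_000_000;
my $y;
$y = $x for 1 .. 1_000_000;
is(length($y), 1_000_000, "COW: repeated copy of a large string");

alarm 0;    # finished in time, so cancel the watchdog
done_testing();
```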

Thirdly, some optimisations involve changes to the optree; for example
array access with a small constant index is done with the aelem op being
replaced with an aelemfast op. Currently we typically test this in
ext/B/t/optree_concise.t and similar, by comparing the output of Concise
with the expected output. This isn't good for three reasons:

1) the test is in the wrong place: ext/B/t/ is for testing B itself;
2) it's very fragile: it compares the complete op-tree in
Concise format, including every op flag etc; a simple change unrelated to
the optimisation (e.g. adding a new private flag to an op) requires fixups
to all the templates;
3) it requires two separate Concise output templates, one for unthreaded
and one for threaded.

Often all we are interested in is whether the peephole optimiser has
successfully replaced one op (aelem) with another (aelemfast).

I propose a new test file, t/perf/opcount.t, that has a test function
where you pass it a sub ref and a hash of expected op counts. The function
uses B to walk the optree of the sub, generating a count of each op type.
Then a test for aelemfast might look something like:

    opcount_test(code   => sub { $a[0] = 1 },
                 counts => { aelemfast => 1, aelem => 0, 'ex-aelem' => 1 });

so we just specify the expected counts of the ops we're interested in. We
don't care what the rest of the ops in the optree are up to, nor what
flags they have.
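
Here is a rough sketch of how such a helper might walk the optree with B. The helper names count_ops() and op_counts() are hypothetical; nulled ops are reported as 'ex-NAME', following the naming B::Concise uses, so a count such as 'ex-aelem' can be checked too:

```perl
#!/usr/bin/perl
# Hypothetical sketch of opcount_test(): count every op in a sub's optree
# by name, then check only the counts the caller asks about.
use strict;
use warnings;
use B qw(svref_2object ppname OPf_KIDS);
use Test::More;

# Recursively add $op and all its descendants to %$counts.
sub count_ops {
    my ($op, $counts) = @_;
    my $name = $op->name;
    # An op nulled by the optimiser keeps its old type number in op_targ;
    # report it as "ex-<old name>", e.g. "ex-aelem" (ppname gives "pp_aelem").
    $name = 'ex-' . substr(ppname($op->targ), 3)
        if $name eq 'null' && $op->targ;
    $counts->{$name}++;
    if ($op->flags & OPf_KIDS) {
        # Visit each child in turn by following the sibling chain.
        for (my $kid = $op->first; $$kid; $kid = $kid->sibling) {
            count_ops($kid, $counts);
        }
    }
}

# Return a hash ref of op name => count for the sub $code.
sub op_counts {
    my ($code) = @_;
    my %counts;
    count_ops(svref_2object($code)->ROOT, \%counts);
    return \%counts;
}

# Compare just the op types listed in 'counts'; all other ops are ignored.
sub opcount_test {
    my (%args) = @_;
    my $got = op_counts($args{code});
    for my $op (sort keys %{ $args{counts} }) {
        is($got->{$op} // 0, $args{counts}{$op}, "number of '$op' ops");
    }
}

opcount_test(code   => sub { our @a; $a[0] = 1 },
             counts => { aelemfast => 1, aelem => 0 });
done_testing();
```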

Fourthly, we currently abuse ext/Devel-Peek/t/Peek.t (in a similar manner
to ext/B/t/optree_concise.t) when we want to check SVs for having
particular flags set etc. For similar reasons, I propose a new test file,
t/perf/peek.t say, that will call Dump on an SV of interest, and return a
hash of 'key = value' pairs; for example 

    SV = PVAV(0x18a0208) at 0x18c9970
      REFCNT = 1
      FLAGS = ()
      ARRAY = 0x0
      FILL = -1
      MAX = -1
      ARYLEN = 0x0
      FLAGS = (REAL)

might be returned as the simple hash

        SV     => 'PVAV(0x18a0208) at 0x18c9970',
        REFCNT => '1',
        FLAGS  => '()',
        ARRAY  => '0x0',
        FILL   => '-1',
        MAX    => '-1',
        ARYLEN => '0x0',
        FLAGS  => '(REAL)',

it would then be relatively easy to check for specific things, rather than
comparing the whole output; e.g.

    cmp_ok($peek->{COW_REFCNT}, '>=', 1, "the var is shared");
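
A possible sketch of the Dump-to-hash helper, assuming it captures Dump's output the same way ext/Devel-Peek/t/Peek.t does, by temporarily redirecting STDERR to a file. The name peek() is hypothetical, and since a key such as FLAGS can appear twice, this version simply keeps the first value:

```perl
#!/usr/bin/perl
# Hypothetical sketch of a peek() helper: Dump one SV with Devel::Peek and
# return its top-level "KEY = value" lines as a hash ref.
use strict;
use warnings;
use Devel::Peek qw(Dump);
use File::Temp qw(tempfile);

sub peek {
    # Dump() prints to stderr from C code, so point STDERR at a temporary
    # file for the duration of the call, then read the file back.
    my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
    open(my $saved_err, '>&', \*STDERR) or die "Can't dup STDERR: $!";
    open(STDERR, '>&', $tmp_fh)         or die "Can't redirect STDERR: $!";
    Dump($_[0]);    # $_[0] is an alias, so the caller's own SV is dumped
    open(STDERR, '>&', $saved_err)      or die "Can't restore STDERR: $!";
    close $tmp_fh;

    open(my $in, '<', $tmp_name) or die "Can't read $tmp_name: $!";
    my %h;
    while (my $line = <$in>) {
        # Keep only the outermost SV: the unindented "SV = ..." line and
        # lines indented by exactly two spaces; nested SVs are skipped.
        next unless $line =~ /^(?:  )?([A-Z][A-Z0-9_]*) = (.*)$/;
        $h{$1} //= $2;    # a repeated key (e.g. FLAGS) keeps its first value
    }
    close $in;
    return \%h;
}

my $str  = "hello";
my $peek = peek($str);
print "$_ => $peek->{$_}\n" for sort keys %$peek;
```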

Fifthly, (and here I start to get very woolly), I'd like a file that
contains an accumulated collection of code snippets; in a sense it would
be a bit like all the little benchmark files in the 'perlbench' CPAN
distribution concatenated into a single file. The file might look something like:

    %benchmarks = (
        zero_arg_func => {
            desc     => 'zero arg function call',
            preamble => 'sub f{}',
            code     => 'f()',
        },
        one_arg_func => {
            desc => ...
        },
        ...
    );

but rather than being a collection of hypothetical benchmarks like those
found in perlbench, they would be an accumulation of things we think we have
optimised. For example, with the padrange optimisation I might claim that
'my ($x,$y,$z) = @_' should be faster. So the commit that adds padrange
would also add a few entries to this file with those sorts of code
constructs in them.

There would then be separate tools that can read in this file, and compile
and run all of the tests, or *a specific* test, with suitable looping. The
first tool would just be a .t file that runs all the tests and makes sure they
compile and execute okay (so we don't break things).
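
The first of these tools might look roughly like the following sketch. The inline %benchmarks hash stands in for the real snippets file, the my_list_assign entry is an invented example of what a padrange commit might add, and compile_bench() is a hypothetical helper name:

```perl
#!/usr/bin/perl
# Hypothetical sketch of the .t file that checks every snippet still
# compiles and runs; %benchmarks stands in for the real snippets file.
use strict;
use warnings;
use Test::More;

our %benchmarks = (
    zero_arg_func => {
        desc     => 'zero arg function call',
        preamble => 'sub f{}',
        code     => 'f()',
    },
    my_list_assign => {    # invented example of a padrange-style entry
        desc     => 'lexical list assignment from @_',
        preamble => 'sub g { my ($x, $y, $z) = @_ }',
        code     => 'g(1, 2, 3)',
    },
);

# Compile an entry into a sub that runs its code $_[0] times. Each entry
# gets its own package so that subs declared in preambles can't clash.
sub compile_bench {
    my ($name, $bench) = @_;
    my $src = "package Bench::$name; $bench->{preamble};\n"
            . "sub { for (1 .. \$_[0]) { $bench->{code}; } }";
    return eval $src;    # undef, with $@ set, if the snippet fails to compile
}

for my $name (sort keys %benchmarks) {
    my $sub = compile_bench($name, $benchmarks{$name});
    my $compile_err = $@;
    ok($sub, "$name: compiles") or diag($compile_err);
    next unless $sub;

    my $ran = eval { $sub->(1); 1 };
    my $run_err = $@;
    ok($ran, "$name: runs") or diag($run_err);
}
done_testing();
```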

Next, there might be a script (not a .t file) that runs each test in a
loop and outputs some timings. It would be capable of running against
multiple perls, and would be the rough equivalent of perlbench.
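
A bare-bones sketch of such a timing script, under the assumption that each perl to compare is passed as a path on the command line. bench_source() and time_perl() are hypothetical names, and the timings include interpreter start-up, which a real tool would need to subtract or amortise:

```perl
#!/usr/bin/perl
# Hypothetical sketch of the timing script: run a snippet in a loop under
# each perl given on the command line (default: this perl) and report the
# elapsed wall-clock time.
use strict;
use warnings;
use Time::HiRes qw(time);

# Build a standalone program: the preamble once, then the code $loops times.
sub bench_source {
    my ($bench, $loops) = @_;
    return "$bench->{preamble};\nfor (1 .. $loops) { $bench->{code}; }\n";
}

# Run $src under the perl binary $perl; return elapsed seconds, which
# include interpreter start-up.
sub time_perl {
    my ($perl, $src) = @_;
    my $t0 = time;
    system($perl, '-e', $src) == 0 or die "'$perl' failed (status $?)\n";
    return time - $t0;
}

my %bench = (desc => 'zero arg function call', preamble => 'sub f{}', code => 'f()');
my @perls = @ARGV ? @ARGV : ($^X);
for my $perl (@perls) {
    printf "%-30s %s: %.3fs\n", $perl, $bench{desc},
        time_perl($perl, bench_source(\%bench, 1_000_000));
}
```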

But what I would be most interested in seeing is a tool that runs each
test under cachegrind. For those of you not familiar with it, cachegrind
is part of the valgrind tool suite; it runs an x86 executable in a sort of
simulation that allows it to calculate how many instruction and data reads
and writes were done, how many branches were followed, and how many L1 and
Ln cache misses there were. It gives an overall total for the executable,
and can also be asked to provide a per-line source code annotation. The
great thing about it is that it gives a reproducible analysis of a code
run, rather than relying on a statistical analysis of repeated run timings,
so it can reliably show small changes.
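
As a sketch of how a tool might drive cachegrind, the following runs a snippet under valgrind and reads the totals from the 'events:' and 'summary:' lines of cachegrind's output file. It assumes valgrind is installed; parse_cachegrind() and run_cachegrind() are hypothetical helper names:

```perl
#!/usr/bin/perl
# Hypothetical sketch: run a snippet under cachegrind and return the
# whole-program event totals. Assumes valgrind is installed and on PATH.
use strict;
use warnings;
use File::Temp qw(tempfile);

# In a cachegrind output file, the "events:" line names the counters
# (Ir, Dr, Dw, ...) and the "summary:" line holds their totals.
sub parse_cachegrind {
    my ($text) = @_;
    my ($events)  = $text =~ /^events:\s*(.*)$/m  or die "no events: line\n";
    my ($summary) = $text =~ /^summary:\s*(.*)$/m or die "no summary: line\n";
    my %totals;
    @totals{ split ' ', $events } = split ' ', $summary;
    return \%totals;
}

sub run_cachegrind {
    my ($perl, $src) = @_;
    my (undef, $out) = tempfile(UNLINK => 1);
    # --cache-sim/--branch-sim turn on cache-miss and branch counts.
    system('valgrind', '-q', '--tool=cachegrind', '--cache-sim=yes',
           '--branch-sim=yes', "--cachegrind-out-file=$out",
           $perl, '-e', $src) == 0
        or die "valgrind failed (status $?)\n";
    open(my $fh, '<', $out) or die "Can't read $out: $!";
    my $text = do { local $/; <$fh> };
    return parse_cachegrind($text);
}

# Usage: pass a perl binary, e.g.  perl cg.pl /usr/bin/perl
if (@ARGV) {
    my $totals = run_cachegrind($ARGV[0], 'sub f{} f() for 1 .. 1000');
    printf "%-6s %d\n", $_, $totals->{$_} for sort keys %$totals;
}
```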

It's not perfect; for example, while it will detect a cache miss, it won't
tell you whether a real CPU would have initiated a pre-fetch a bit
earlier, so in the real world a change to the code might mean a miss
stalls for only, say, a third as long as it otherwise would.

I would envisage that this tool would execute each test twice, as
separate executable runs, once with a 1-iteration loop and once with an
N-iteration loop, then do the sums to calculate the totals per iteration.
It could then do this with multiple perl versions and display how things
are changing (e.g. the newer perl had 97% of the instruction reads and
106% of the data writes of the first perl). It could do this for all the
tests or just for specified ones.
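
The sums themselves are simple. Assuming each counter for an n-iteration run is a fixed start-up cost plus n times a per-iteration cost, the 1-loop and N-loop runs give per_iter = (total(N) - total(1)) / (N - 1). A sketch, with hypothetical helper names and made-up counts chosen only to show the arithmetic:

```perl
#!/usr/bin/perl
# Hypothetical sketch of the per-iteration sums. Model each counter as
# total(n) = startup + n * per_iter, so two runs (1 loop and N loops) give
# per_iter = (total(N) - total(1)) / (N - 1).
use strict;
use warnings;

# Per-iteration cost of every counter, from a 1-loop run and an N-loop run.
sub per_iteration {
    my ($one, $many, $n) = @_;
    my %per;
    $per{$_} = ($many->{$_} - $one->{$_}) / ($n - 1) for keys %$one;
    return \%per;
}

# Each counter of $new as a percentage of the same counter in $base.
sub relative_pct {
    my ($new, $base) = @_;
    my %pct;
    for my $ev (keys %$base) {
        next unless $base->{$ev};    # skip zero counts to avoid dividing by 0
        $pct{$ev} = 100 * $new->{$ev} / $base->{$ev};
    }
    return \%pct;
}

# Made-up counts, purely to show the arithmetic (N = 11 loops).
my $old = per_iteration({ Ir => 5000, Dw => 2000 }, { Ir => 15000, Dw => 4000 }, 11);
my $new = per_iteration({ Ir => 5100, Dw => 2050 }, { Ir => 14800, Dw => 4170 }, 11);
my $pct = relative_pct($new, $old);
printf "%s: %.0f%%\n", $_, $pct->{$_} for sort keys %$pct;    # Dw: 106%, Ir: 97%
```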

"Foul and greedy Dwarf - you have eaten the last candle."
    -- "Hordes of the Things", BBC Radio.
