develooper Front page | perl.perl5.porters | Postings from August 2016

Re: Interpreting cachegrind output

Dave Mitchell
August 5, 2016 13:13
On Fri, Aug 05, 2016 at 05:02:10AM -0000, Father Chrysostomos wrote:
> I have a branch that reduces the number of gv lookups for
> qq"@array" from 3 to 2, if I counted them correctly (the tip of the
> sprout/pitlookup branch).
> Simple benchmarking does not give me any observable results.  So I
> tried cachegrind, and I see that most numbers are lower in the 'After'
> results, but some are not.  I do not remember what any of this means.
> So, is it worth making this change?

You'd be far better off using Porting/ for this sort of thing.
It uses cachegrind under the hood, but does multiple runs (in parallel)
and subtracts out overheads, reproducibly giving exact counts for just
the code in question.  For example, create a file like:

$ cat /tmp/benchmarks
    'foo' => {
        desc    => 'XXX',
        setup   => '@_ = ()',
        code    => '() = @_',
    },
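For the case being measured in this thread (interpolating an array into a
double-quoted string), a benchfile entry might look something like the
following sketch; the key names follow the 'foo' example above, but the
entry name and desc text here are made up:

    'expr::interp_array' => {
        desc    => 'interpolate a package array into a qq string',
        setup   => 'our @a = (1..3)',
        code    => 'my $s = "@a"',
    },

The setup code runs once before timing starts; only the code string is
measured per loop iteration.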

Then run the tests (in this example against two perl binaries), writing the
results to a file:

    $ perl5240 Porting/ -w /tmp/results -j 8 --benchfile=/tmp/benchmarks perl5220o perl5240o

Then display the results using the raw cachegrind numbers:

    $ perl5240 Porting/ -r /tmp/results --raw
        Ir   Instruction read
        Dr   Data read
        Dw   Data write
        COND conditional branches
        IND  indirect branches
        _m   branch predict miss
        _m1  level 1 cache miss
        _mm  last cache (e.g. L3) miss
        -    indeterminate percentage (e.g. 1/0)

    The numbers represent raw counts per loop iteration.


           perl5220o perl5240o
           --------- ---------
        Ir     257.0     240.0
        Dr      92.0      87.0
        Dw      53.0      54.0
      COND      36.0      33.0
       IND       5.0       5.0

    COND_m       2.0       0.0
     IND_m       5.0       5.0

     Ir_m1       0.0       0.0
     Dr_m1       0.0       0.0
     Dw_m1       0.0       0.0

     Ir_mm       0.0       0.0
     Dr_mm       0.0       0.0
     Dw_mm       0.0       0.0

This shows that ()=@_ is slightly faster in 5.24.0 than in 5.22.0:
slightly fewer instruction and data reads, one more data write, fewer
conditional branches, the same number of indirect branches (switch
statements, calls through a function pointer, etc.), fewer conditional
branch-predict misses, the same number of indirect branch-predict misses,
and no instruction or data cache misses (which is to be expected for a
small code snippet like this).
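Since the raw counts are per loop iteration, eyeballing the differences is
just a subtraction per event.  This is not how the tool itself reports
deltas; it is only an illustrative sketch using the numbers from the table
above:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Raw per-iteration counts copied from the table above,
    # keyed by perl binary name (illustrative data only).
    my %counts = (
        perl5220o => { Ir => 257.0, Dr => 92.0, Dw => 53.0, COND => 36.0, IND => 5.0 },
        perl5240o => { Ir => 240.0, Dr => 87.0, Dw => 54.0, COND => 33.0, IND => 5.0 },
    );

    # Print the change in each event going from 5.22.0 to 5.24.0:
    # a negative delta means 5.24.0 executes fewer of that event
    # per iteration of the benchmarked code.
    for my $event (qw(Ir Dr Dw COND IND)) {
        my $delta = $counts{perl5240o}{$event} - $counts{perl5220o}{$event};
        printf "%-4s %+6.1f\n", $event, $delta;
    }

Here Ir comes out at -17.0 and Dw at +1.0, matching the reading above.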

I before E. Except when it isn't.
