develooper Front page | perl.perl5.porters | Postings from August 2016

Re: Interpreting cachegrind output

Dave Mitchell
August 5, 2016 13:13
On Fri, Aug 05, 2016 at 05:02:10AM -0000, Father Chrysostomos wrote:
> I have a branch that reduces the number of gv lookups for
> qq"@array" from 3 to 2, if I counted them correctly (the tip of the
> sprout/pitlookup branch).
> Simple benchmarking does not give me any observable results.  So I
> tried cachegrind, and I see that most numbers are lower in the 'After'
> results, but some are not.  I do not remember what any of this means.
> So, is it worth making this change?

You'd be far better off using Porting/ for this sort of thing.
It uses cachegrind under the hood, but does multiple runs (in parallel)
and subtracts out overheads, reproducibly giving exact counts for just
the code in question.  For example, create a file like:

$ cat /tmp/benchmarks
    'foo' => {
        desc    => 'XXX',
        setup   => '@_ = ()',
        code    => '() = @_',
    },
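For the case being measured in this thread (interpolating an array into a
double-quoted string), a benchfile entry might look something like the
following sketch; the key names follow the 'foo' example above, but the
entry name and desc text here are made up:

    'expr::interp_array' => {
        desc    => 'interpolate a package array into a qq string',
        setup   => 'our @a = (1..3)',
        code    => 'my $s = "@a"',
    },

The setup code runs once before timing starts; only the code string is
measured per loop iteration.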

Then run the tests (in this example against two perl binaries), writing the
results to a file:

    $ perl5240 Porting/ -w /tmp/results -j 8 --benchfile=/tmp/benchmarks perl5220o perl5240o

Then display the results using the raw cachegrind numbers:

    $ perl5240 Porting/ -r /tmp/results --raw
        Ir   Instruction read
        Dr   Data read
        Dw   Data write
        COND conditional branches
        IND  indirect branches
        _m   branch predict miss
        _m1  level 1 cache miss
        _mm  last cache (e.g. L3) miss
        -    indeterminate percentage (e.g. 1/0)

    The numbers represent raw counts per loop iteration.


           perl5220o perl5240o
           --------- ---------
        Ir     257.0     240.0
        Dr      92.0      87.0
        Dw      53.0      54.0
      COND      36.0      33.0
       IND       5.0       5.0

    COND_m       2.0       0.0
     IND_m       5.0       5.0

     Ir_m1       0.0       0.0
     Dr_m1       0.0       0.0
     Dw_m1       0.0       0.0

     Ir_mm       0.0       0.0
     Dr_mm       0.0       0.0
     Dw_mm       0.0       0.0

This shows that ()=@_ is slightly faster in 5.24.0 than in 5.22.0:
slightly fewer instruction and data reads, one more data write, fewer
conditional branches, the same number of indirect branches (switch
statements, calls through a function pointer, etc.), fewer conditional
branch-predict misses, the same number of indirect branch-predict misses,
and no instruction or data cache misses (which is to be expected for a
small code snippet like this).
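Since the raw counts are per loop iteration, eyeballing the differences is
just a subtraction per event.  This is not how the tool itself reports
deltas; it is only an illustrative sketch using the numbers from the table
above:

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Raw per-iteration counts copied from the table above,
    # keyed by perl binary name (illustrative data only).
    my %counts = (
        perl5220o => { Ir => 257.0, Dr => 92.0, Dw => 53.0, COND => 36.0, IND => 5.0 },
        perl5240o => { Ir => 240.0, Dr => 87.0, Dw => 54.0, COND => 33.0, IND => 5.0 },
    );

    # Print the change in each event going from 5.22.0 to 5.24.0:
    # a negative delta means 5.24.0 executes fewer of that event
    # per iteration of the benchmarked code.
    for my $event (qw(Ir Dr Dw COND IND)) {
        my $delta = $counts{perl5240o}{$event} - $counts{perl5220o}{$event};
        printf "%-4s %+6.1f\n", $event, $delta;
    }

Here Ir comes out at -17.0 and Dw at +1.0, matching the reading above.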

I before E. Except when it isn't.
