
Re: [perl.git] branch smoke-me/stack_optimization, created.v5.17.3-295-g4ab7cbc

From: Nicholas Clark
Date: September 14, 2012 09:23
Subject: Re: [perl.git] branch smoke-me/stack_optimization, created.v5.17.3-295-g4ab7cbc
Message ID: 20120914162330.GC22938@plum.flirble.org

tl;dr - it's the I cache, not the D cache. Not what I'd expect.

On Wed, Sep 12, 2012 at 04:12:44PM +0200, Steffen Mueller wrote:
> In perl.git, the branch smoke-me/stack_optimization has been created

> commit 4ab7cbc880156236df39ea8163d6778ae1c123a1
> Author: Steffen Mueller <smueller@cpan.org>
> Date:   Wed Sep 12 14:52:46 2012 +0200
> 
>     Save one NULL assignment per TMP
>     
>     This assignment looks really rather like overzealous cleanliness.
>     It's a hot path. Now it's death by 999 cuts instead of 1000.

diff --git a/scope.c b/scope.c
index acd04e7..1bcddde 100644
--- a/scope.c
+++ b/scope.c
@@ -160,8 +160,7 @@ Perl_free_tmps(pTHX)
     /* XXX should tmps_floor live in cxstack? */
     const I32 myfloor = PL_tmps_floor;
     while (PL_tmps_ix > myfloor) {      /* clean up after last statement */
-       SV* const sv = PL_tmps_stack[PL_tmps_ix];
-       PL_tmps_stack[PL_tmps_ix--] = NULL;
+       SV* const sv = PL_tmps_stack[PL_tmps_ix--];
        if (sv && sv != &PL_sv_undef) {
            SvTEMP_off(sv);
            SvREFCNT_dec(sv);           /* note, can modify tmps_ix!!! */
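
For readers without the perl source to hand, the two loop bodies in the diff can be contrasted with a toy stack. This is only a sketch: toy_stack, toy_ix and the toy_* functions are hypothetical names made up for illustration, and an int pointer stands in for SV*. It is not perl's actual tmps stack code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not perl's real tmps stack. */
#define TOY_STACK_MAX 8
int *toy_stack[TOY_STACK_MAX];
int toy_ix = -1;                /* index of the top element, -1 when empty */

void toy_push(int *p) {
    assert(toy_ix + 1 < TOY_STACK_MAX);
    toy_stack[++toy_ix] = p;
}

/* Old shape: load the slot, then store NULL into it while popping. */
int *toy_pop_and_clear(void) {
    assert(toy_ix >= 0);
    int *p = toy_stack[toy_ix];
    toy_stack[toy_ix--] = NULL;
    return p;
}

/* New shape: load and pop; the stale pointer stays in the dead slot. */
int *toy_pop_only(void) {
    assert(toy_ix >= 0);
    return toy_stack[toy_ix--];
}
```

Both variants return the same pointer and leave the index in the same place. The only observable difference is that the second leaves the old value sitting above the top of the stack, which is harmless as long as nothing reads slots above the index.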


So I tried measuring the effect of this change (on mktables).

The parent commit, as seen by cachegrind and timed by dumbbench:

==1785== I   refs:      57,650,456,673
==1785== I1  misses:       923,951,685
==1785== LLi misses:            30,848
==1785== I1  miss rate:           1.60%
==1785== LLi miss rate:           0.00%
==1785==
==1785== D   refs:      28,806,382,638  (19,025,390,732 rd   + 9,780,991,906 wr)
==1785== D1  misses:       714,687,811  (   663,333,054 rd   +    51,354,757 wr)
==1785== LLd misses:        12,956,796  (    10,990,839 rd   +     1,965,957 wr)
==1785== D1  miss rate:            2.4% (           3.4%     +           0.5%  )
==1785== LLd miss rate:            0.0% (           0.0%     +           0.0%  )
==1785==
==1785== LL refs:        1,638,639,496  ( 1,587,284,739 rd   +    51,354,757 wr)
==1785== LL misses:         12,987,644  (    11,021,687 rd   +     1,965,957 wr)
==1785== LL miss rate:             0.0% (           0.0%     +           0.0%  )

cmd: Ran 24 iterations (4 outliers).
cmd: Rounded run time per iteration: 1.8338e+01 +/- 1.4e-02 (0.1%)


And with your change:

==1077== 
==1077== I   refs:      57,630,786,060
==1077== I1  misses:       915,948,264
==1077== LLi misses:            30,898
==1077== I1  miss rate:           1.58%
==1077== LLi miss rate:           0.00%
==1077== 
==1077== D   refs:      28,800,384,564  (19,033,792,095 rd   + 9,766,592,469 wr)
==1077== D1  misses:       714,682,003  (   663,327,113 rd   +    51,354,890 wr)
==1077== LLd misses:        12,956,749  (    10,990,840 rd   +     1,965,909 wr)
==1077== D1  miss rate:            2.4% (           3.4%     +           0.5%  )
==1077== LLd miss rate:            0.0% (           0.0%     +           0.0%  )
==1077== 
==1077== LL refs:        1,630,630,267  ( 1,579,275,377 rd   +    51,354,890 wr)
==1077== LL misses:         12,987,647  (    11,021,738 rd   +     1,965,909 wr)
==1077== LL miss rate:             0.0% (           0.0%     +           0.0%  )

cmd: Ran 28 iterations (8 outliers).
cmd: Rounded run time per iteration: 1.82834e+01 +/- 6.1e-03 (0.0%)


So it is faster (about 0.3% by dumbbench). What's curious is that it seems to
be faster because it runs fewer instructions (and specifically, has fewer I1
misses), not because it actually causes fewer write cache misses: the D1 write
miss count is essentially unchanged between the two runs. Which is quite a
surprise - I would have thought that it would have been faster by reducing
pressure on the write cache.
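
The comparison above can be checked by subtracting the counters in the two cachegrind summaries. A minimal sketch follows, with the figures copied from the output quoted earlier; the struct and the function names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One counter taken from the two cachegrind summaries quoted above. */
struct counter {
    const char *name;
    long long before;   /* parent commit */
    long long after;    /* with the change */
};

const struct counter counters[] = {
    { "I refs",          57650456673LL, 57630786060LL },
    { "I1 misses",         923951685LL,   915948264LL },
    { "D reads",         19025390732LL, 19033792095LL },
    { "D writes",         9780991906LL,  9766592469LL },
    { "D1 write misses",    51354757LL,    51354890LL },
};

#define N_COUNTERS (sizeof counters / sizeof counters[0])

/* Signed change: negative means the patched build did less of it. */
long long delta(const struct counter *c) {
    assert(c != NULL);
    return c->after - c->before;
}

/* Print one line per counter; call this from main(). */
void report_deltas(void) {
    for (size_t i = 0; i < N_COUNTERS; i++)
        printf("%-16s %+lld\n", counters[i].name, delta(&counters[i]));
}
```

On these figures, I refs fall by 19,670,613, I1 misses by 8,003,421 and D writes by 14,399,437, while D reads rise by 8,401,363 and D1 write misses by 133. So the stores did go away, but the write miss count barely moved.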

Nicholas Clark
