
Re: Benchmark regression 2013-02

From:
Nicholas Clark
Date:
March 1, 2013 09:59
Subject:
Re: Benchmark regression 2013-02
Message ID:
20130301095854.GJ3729@plum.flirble.org
On Thu, Feb 28, 2013 at 04:59:04AM +0100, Steffen Schwigon wrote:
> Nicholas Clark <nick@ccl4.org> writes:
> > On Tue, Feb 26, 2013 at 12:28:06PM +0100, Steffen Schwigon wrote:
> >> Hi!
> >> 
> >> Short note from the benchmark front:
> >> 
> >> After weeks of consolidation function calls and method calls get slower
> >> again, for *non-threaded* (sic).
> >> 
> >>   http://speed.perlformance.net/timeline/#/?exe=36,35&base=14+80&ben=Fib&env=4&revs=50&equid=off
> >>   http://speed.perlformance.net/timeline/#/?exe=36,35&base=14+80&ben=FibOO&env=4&revs=50&equid=off
> >
> > As far as I can make out, the jump on the graph is between commits
> > 00a1356009c12c2c and c96939e471cb3942.
> > [...]
> > What compiler and linker flags are you using? I can't spot that on
> > the site.
> 
> Indeed, it's not there[1].
> See attachments for threaded/nonthread perl -V output.

I see that it's configured with pretty much defaults, such as:

    config_args='-de -Dusedevel -Dusethreads -Duse64bitall -Dprefix=/opt/perl-blead-thread-64bit-v5.17.9-14-gc96939e'

The upshot is default compiler flags:

  Compiler:
    cc='cc', ccflags ='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64',
    optimize='-O2',
    cppflags='-D_REENTRANT -D_GNU_SOURCE -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'


Sadly that means that the timings are going to have a lot of noise caused by
unrelated changes. As best I can tell (it seems to be poorly documented) gcc
by default uses heuristics to decide when to use padding to get better code
alignment, versus when to avoid it because the padding would be excessive.
The upshot is that *unrelated* code changes earlier in an object file can
cause code motion in a particular function, such that the padding is done
differently, and the speed of *that* function changes. A second order effect
will be that functions may fall onto different cache lines, which could
cause timing noise.
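
A rough way to see whether a function has moved relative to a cache line
between two builds is to take symbol addresses in the format printed by
`nm` and reduce them modulo the line size. The sketch below assumes a
64-byte line; the symbol names and addresses in the sample input are made
up for illustration and are not taken from a real perl binary.

```shell
#!/bin/sh
# cacheline_offset ADDRESS_HEX [LINE_SIZE]
# Print the byte offset of a hex address within its cache line.
# LINE_SIZE defaults to 64, which is an assumption about the host CPU.
cacheline_offset() {
    addr=$1
    line=${2:-64}
    echo $(( 0x$addr % $line ))
}

# report_offsets: read nm-style lines ("address type symbol") on stdin and
# print "symbol offset" for text (code) symbols only.
report_offsets() {
    while read -r addr type sym; do
        case $type in
            T|t) printf '%s %s\n' "$sym" "$(cacheline_offset "$addr")" ;;
        esac
    done
}

# Made-up sample input, standing in for the output of nm on a real build.
report_offsets <<'EOF'
0000000000401000 T Perl_example_a
0000000000401050 T Perl_example_b
0000000000602040 D some_data
EOF
```

Running the same report against two builds and comparing the output shows
which functions changed offset, which is the kind of movement that the
padding heuristics can cause.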

So whilst the numbers for a default build are interesting as a general
indicator of where things are, I don't think that they are very useful
to actually identify causes of speedups or slowdowns, *or to be able to
do anything to fix them*.

So I think for starters, really you need to be explicitly telling gcc not
to use padding heuristics. I think that something like this would be better:

    -Doptimize='-O2 -falign-loops=8 -falign-jumps=8 -falign-functions=64 -falign-labels=8 -mpreferred-stack-boundary=8 -minline-all-stringops'

which I think changes all heuristics to absolutes, and tries to ensure that
all functions stay on the same alignment with L1 cache lines for each
compile.
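
As a sketch of how that setting could be passed to a build, the helper
below assembles a Configure command line from the flags above and prints it
rather than running it. The function names and the install prefix are
hypothetical; the `-Dusethreads` and `-Duse64bitall` options from the
original configuration are left out for brevity.

```shell
#!/bin/sh
# align_flags: the optimizer settings quoted above, verbatim.
align_flags() {
    printf '%s' '-O2 -falign-loops=8 -falign-jumps=8 -falign-functions=64 -falign-labels=8 -mpreferred-stack-boundary=8 -minline-all-stringops'
}

# configure_cmd PREFIX
# Print (do not run) a Configure invocation that uses those flags.
# PREFIX is whatever install location the caller chooses.
configure_cmd() {
    printf "sh Configure -de -Dusedevel -Dprefix=%s -Doptimize='%s'\n" "$1" "$(align_flags)"
}

# Hypothetical prefix, for illustration only.
configure_cmd /opt/perl-blead-bench
```

After a build, `perl -V:optimize` reports the value that was actually used,
which makes it easy to confirm the flags were picked up.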


I also (crazily) found that link order matters. No, I don't know why.
Linking the same object files (all size aligned with the L1 cache) produced
repeatably different numbers, depending on the order. I'm not *sure* what
order the GNU toolchain uses for the link, but I think that it might depend
on file timestamps. So I think really the link order in the Makefile needs
to be forced to be consistent. Something like this:

diff --git a/Makefile.SH b/Makefile.SH
index 5194ecf..412cb89 100755
--- a/Makefile.SH
+++ b/Makefile.SH
@@ -795,7 +795,7 @@ $(LIBPERL): $& $(obj) $(DYNALOADER) $(LIBPERLEXPORT)
 	true)
 		$spitshell >>$Makefile <<'!NO!SUBS!'
 	rm -f $@
-	$(LD) -o $@ $(SHRPLDFLAGS) $(obj) $(DYNALOADER) $(libs)
+	$(LD) -o $@ $(SHRPLDFLAGS) $(shell ls -1 $(obj)) $(DYNALOADER) $(libs)
 !NO!SUBS!
 		case "$osname" in
 		aix)
@@ -810,7 +810,7 @@ $(LIBPERL): $& $(obj) $(DYNALOADER) $(LIBPERLEXPORT)
 	*)
 		$spitshell >>$Makefile <<'!NO!SUBS!'
 	rm -f $(LIBPERL)
-	$(AR) rcu $(LIBPERL) $(obj) $(DYNALOADER)
+	$(AR) rcu $(LIBPERL) $(shell ls -1 $(obj)) $(DYNALOADER)
 	@$(ranlib) $(LIBPERL)
 !NO!SUBS!
 		;;


(that's gmake specific)
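
The idea behind the patch, sketched outside of make: whatever order the
object names arrive in, sorting them gives the linker the same sequence
every time. The `stable_order` helper is hypothetical. It pins the locale
with `LC_ALL=C`, an addition not in the patch, because the order produced by
`ls` and `sort` can otherwise vary with the locale.

```shell
#!/bin/sh
# stable_order FILE...
# Print the given names one per line in a fixed, locale-independent order.
stable_order() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Two different input orders of the same object files.
first=$(stable_order pp.o av.o sv.o hv.o)
second=$(stable_order sv.o hv.o pp.o av.o)

# Both runs produce the same sequence, so the link line would be identical.
[ "$first" = "$second" ] && echo "same order"
echo "$first"
```

Within GNU make itself, the built-in `$(sort ...)` function also sorts
lexically (and drops duplicates), so it may be an alternative to shelling
out to `ls`.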

That would cut out a bunch of the noise, and make the results a lot more
useful for actually nailing down the true causes of speed changes.

Nicholas Clark
