
Re: pp_add -> pp_i_add efficiency hack?

From: Nicholas Clark
Date: December 4, 2000 16:18
Subject: Re: pp_add -> pp_i_add efficiency hack?
Message ID: 20001205001820.B58109@plum.flirble.org

On Sun, Dec 03, 2000 at 06:08:41PM -0600, Jarkko Hietaniemi wrote:
> On Sun, Dec 03, 2000 at 11:01:39PM +0000, Simon Cozens wrote:
> > On Sun, Dec 03, 2000 at 09:47:59PM +0000, Nicholas Clark wrote:
> > > Well, it would be if I sent it. I'm tired and making mistakes now.
> > 
> > Without the patch:
> > u=2.31  s=0.38  cu=134.63  cs=11.73  scripts=261  tests=15242
> > 
> > With the patch:
> > u=2.32  s=0.3  cu=135.14  cs=11.67  scripts=261  tests=15242
> > 
> > Was it really worth it?
> 
> IIRC the "efficiency" comes into play in one of Nicholas' platforms
> (ARM Linux?) where double math is really, *really*, slow, and staying
> with integers if at all possible really pays off.

for 32 bit IV I think I've got a 24 fold speed up for my benchmark of

 ./perl -Ilib -MBenchmark -le '$a = 4; $b=6; print timestr timeit (100000, sub {$a + $b})'

without:
 5 wallclock secs ( 4.18 usr +  0.00 sys =  4.18 CPU) @ 23923.44/s (n=100000)
with:
 0 wallclock secs ( 0.15 usr +  0.00 sys =  0.15 CPU) @ 666666.67/s (n=100000)

oh. hang on. time runs backwards sometimes:
-1 wallclock secs (-0.27 usr + -0.20 sys = -0.47 CPU) @ -212765.96/s (n=100000)

OK. not very reliable. But I know that a floating point add machine
"instruction" actually triggers a machine trap, which then calls the FP
emulator code written in (more than a few) integer machine instructions,
so the integer add ought to be quite a lot faster.
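
As a rough illustration of the integer-first idea, here is a minimal C sketch. It is not perl's actual pp_add or pp_i_add: the function name add_int_first and its out-parameters are hypothetical, and it assumes 64-bit signed integers. The common case costs one integer add plus an overflow check, and floating point is touched only when the exact sum does not fit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of perl: add two signed integers,
 * staying in integer arithmetic whenever the exact sum fits.
 * Returns 1 and stores the sum in *iv when no overflow occurs;
 * returns 0 and stores a floating point sum in *nv otherwise. */
static int add_int_first(int64_t a, int64_t b, int64_t *iv, double *nv)
{
    assert(iv != NULL && nv != NULL);

    /* Detect overflow before adding; signed overflow is undefined in C. */
    if ((b > 0 && a > INT64_MAX - b) || (b < 0 && a < INT64_MIN - b)) {
        *nv = (double)a + (double)b;   /* rare case: take the FP path */
        return 0;
    }

    *iv = a + b;                       /* common case: plain integer add */
    return 1;
}
```

With a = 4 and b = 6, as in the benchmark above, this takes the integer branch and never executes a floating point instruction, which is the case that matters on a machine where every FP instruction traps to an emulator.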

it's actually slower with long long and 64 bit IVs here (different perl, not
-DDEBUGGING). Makes me wonder how bad gcc's arm optimiser is. 64 bit perl:

without:
 3 wallclock secs ( 1.64 usr + -0.01 sys =  1.63 CPU) @ 61349.69/s (n=100000)
with:
 4 wallclock secs ( 3.38 usr +  0.05 sys =  3.43 CPU) @ 29154.52/s (n=100000)
with an experimental pp_add modification:
 4 wallclock secs ( 3.19 usr +  0.03 sys =  3.22 CPU) @ 31055.90/s (n=100000)

> There's also the 'correctness' aspect.  If NVs smear your low order
> bits when you need them you might get fussy about it.

That's how I plan to get all my mostly-for-arm optimisations past Jarkko
without him spotti.. damn.

There's clearly quite a lot of work still to do. I think it's perfectly
possible to make all the pp_* stuff conditionally compile and default to the
current NV implementation (which won't make anyone's perl slower).
However, all I've done so far is print out pp.c and pp_hot.c and annotate
where things assume that NV preserves IV/UV.

But I like being able to add 9223372036854775807 and 9223372036854775807
to get 18446744073709551614 rather than 1.84467440737096e+19
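
The difference between those two results can be reproduced outside perl. The following is a small C sketch under stated assumptions: the names add_uv, add_nv and format_nv are hypothetical, double is taken to be an IEEE 754 64-bit type with a 53-bit mantissa, and "%.15g" is used because that is perl's default format for stringifying an NV.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Exact unsigned 64-bit sum (would wrap modulo 2^64 on overflow). */
static uint64_t add_uv(uint64_t a, uint64_t b)
{
    return a + b;
}

/* The same sum done the "NV" way: convert to double first.
 * A double keeps only 53 bits of mantissa, so the low order
 * bits of large 64-bit operands are rounded away. */
static double add_nv(uint64_t a, uint64_t b)
{
    return (double)a + (double)b;
}

/* Format an NV with "%.15g" into buf; returns snprintf's result. */
static int format_nv(double nv, char *buf, size_t n)
{
    assert(buf != NULL && n > 0);
    return snprintf(buf, n, "%.15g", nv);
}
```

Calling add_uv with 9223372036854775807 for both arguments gives 18446744073709551614 exactly, while add_nv gives 2^64, which format_nv renders as 1.84467440737096e+19.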

Nicholas Clark
