On Sun, Dec 03, 2000 at 06:08:41PM -0600, Jarkko Hietaniemi wrote:
> On Sun, Dec 03, 2000 at 11:01:39PM +0000, Simon Cozens wrote:
> > On Sun, Dec 03, 2000 at 09:47:59PM +0000, Nicholas Clark wrote:
> > > Well, it would be if I sent it. I'm tired and making mistakes now.
> >
> > Without the patch:
> > u=2.31 s=0.38 cu=134.63 cs=11.73 scripts=261 tests=15242
> >
> > With the patch:
> > u=2.32 s=0.3 cu=135.14 cs=11.67 scripts=261 tests=15242
> >
> > Was it really worth it?
>
> IIRC the "efficiency" comes into play in one of Nicholas' platforms
> (ARM Linux?) where double math is really, *really*, slow, and staying
> with integers if at all possible really pays off.

for 32 bit IV I think I've got a 24 fold speed up for my benchmark of

./perl -Ilib -MBenchmark -le '$a = 4; $b=6; print timestr timeit (100000, sub {$a + $b})'

without:
 5 wallclock secs ( 4.18 usr + 0.00 sys = 4.18 CPU) @ 23923.44/s (n=100000)
with:
 0 wallclock secs ( 0.15 usr + 0.00 sys = 0.15 CPU) @ 666666.67/s (n=100000)

oh. hang on. time runs backwards sometimes:
-1 wallclock secs (-0.27 usr + -0.20 sys = -0.47 CPU) @ -212765.96/s (n=100000)

OK. not very reliable. But I know that a floating point add machine
"instruction" actually triggers a machine trap, which then calls the FP
emulator code written in (more than a few) integer machine instructions,
so it ought to be quite a lot faster.

it's actually slower with long long and 64 bit IVs here (different perl,
not -DDEBUGGING). Makes me wonder how bad gcc's arm optimiser is.

64 bit perl:
without:
 3 wallclock secs ( 1.64 usr + -0.01 sys = 1.63 CPU) @ 61349.69/s (n=100000)
with:
 4 wallclock secs ( 3.38 usr + 0.05 sys = 3.43 CPU) @ 29154.52/s (n=100000)
with an experimental pp_add modification:
 4 wallclock secs ( 3.19 usr + 0.03 sys = 3.22 CPU) @ 31055.90/s (n=100000)

> There's also the 'correctness' aspect. If NVs smear your low order
> bits when you need them you might get fussy about it.

That's how I plan to get all my mostly-for-arm optimisations past Jarkko
without him spotti.. damn.

There's clearly quite a lot of work still to do. I think it's perfectly
possible to make all the pp_* stuff conditionally compile and default to
the current NV implementation (which won't make anyone's perl slower).
However, all I've done so far is print out pp.c and pp_hot.c and annotate
where things assume that NV preserves IV/UV.

But I like being able to add
9223372036854775807 and
9223372036854775807 to get
18446744073709551614 rather than
1.84467440737096e+19

Nicholas Clark
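
To make the integer-first idea concrete, here is a minimal C sketch of an add that stays in integer arithmetic while the result fits and only falls back to double on overflow. This is not the pp_add code from the patch (which operates on perl SVs). The names `num`, `num_kind` and `add_uv` are hypothetical, and the sketch assumes 64 bit unsigned operands and ignores negative values entirely.

```c
#include <stdint.h>

/* Hypothetical tagged result; stands in for the IV/UV/NV distinction. */
typedef enum { NUM_UV, NUM_NV } num_kind;

typedef struct {
    num_kind kind;
    union {
        uint64_t uv;   /* exact unsigned integer result */
        double   nv;   /* floating point fallback */
    } v;
} num;

/*
 * Add two unsigned integers. If the sum fits in 64 bits, do the work
 * with a single integer add and keep every low order bit. Only when
 * the sum would overflow do we pay for (and lose precision to) double.
 */
static num add_uv(uint64_t a, uint64_t b)
{
    num r;

    if (a <= UINT64_MAX - b) {          /* overflow check, no FP involved */
        r.kind = NUM_UV;
        r.v.uv = a + b;
    } else {
        r.kind = NUM_NV;
        r.v.nv = (double)a + (double)b; /* the current NV-style behaviour */
    }
    return r;
}
```

With this, add_uv(9223372036854775807, 9223372036854775807) stays on the integer path and yields exactly 18446744073709551614, while a sum too large for 64 bits takes the double path. On a machine with emulated floating point, the common case never touches the FP emulator at all.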