On Sun, Jan 12, 2003 at 10:24:23AM +0100, Leopold Toetsch wrote:
> In perl.perl6.internals, you wrote:
> > --- Leopold Toetsch <lt@toetsch.at> wrote:
> >> * SLOW (same slow with register or odd aligned)
> >> * 0x818118a <jit_func+194>: sub 0x8164cac,%ebx
> >> * 0x8181190 <jit_func+200>: jne 0x818118a <jit_func+194>
>
> > The slow one has the loop crossing over a 16 byte boundary. Try moving it
> > over a bit.
>
> Yep, actually it looks like a 8 byte boundary:
> Following program:
> And here is the output:
>
>  0 790.826400 M op/s
>  1 523.305494 M op/s
>  2 788.544190 M op/s
>  3 783.447189 M op/s
>  4 783.975462 M op/s
>  5 788.208178 M op/s
>  6 782.466484 M op/s
>  7 788.059343 M op/s
>  8 788.836349 M op/s
>  9 522.986581 M op/s
> 10 788.895326 M op/s
> 11 784.021624 M op/s
> 12 789.773978 M op/s
> 13 788.065635 M op/s
> 14 783.558056 M op/s
> 15 789.010709 M op/s
> 16 782.463565 M op/s
> 17 523.049517 M op/s
> 18 781.350657 M op/s
> etc
> This of course has the assumption, that the program did run at the
> same address, which is - from my experience with gdb - usually true.
>
> So moving the critical part of a program by just one byte can cause a
> huge slowdown.

I don't think that I ever mailed what seemed to be the answer back to
p5p or p6i.

Thanks to Leo's suggestions I went hunting in the gcc man pages. 2.95 and
3.0 are quite informative.

    -falign-functions -falign-labels -falign-loops -falign-jumps

all default to a machine dependent default. This default isn't documented
explicitly, but I presume that on x86 it's the same as the x86 specific -m
options of the same name (deprecated in gcc 3.0, removed along with their
documentation by 3.2).

*Their* alignment defaults are:

`-malign-loops=NUM'
     Align loops to a 2 raised to a NUM byte boundary. If
     `-malign-loops' is not specified, the default is 2 unless gas 2.8
     (or later) is being used in which case the default is to align the
     loop on a 16 byte boundary if it is less than 8 bytes away.
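To make the arithmetic behind the quoted rule concrete, here is a minimal sketch in Python. It is only a model, not anything taken from gcc or gas: the function name padded_offset and its parameters are made up for illustration, and it assumes that when the 16 byte boundary is 8 or more bytes away the code is simply left where it is.

```python
def padded_offset(offset, boundary=16, max_skip=8):
    """Model the quoted rule: pad up to the next `boundary` only when
    fewer than `max_skip` bytes of padding are needed.

    Assumption: when more padding would be needed, nothing is inserted
    and the code stays at its original offset.
    """
    padding = (-offset) % boundary      # bytes needed to reach the boundary
    if padding < max_skip:
        return (offset + padding) % boundary   # lands on the boundary, i.e. 0
    return offset % boundary            # left unaligned


# Try every possible starting position relative to a 16 byte boundary.
aligned = [o for o in range(16) if padded_offset(o) == 0]
fraction = len(aligned) / 16

print("starting offsets that end up 16 byte aligned:", aligned)
print("fraction aligned:", fraction)
```

Under that assumption, the starting offsets that need 0 to 7 bytes of padding end up aligned and the other half keep whatever offset they happened to have.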
sooooo

50% of the time your function/label/loop/jump is 16 byte aligned.
50% of the time your function/label/loop/jump is "randomly" aligned.

So, a slight code size change early on in a file can cause the remaining
functions to ping either onto, or off, alignment. Hence later loops in
completely unrelated code can happen to become optimally aligned, and go
faster. And similarly other loops which were optimally aligned will now go
unaligned, and go more slowly.

This is probably the right default for the general case, but it is
counterproductive for benchmarking small code changes.

So on gcc 2.95 I'm compiling with:

    -O -malign-loops=3 -malign-jumps=3 -malign-functions=3 -mpreferred-stack-boundary=3 -march=i686

(that's 2**3, i.e. 8)

and on gcc 3.2 on a different machine:

    -O3 -falign-loops=16 -falign-jumps=16 -falign-functions=16 -mpreferred-stack-boundary=3 -march=i586

This seems to smooth out the jumps. In the end, copy on write regexps are
on average 0% faster on the fast PIII machine with gcc 2.95, and about 2%
faster on the slower Cyrix with gcc 3.2, based on what perlbench thinks.

Nicholas Clark