In perl.perl6.internals, you wrote:
> --- Leopold Toetsch <lt@toetsch.at> wrote:
>> * SLOW (same slow with register or odd aligned)
>> * 0x818118a <jit_func+194>: sub 0x8164cac,%ebx
>> * 0x8181190 <jit_func+200>: jne 0x818118a <jit_func+194>

> The slow one has the loop crossing over a 16 byte boundary. Try moving it
> over a bit.

Yep, actually it looks like an 8-byte boundary. The following program
shifts the loop by one more noop on each run:

#!/usr/bin/perl -w
use strict;

for (my $i = 0; $i < 100; $i++) {
    printf "%3d\t", $i;
    open(P, ">m.pasm") or die "cannot write m.pasm: $!";
    # pad the code in front of the loop with $i + 1 noops
    for (0..$i) {
        print(P <<'ENOP');
    noop
ENOP
    }
    # the timed loop: count I4 down to zero, then print M op/s
    print(P <<'EOF');
    set I3, 1
    set I4, 100000000
    set I5, I4
    time N1
REDO: sub I4, I4, I3
    if I4, REDO
    time N5
    sub N2, N5, N1
    set N1, I5
    mul N1, 2
    div N1, N2
    set N2, 1000000.0
    div N1, N2
    print N1
    print " M op/s\n"
    end
EOF
    close(P);
    system("perl assemble.pl m.pasm | parrot -j -");
}

And here is the output:

  0  790.826400 M op/s
  1  523.305494 M op/s
  2  788.544190 M op/s
  3  783.447189 M op/s
  4  783.975462 M op/s
  5  788.208178 M op/s
  6  782.466484 M op/s
  7  788.059343 M op/s
  8  788.836349 M op/s
  9  522.986581 M op/s
 10  788.895326 M op/s
 11  784.021624 M op/s
 12  789.773978 M op/s
 13  788.065635 M op/s
 14  783.558056 M op/s
 15  789.010709 M op/s
 16  782.463565 M op/s
 17  523.049517 M op/s
 18  781.350657 M op/s
 19  784.184698 M op/s
 20  789.683646 M op/s
 21  781.362666 M op/s
 22  783.994146 M op/s
 23  789.100887 M op/s
 24  783.990848 M op/s
 25  370.620840 M op/s
 26  786.862561 M op/s
 27  784.092342 M op/s
 28  789.106826 M op/s
 29  784.027852 M op/s
 30  780.688935 M op/s
 31  787.913154 M op/s
 32  783.576354 M op/s
 33  526.877272 M op/s
 34  780.493905 M op/s
 35  790.339116 M op/s
 36  789.166586 M op/s
 37  782.154592 M op/s
 38  786.902789 M op/s
 39  783.834446 M op/s
 40  784.003305 M op/s
 41  522.135984 M op/s
 42  780.618829 M op/s
 43  790.167145 M op/s
 44  783.284786 M op/s
 45  790.363689 M op/s
 46  781.002931 M op/s
 47  783.720572 M op/s
 48  789.774350 M op/s
 49  523.933363 M op/s
 50  786.970706 M op/s
 51  780.966576 M op/s
 52  789.234894 M op/s
 53  784.317040 M op/s
 54  780.993842 M op/s
 55  789.914164 M op/s
 56  783.705196 M op/s
 57  291.958023 M op/s
 58  783.653215 M op/s
 59  788.739927 M op/s
 60  784.599837 M op/s
 61  783.917218 M op/s
 62  790.051795 M op/s
 63  782.589121 M op/s
 64  784.846120 M op/s
 65  523.988181 M op/s
 66  788.746231 M op/s
 67  781.811980 M op/s
 68  786.188159 M op/s
 69  790.023521 M op/s
 70  783.149502 M op/s
 71  786.531300 M op/s
 72  781.711076 M op/s
 73  527.106372 M op/s
 74  783.735948 M op/s
 75  788.491194 M op/s
 76  782.442035 M op/s
 77  780.387170 M op/s
 78  789.259770 M op/s
 79  779.781801 M op/s
 80  788.186701 M op/s
 81  523.328673 M op/s
 82  790.407627 M op/s
 83  782.751235 M op/s
 84  788.410417 M op/s
 85  782.625627 M op/s
 86  782.056516 M op/s
 87  787.631292 M op/s
 88  782.218409 M op/s
 89  425.664145 M op/s
 90  778.734333 M op/s
 91  787.851363 M op/s
 92  784.661485 M op/s
 93  788.292247 M op/s
 94  783.754621 M op/s
 95  789.181805 M op/s
 96  788.326694 M op/s
 97  523.357568 M op/s
 98  782.105369 M op/s
 99  781.796679 M op/s

This of course assumes that the program ran at the same address each
time, which is, in my experience with gdb, usually true.

So moving the critical part of a program by just one byte can cause a
huge slowdown.

(This is an Athlon 800, i386/linux)

leo
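
To make the 8-byte pattern in the output easier to check, here is a minimal sketch that picks out the slow runs and prints each run index modulo 8. The helper names parse_results and slow_rows, the "below 75% of the median" cutoff, and the choice of the first 18 output lines as sample data are illustrative assumptions, not part of the original test.

```perl
use strict;
use warnings;

# Parse lines of the form "<run index> <rate> M op/s" into [index, rate] pairs.
# (parse_results is a hypothetical helper name.)
sub parse_results {
    my @lines = @_;
    my @rows;
    for my $line (@lines) {
        push @rows, [$1, $2] if $line =~ /^\s*(\d+)\s+([\d.]+)\s+M op\/s/;
    }
    return @rows;
}

# Return the rows whose rate is below $fraction of the median rate.
# The cutoff is an assumption chosen only to separate the two groups.
sub slow_rows {
    my ($fraction, @rows) = @_;
    my @sorted = sort { $a <=> $b } map { $_->[1] } @rows;
    my $median = $sorted[ int(@sorted / 2) ];
    return grep { $_->[1] < $fraction * $median } @rows;
}

# Sample data: the first 18 lines of the output shown above.
my @sample = (
    "  0\t790.826400 M op/s", "  1\t523.305494 M op/s",
    "  2\t788.544190 M op/s", "  3\t783.447189 M op/s",
    "  4\t783.975462 M op/s", "  5\t788.208178 M op/s",
    "  6\t782.466484 M op/s", "  7\t788.059343 M op/s",
    "  8\t788.836349 M op/s", "  9\t522.986581 M op/s",
    " 10\t788.895326 M op/s", " 11\t784.021624 M op/s",
    " 12\t789.773978 M op/s", " 13\t788.065635 M op/s",
    " 14\t783.558056 M op/s", " 15\t789.010709 M op/s",
    " 16\t782.463565 M op/s", " 17\t523.049517 M op/s",
);

# Report each slow run with its index modulo 8.
for my $row (slow_rows(0.75, parse_results(@sample))) {
    printf "%3d  (mod 8 = %d)  %s M op/s\n", $row->[0], $row->[0] % 8, $row->[1];
}
```

For the sample this reports run indices 1, 9, and 17. Over the full run, every slow index is congruent to 1 modulo 8 (1, 9, 17, ... 97), and indices 25, 57, and 89, which are 32 apart, are slower still.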