Re: Opcode Dispatch
From: Nick Ing-Simmons
Date: July 30, 2001 06:35
Subject: Re: Opcode Dispatch
Message ID: 20010730133452.11518.4@bactrian.ni-s.u-net.com
Bryan C. Warnock <bwarnock@capita.com> writes:
>Not that this is the most scientific testing in the world, but I did write a
>couple variations of an opcode dispatch loop just to see how they compare.
>
>Of course, I violated rule number one of writing fast code - I didn't run it
>on a slow machine. Shame on me.
Your options suggest GCC - but which CPU?
The lookup/switch trade-off may well be different on RISC vs CISC.
It would be entertaining to see the -S output for the two cases -
I would expect a large-ish switch to become a lookup in the generated
code - unless the 'case' constants made a compare tree elegant.
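A quick way to get that, assuming the two variants live in files named
bench_switch.c and bench_lookup.c (placeholder names), is

    gcc -O2 -S -fverbose-asm bench_switch.c
    gcc -O2 -S -fverbose-asm bench_lookup.c

and then a diff of the resulting .s files.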
>
>The test used an 80K opcode stream consisting of 25 opcodes run through
>4,000 times, with a terminating 26th opcode, and 26 individual opcode
>dispatch functions. That's too small a dataset, but I wanted to get a feel
>for it.
>
>Each of the functions took the opcode pointer and a data pointer. One
>function handled either the buffer reset or the program exit. The
>remainder did some simple arithmetic on the addresses of both local and
>argument variables, and added (or subtracted) the result to a global
>variable, which was read at exit (in an effort to keep the busy work from
>being optimized out).
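To make that concrete, a handler in that scheme might look roughly like the
following sketch (the name, the particular busy-work, and the global are
illustrative, not the actual test code):

    #include <stdint.h>

    int total;                      /* global accumulator, read at exit */

    /* One illustrative opcode handler: does some address arithmetic on a
       local and its arguments, folds it into the global so the work can't
       be optimized away, and returns the advanced opcode pointer. */
    char *opcodefB(char *opcode, int *x)
    {
        int local = 3;
        total += (int)((intptr_t)&local - (intptr_t)opcode + (intptr_t)x) + local;
        return opcode + 1;
    }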
>
>The two major variations were the Big Switch, and a function pointer table
>lookup.
>
>int main () {
> char *buf = "....";
> char *opcode = buf;
> char *(*fn[])(char *,int *) = {
> opcodefA, ... , opcodefZ }; /* See why we love Perl? */
> int x;
>
> do {
>
>Here's the switch, basically.
>
> switch (*opcode) {
> case 'A': opcode = opcodefA(opcode, &x); break;
> ...
> case 'Z': opcode = opcodefZ(opcode, &x); break;
> default: break;
> }
>
>and here's the lookup:
>
> opcode = fn[(int)(*opcode - 'A')](opcode,&x);
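Filled out into something that actually compiles, the two loops might look
like the sketch below (handler names, the tiny opcode stream, and the 'Z'
terminator are assumptions made for the sake of a runnable example, not the
original benchmark):

    #include <stdio.h>

    static int total;

    static char *op_add(char *opcode, int *x) { total += ++*x; return opcode + 1; }
    static char *op_sub(char *opcode, int *x) { total -= ++*x; return opcode + 1; }
    static char *op_end(char *opcode, int *x) { (void)x; return NULL; }

    int main(void)
    {
        char buf[] = "ABABABZ";           /* toy opcode stream, 'Z' terminates */
        char *opcode = buf;
        int x = 0;

        /* lookup variant: one indexed call through the table per opcode */
        char *(*fn[26])(char *, int *) = {0};
        fn['A' - 'A'] = op_add;
        fn['B' - 'A'] = op_sub;
        fn['Z' - 'A'] = op_end;
        while (opcode)
            opcode = fn[(int)(*opcode - 'A')](opcode, &x);

        /* switch variant over the same stream */
        opcode = buf;
        x = 0;
        while (opcode) {
            switch (*opcode) {
            case 'A': opcode = op_add(opcode, &x); break;
            case 'B': opcode = op_sub(opcode, &x); break;
            case 'Z': opcode = NULL; break;   /* terminator inlined */
            default:  opcode++; break;
            }
        }

        printf("total = %d\n", total);
        return 0;
    }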
>
>I then ran each version compiled with -g, and then with -O2. I ran each program
>six times, throwing away the first results, and averaging the remaining
>five. For both cases, the optimized version ran about twice as fast as the
>debuggable version. The lookup was slightly faster for unoptimized code,
>while the switch was quicker after optimizations.
>
>         -O2      -g
>lookup:  8.9u     15.7u
>switch:  7.65u    16.3u
>
>I then chose an opcode to make a noop. For the switch, that meant changing
>a case to be a simple opcode increment, but for the lookup, that involved
>changing the opcode function to do nothing except return the incremented
>pointer. So a noop in the lookup would still have the overhead of a
>function call (unless the compiler was clever enough to optimize it away
>through the function pointer). The lookup actually ran slightly slower when
>optimized, while the switch sped up a little.
>
>         -O2      -g
>lookup:  9.1u     15.6u
>switch:  7.25u    16.0u
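Concretely, the noop looked something like this in each variant (a sketch;
'N' is an assumed noop opcode letter):

    /* lookup noop: still a real call through the function table */
    char *opcodefN(char *opcode, int *x)
    {
        (void)x;                     /* nothing to do but advance */
        return opcode + 1;
    }

    /* switch noop: inlined into the dispatch loop, no call at all */
    char *dispatch_noop_example(char *opcode)
    {
        switch (*opcode) {
        case 'N': opcode++; break;   /* the whole opcode is one increment */
        default:  break;
        }
        return opcode;
    }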
>
>I then added an array bounds check on the lookup (since I didn't want to be
>executing arbitrary code with an invalid opcode, such as from a future
>version of Perl. Which reminds me, we need a version opcode.) For the
>switch, that was handled by the default case. For both, the result was a
>noop. It had no discernible effect on either loop.
>
>         -O2
>lookup:  9.1u
>switch:  7.25u
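For reference, the bounds-checked lookup step amounts to something like this
sketch (assuming 26 valid opcodes 'A'..'Z'; unknown opcodes degrade to a
noop, mirroring the switch's default case):

    static char *dispatch_one(char *opcode, int *x,
                              char *(*fn[26])(char *, int *))
    {
        unsigned idx = (unsigned)(*opcode - 'A');
        if (idx < 26u)               /* valid opcode: call through the table */
            return fn[idx](opcode, x);
        return opcode + 1;           /* invalid opcode: treat as a noop */
    }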
>
>Some additional things to consider:
>
>I wasn't checking the opcode pointer, assuming I always had a good one. (I
>made sure I always had a good one by appending the termination opcode. I
>don't know if we can get away with this in real life.) With real opcodes,
>you also won't need to do arithmetic on the function table index. (However,
>when I padded the function table and removed the index arithmetic, I didn't
>see any noticeable change, so I left it alone.)
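A padded table, for reference, just lets the opcode byte index the array
directly, so the '- A' arithmetic disappears (sketch; op_noop and the init
function are assumptions):

    static char *op_noop(char *opcode, int *x) { (void)x; return opcode + 1; }

    static char *(*fn[256])(char *, int *);   /* one slot per possible byte */

    static void init_table(void)
    {
        int i;
        for (i = 0; i < 256; i++)
            fn[i] = op_noop;          /* every byte is at least a safe noop */
        /* real opcodes then overwrite their own slots, e.g. fn['A'] = opcodefA; */
    }

    static char *dispatch_padded(char *opcode, int *x)
    {
        return fn[(unsigned char)*opcode](opcode, x);
    }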
>
>If we're going to be allowing replaceable opcodes, or pre- and post-opcode
>functions, that may shift the focus a little. If you're
>replacing the functions, having a lookup table would allow you to bubble the
>new function right to the top, whereas the switch requires the default
>opcode handler to call the function for you. Furthermore, even with the
>switch, you're going to handle pre- and post-functions with a lookup
>anyway, but then so does the lookup dispatch, so I guess they cancel out. The
>advantage the switch has, on the other hand, is that not all opcodes should
>be pluggable. (Since we're designing this as a software CPU, process
>control opcodes, such as jumps, program termination, etc. shouldn't be
>allowed to be overridden.) Those, if they are simple enough, can then be
>inlined into the switch, much like the noop, and save some overhead on
>function calls.
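That hybrid might end up looking something like the following sketch (the
opcode letters, the table, and the function names are assumptions):

    /* Control opcodes are hard-wired cases and can't be overridden;
       everything else dispatches through the replaceable table. */
    extern char *(*fn[26])(char *, int *);    /* user-replaceable handlers */

    static char *dispatch_hybrid(char *opcode, int *x)
    {
        switch (*opcode) {
        case 'N':                    /* noop: inlined, never pluggable */
            return opcode + 1;
        case 'Z':                    /* program termination: not pluggable */
            return NULL;
        default: {                   /* ordinary opcodes go through the table */
            unsigned idx = (unsigned)(*opcode - 'A');
            return idx < 26u ? fn[idx](opcode, x) : opcode + 1;
        }
        }
    }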
>
>I haven't gotten events plugged in yet, but the event loop, if we're still
>planning for a two-loop architecture, should go under similar design
>considerations. However, events won't (we hope) occur as often as opcodes
>do, and will probably be overridden more than opcodes. In that particular
>case, we could very well have a mix-and-match situation, where a switched
>opcode dispatch is faster for the normal opcode flow, but a lookup dispatch
>table is faster for the events. Dunno. Also, to bury a question
>way down here where it won't be seen, are we having a fixed number of event
>priorities, or is it unbounded?
>
>Yes, I'm well aware of what Knuth says about premature optimizations.
>Luckily, it's fairly trivial to switch (no pun intended) between the two, if
>they are both set up for that. (I'm not sure I could switch Perl 5 from
>one to the other, for instance.) And I'm sure there are some things that
>I'm not considering. Nick and Dan may have some additional insight.
--
Nick Ing-Simmons
http://www.ni-s.u-net.com/