develooper Front page | perl.perl6.internals | Postings from July 2001

Re: Opcode Dispatch

Thread Previous | Thread Next
Nick Ing-Simmons
July 30, 2001 06:35
Re: Opcode Dispatch
Message ID:
Bryan C . Warnock <> writes:
>Not that this is the most scientific testing in the world, but I did write a 
>couple variations of an opcode dispatch loop just to see how they compare.
>Of course, I violated rule number one of writing fast code - I didn't run it 
>on a slow machine.  Shame on me.

Your options suggest GCC - but which CPU?

The lookup/switch trade off may well be different on RISC vs CISC.
It would be entertaining to see -S output for two cases - 
I would expect a large-ish switch to become a lookup in the generated 
code - unless the 'case' constants made a compare tree elegant.

>Given an 80K opcode stream consisting of 25 opcodes run through 4,000 times, 
>with a terminating 26th opcode, and with 26 individual opcode dispatch 
>functions.  Too small a dataset, but I wanted to get a feel for it.
>Each of the functions took the opcode pointer and a data pointer.  One 
>function handled either the buffer reset, or the program exit.  The 
>remainder did some simple arithmetic on the addresses of both local and 
>argument variables, and added (or subtracted) that to some global variable, 
>which was used at exit.  (In an effort to keep the busy work from being 
>optimized out.)
>The two major variations were the Big Switch, and a function pointer table 
>int main () {
>   char *buf = "....";
>   char *opcode = buf;
>   char *(*fn[])(char *,int *) = {
>      opcodefA, ... , opcodefZ };  /* See why we love Perl? */
>   int x;
>   do {
>Here's the switch, basically.
>      switch (*opcode) {
>         case 'A': opcode = opcodefA(opcode, &x); break;
>                                     ...
>         case 'Z': opcode = opcodefZ(opcode, &x); break;
>         default: break;
>      }
>and here's the lookup:
>      opcode = fn[(int)(*opcode - 'A')](opcode,&x);
>I then ran each as compiled with -g, and then with -O2.  I ran each program 
>six times, throwing away the first results, and averaging the remaining 
>five.  For both cases, the optimized version ran about twice as fast as the 
>debuggable version.  The lookup was slightly faster for unoptimized code, 
>while the switch was quicker after optimizations.
>lookup: 8.9u   15.7u
>switch: 7.65u   16.3u
>I then chose an opcode to make a noop.  For the switch, that meant changing 
>a case to be a simple opcode increment, but for the lookup, that involved 
>changing the opcode function to do nothing except return the incremented 
>pointer.  So a noop in the lookup would still have the overhead of a 
>function call (unless the compiler was clever enough to optimize it away 
>through the function pointer).  The lookup actually ran slightly slower when 
>optimized, while the switch sped up a little.
>lookup: 9.1u   15.6u
>switch: 7.25u   16.0u
>I then added an array bounds check on the lookup (since I didn't want to be 
>executing arbitrary code with an invalid opcode, such as from a future 
>version of Perl.  Which reminds me, we need a version opcode.)  For the 
>switch, that was handled by the default case.  For both, the result was a 
>noop.  It had no discernable affect on either loop.
>lookup: 9.1u
>switch: 7.25u
>Some additional things to consider:
>I wasn't checking the opcode pointer, assuming I always had a good one.  (I 
>made sure I always had a good one by appending the termination opcode.  I 
>don't know if we can get away with this in real life.)  With real opcodes, 
>you also won't need to do arithmetic on the function table index.  (However, 
>when I padded the function table and removed it, I didn't see any noticeable 
>change, so I left it alone.)
>If we're going to be allowing replaceable opcodes, or allow pre- and post- 
>opcode function functions, that may shift the focus a little.  If you're 
>replacing the functions, having a lookup table would allow you to bubble the 
>new function right to the top, whereas the switch requires the default 
>opcode handler to call the function for you.  Furthermore, even with the 
>switch, you're going to handle pre- and post- functions with a lookup 
>anyway, but then so is the lookup, so I guess they cancel out.  The 
>advantage the switch has, on the other hand, is that not all opcodes should 
>be pluggable.  (Since we're designing this as a software CPU, process 
>control opcodes, such as jumps, program termination, etc. shouldn't be 
>allowed to be overridden.)  Those, if they are simple enough, can then be 
>inlined into the switch, much like the noop, and save some overhead on 
>function calls.
>I haven't gotten events plugged in yet, but the event loop, if we're still 
>planning for a two-loop architecture, should go under similar design 
>considerations.  However, events won't (we hope) occur as often as opcodes 
>do, and will probably be overridden more than opcodes.  In that particular 
>case, we could very well have a mix-and-match situation, where a switched 
>opcode dispatch is faster under normal situations, but a lookup dispatch 
>table is faster for the events normally.  Dunno.  Also, to bury a question 
>way down here where it won't be seen, are we having a fixed number of event 
>priorities, or is it unbounded?
>Yes, I'm well aware of what Knuth says about premature optimizations.  
>Luckily, it's fairly trivial to switch (no pun intended) between the two, if 
>that they are both setup for that.  (I'm not sure I could switch Perl 5 from 
>one to the other, for instance.)  And I'm sure there are some things that 
>I'm not considering.  Nick and Dan may have some additional insight. 
Nick Ing-Simmons

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About