perl.perl6.internals | Postings from July 2001

Opcode Dispatch

Bryan C. Warnock
July 29, 2001 21:54
Not that this is the most scientific testing in the world, but I did write a 
couple variations of an opcode dispatch loop just to see how they compare.

Of course, I violated rule number one of writing fast code - I didn't run it 
on a slow machine.  Shame on me.

The test used an 80K opcode stream consisting of 25 opcodes run through 4,000 
times, with a terminating 26th opcode, and 26 individual opcode dispatch 
functions.  Too small a dataset, but I wanted to get a feel for it.

Each of the functions took the opcode pointer and a data pointer.  One 
function handled either the buffer reset, or the program exit.  The 
remainder did some simple arithmetic on the addresses of both local and 
argument variables, and added (or subtracted) that to some global variable, 
which was used at exit.  (In an effort to keep the busy work from being 
optimized out.)
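An individual opcode function along those lines might look like the sketch below. (The name `opcodefA` matches the table in the code that follows; `global_sum` and the exact busy-work arithmetic are stand-ins of mine, not the original code.)

```c
#include <stdint.h>

int global_sum;  /* read at exit, so the busy work can't be optimized away */

/* A representative opcode function: it takes the opcode pointer and a data
 * pointer, does some arithmetic on the addresses of a local variable and an
 * argument variable, folds the result into the global, and returns the
 * address of the next opcode. */
char *opcodefA(char *opcode, int *x)
{
    int local = 0;
    global_sum += (int)((intptr_t)&local - (intptr_t)x);
    return opcode + 1;  /* advance to the next opcode */
}
```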

The two major variations were the Big Switch and a function pointer table:

int main () {
   char *buf = "....";
   char *opcode = buf;
   char *(*fn[])(char *,int *) = {
      opcodefA, ... , opcodefZ };  /* See why we love Perl? */
   int x;

   do {

Here's the switch, basically.

      switch (*opcode) {
         case 'A': opcode = opcodefA(opcode, &x); break;
         ...
         case 'Z': opcode = opcodefZ(opcode, &x); break;
         default: break;
      }

and here's the lookup:

      opcode = fn[(int)(*opcode - 'A')](opcode,&x);

I then ran each as compiled with -g, and then with -O2.  I ran each program 
six times, throwing away the first results, and averaging the remaining 
five.  For both cases, the optimized version ran about twice as fast as the 
debuggable version.  The lookup was slightly faster for unoptimized code, 
while the switch was quicker after optimizations.

        -O2     -g
lookup: 8.9u    15.7u
switch: 7.65u   16.3u

I then chose an opcode to make a noop.  For the switch, that meant changing 
a case to be a simple opcode increment, but for the lookup, that involved 
changing the opcode function to do nothing except return the incremented 
pointer.  So a noop in the lookup would still have the overhead of a 
function call (unless the compiler was clever enough to optimize it away 
through the function pointer).  The lookup actually ran slightly slower when 
optimized, while the switch sped up a little.

        -O2     -g
lookup: 9.1u    15.6u
switch: 7.25u   16.0u
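The two noop shapes might look like this (a sketch with names of my own; the point is that the switch can inline the noop as a bare increment, while the lookup still pays for a call through the table):

```c
/* Lookup-table noop: nothing but the pointer increment, yet it still
 * carries the overhead of a call through a function pointer. */
static char *op_noop(char *opcode, int *x)
{
    (void)x;
    return opcode + 1;
}

/* Switch dispatch: the same noop inlined as a bare increment,
 * skipping the function call entirely. */
static char *dispatch(char *opcode, int *x)
{
    switch (*opcode) {
        case 'N': opcode++; break;                  /* inlined noop */
        default:  opcode = op_noop(opcode, x); break;
    }
    return opcode;
}
```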

I then added an array bounds check on the lookup (since I didn't want to be 
executing arbitrary code with an invalid opcode, such as from a future 
version of Perl.  Which reminds me, we need a version opcode.)  For the 
switch, that was handled by the default case.  In both cases an invalid 
opcode became a noop.  It had no discernible effect on either loop.

        -O2
lookup: 9.1u
switch: 7.25u
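The bounds check on the lookup could be as simple as the sketch below (my own shape for it): an out-of-range byte maps to -1, and the dispatch loop treats that as a noop instead of indexing arbitrary memory in the function table.

```c
enum { NUM_OPCODES = 26 };   /* 'A' through 'Z' */

/* Bounds check for the table lookup: returns a valid index into fn[],
 * or -1 for any byte outside the opcode range. */
static int opcode_index(char op)
{
    int idx = (int)(op - 'A');
    return (idx >= 0 && idx < NUM_OPCODES) ? idx : -1;
}
```

In the loop, a negative index would just advance the opcode pointer, mirroring what the switch's default case does for free.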

Some additional things to consider:

I wasn't checking the opcode pointer, assuming I always had a good one.  (I 
made sure I always had a good one by appending the termination opcode.  I 
don't know if we can get away with this in real life.)  With real opcodes, 
you also won't need to do arithmetic on the function table index.  (However, 
when I padded the function table and removed the arithmetic, I didn't see 
any noticeable change, so I left it alone.)

If we're going to be allowing replaceable opcodes, or pre- and post-opcode 
functions, that may shift the focus a little.  If you're replacing the 
functions, having a lookup table allows you to bubble the new function right 
to the top, whereas the switch requires the default opcode handler to call 
the function for you.  Furthermore, even with the switch, you're going to 
handle pre- and post-functions with a lookup anyway, but then so will the 
lookup dispatch, so I guess they cancel out.  The 
advantage the switch has, on the other hand, is that not all opcodes should 
be pluggable.  (Since we're designing this as a software CPU, process 
control opcodes, such as jumps, program termination, etc. shouldn't be 
allowed to be overridden.)  Those, if they are simple enough, can then be 
inlined into the switch, much like the noop, and save some overhead on 
function calls.
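With a table, "bubbling the new function right to the top" is just a pointer store, as in this sketch (the wrapper and its names are hypothetical illustrations of mine, not anything from the test program):

```c
typedef char *(*opfn)(char *opcode, int *data);

static int trace_count;

static char *op_add(char *opcode, int *x) { *x += 1; return opcode + 1; }

/* A user-supplied replacement: runs a pre-opcode hook, then the original. */
static char *op_add_traced(char *opcode, int *x)
{
    trace_count++;
    return op_add(opcode, x);
}

static opfn fn[26] = { op_add };  /* slot 0 = 'A'; the rest elided here */

/* Replacing an opcode is a single pointer assignment; a switch would
 * instead need its default case to consult a table of overrides. */
static void replace_opcode(char op, opfn new_fn)
{
    fn[op - 'A'] = new_fn;
}
```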

I haven't gotten events plugged in yet, but the event loop, if we're still 
planning for a two-loop architecture, should go under similar design 
considerations.  However, events won't (we hope) occur as often as opcodes 
do, and will probably be overridden more than opcodes.  In that particular 
case, we could very well end up mixing and matching, with a switch being 
faster for opcode dispatch under normal conditions and a lookup table being 
faster for event dispatch.  Dunno.  Also, to bury a question way down here 
where it won't be seen: are we having a fixed number of event priorities, 
or is it unbounded?

Yes, I'm well aware of what Knuth says about premature optimizations.  
Luckily, it's fairly trivial to switch (no pun intended) between the two, 
provided they are both set up for that.  (I'm not sure I could switch Perl 5 
from one to the other, for instance.)  And I'm sure there are some things that 
one to the other, for instance.)  And I'm sure there are some things that 
I'm not considering.  Nick and Dan may have some additional insight. 

Bryan C. Warnock
