perl.perl6.internals
Postings from August 2001
Re: Opcode Dispatch
From: Bryan C. Warnock
August 7, 2001 06:41
Message ID: 0108070940190F.firstname.lastname@example.org
On Monday 06 August 2001 09:08 am, Bryan C. Warnock wrote:
> It could be that part of the "fixup" is to convert from bytes to wider
> ops, or something similar. If that's the case, I can patch the code and
> rerun it.
Okay. I rewrote the code from scratch. (Rev 2 is always better anyway.)
I used the same machines as before (Linux/x86 and Solaris/Sparc).
I followed Dan's recipe (for the most part). The opcodes are now 32 bits
wide, and each opcode takes 0, 1, or 2 arguments.
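To make that layout concrete, here is a minimal sketch. It assumes a flat array of 32-bit words in which each opcode word is immediately followed by its 0, 1, or 2 argument words. Opcode 1 as the terminator comes from the test description below. Opcode 0 as NO-OP and the op_argc rule are hypothetical, because the post does not say which opcodes take how many arguments.

```c
#include <stddef.h>
#include <stdint.h>

/* OP_END (1) terminates a program; OP_NOOP (0) is an assumption. */
enum { OP_NOOP = 0, OP_END = 1 };

/* Hypothetical: number of argument words (0, 1, or 2) that follow
   each opcode word in the stream. */
static unsigned op_argc(uint32_t op)
{
    if (op == OP_NOOP || op == OP_END)
        return 0;
    return 1 + (op % 2);   /* real ops alternate between 1 and 2 args */
}

/* Walk a program of 32-bit words and count the opcodes it contains,
   up to and including the terminating OP_END. */
size_t count_ops(const uint32_t *code, size_t len)
{
    size_t pc = 0, n = 0;
    while (pc < len) {
        uint32_t op = code[pc];
        n++;
        if (op == OP_END)
            break;
        pc += 1 + op_argc(op);   /* skip the opcode word plus its arguments */
    }
    return n;
}
```

For example, the words {2, 10, 3, 20, 21, 0, 1} decode as opcode 2 (one argument), opcode 3 (two arguments), a NO-OP, and the terminator, for four opcodes in all.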
I tested with 512, 1024, 2048, and 8192 opcodes (all contiguous) in a single
table. I did not do any sort of context switching between multiple tables.
The 8192-* tests did not complete, and I've scrapped them. (As you'll see,
some of the tests were insane, and gcc was having fits attempting to
compile them.)
The 2048-* tests did not complete on Solaris. (The tests ran for about
seven hours.) I've reported the partial results, and you should be able to
extrapolate the remainder.
I tested three dispatch schemes: a full table lookup, a full switch, and a
partial switch that handles some opcodes in the switch and dispatches the
rest by table lookup.
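The two "full" schemes can be sketched in C roughly as follows. This is an illustration, not the benchmark code. The vm_state struct, the handler names, and the tiny four-opcode set are hypothetical stand-ins for the 512 to 8192 opcodes actually tested. Opcode 0 as NO-OP is assumed, and opcode 1 terminates the program.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interpreter state: a single accumulator. */
typedef struct { uint64_t acc; } vm_state;

/* Each handler gets the pc of its opcode word and returns the pc of
   the next opcode, skipping over its own argument words. */
typedef size_t (*op_fn)(vm_state *vm, const uint32_t *code, size_t pc);

static size_t op_noop(vm_state *vm, const uint32_t *code, size_t pc)
{ (void)vm; (void)code; return pc + 1; }
static size_t op_add1(vm_state *vm, const uint32_t *code, size_t pc)
{ vm->acc += code[pc + 1]; return pc + 2; }
static size_t op_add2(vm_state *vm, const uint32_t *code, size_t pc)
{ vm->acc += (uint64_t)code[pc + 1] + code[pc + 2]; return pc + 3; }

/* Full table lookup: the opcode indexes an array of function pointers.
   Slot 1 is unused because opcode 1 ends the loop before lookup. */
static const op_fn op_table[] = { op_noop, NULL, op_add1, op_add2 };

uint64_t run_lookup(const uint32_t *code)
{
    vm_state vm = { 0 };
    size_t pc = 0;
    while (code[pc] != 1)              /* opcode 1 terminates the program */
        pc = op_table[code[pc]](&vm, code, pc);
    return vm.acc;
}

/* Full switch: every opcode gets its own case label. */
uint64_t run_switch(const uint32_t *code)
{
    vm_state vm = { 0 };
    size_t pc = 0;
    for (;;) {
        switch (code[pc]) {
        case 0:  pc = op_noop(&vm, code, pc); break;
        case 1:  return vm.acc;
        case 2:  pc = op_add1(&vm, code, pc); break;
        case 3:  pc = op_add2(&vm, code, pc); break;
        default: return vm.acc;        /* unknown opcode: stop (sketch only) */
        }
    }
}
```

Both functions run the same program to the same result. Only the dispatch mechanism differs: an indirect call through a table versus whatever jump code the compiler emits for the switch.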
The full switch had both a normal and an inlined NO-OP opcode variant.
The partial switch switched on 32, 128, or 256 opcodes (all contiguous),
and had normal, inlined NO-OP, and fully inlined switch variants.
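Here is a sketch of the partial switch in its inlined NO-OP form, under the same hypothetical assumptions as a plain C illustration: opcode 0 is NO-OP, opcode 1 ends the program, and all handler names are invented. The real tests switched on 32, 128, or 256 opcodes. For brevity this sketch gives case labels to only three, and the default branch falls back to table lookup for everything else.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t acc; } vm_state;   /* hypothetical VM state */
typedef size_t (*op_fn)(vm_state *vm, const uint32_t *code, size_t pc);

static size_t op_noop(vm_state *vm, const uint32_t *code, size_t pc)
{ (void)vm; (void)code; return pc + 1; }
static size_t op_add1(vm_state *vm, const uint32_t *code, size_t pc)
{ vm->acc += code[pc + 1]; return pc + 2; }
static size_t op_add2(vm_state *vm, const uint32_t *code, size_t pc)
{ vm->acc += (uint64_t)code[pc + 1] + code[pc + 2]; return pc + 3; }
static size_t op_mul1(vm_state *vm, const uint32_t *code, size_t pc)
{ vm->acc *= code[pc + 1]; return pc + 2; }

/* Full lookup table, used for every opcode the switch does not cover. */
static const op_fn op_table[] = { op_noop, NULL, op_add1, op_add2, op_mul1 };

/* Partial switch: low-numbered opcodes get case labels, the NO-OP body
   is written inline (no function call), and all other opcodes fall
   through to a table lookup. Assumes every opcode in code is valid. */
uint64_t run_partial(const uint32_t *code)
{
    vm_state vm = { 0 };
    size_t pc = 0;
    for (;;) {
        uint32_t op = code[pc];
        switch (op) {
        case 0:  pc += 1; break;                        /* inlined NO-OP */
        case 1:  return vm.acc;                         /* program end */
        case 2:  pc = op_add1(&vm, code, pc); break;    /* called, not inlined */
        default: pc = op_table[op](&vm, code, pc); break;
        }
    }
}
```

A "fully inlined" variant would go one step further and replace each call in the switch with the handler's body. That removes the call overhead at the cost of a much larger function for the compiler to digest.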
Each test was built twice, once with gcc's debugging flag '-g' and once with
its optimization flag '-O2'.
Unfortunately, I didn't time the actual compilation of each test. Some of
them took quite a while to compile, and that cost should, of course, be
factored in.
Each data set consisted of 40,000 opcodes (randomly distributed between
opcode 2 and the last opcode, opcode[-1]) plus their arguments, followed by a
single opcode 1 (program termination). The data was interspersed with 7%
NO-OP opcodes. This "program" was looped through 5000 times.
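A data set like this could be generated along the following lines. This is only a sketch under stated assumptions. NO-OP is taken to be opcode 0. "Interspersed with 7%" is read as inserting a NO-OP before each real opcode with 7% probability, which works out to roughly 7% of the opcode words. The op_argc rule and the use of the C library rand() are hypothetical choices, not what the original generator did.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define OP_NOOP 0u   /* assumed NO-OP opcode number */
#define OP_END  1u   /* opcode 1 terminates the program */

/* Hypothetical argument count (0, 1, or 2) for each opcode. */
static unsigned op_argc(uint32_t op)
{
    return (op == OP_NOOP || op == OP_END) ? 0 : op % 3;
}

/* Write n_ops opcodes drawn uniformly from [2, num_ops - 1], each with
   its random argument words, into buf. Before each one, a NO-OP is
   inserted with probability noop_pct/100. OP_END is appended last.
   Returns the number of words written, or 0 if cap is too small.
   Requires num_ops > 2. */
size_t make_program(uint32_t *buf, size_t cap, size_t n_ops,
                    uint32_t num_ops, unsigned noop_pct, unsigned seed)
{
    size_t len = 0;
    srand(seed);
    for (size_t i = 0; i < n_ops; i++) {
        if ((unsigned)(rand() % 100) < noop_pct) {
            if (len >= cap) return 0;
            buf[len++] = OP_NOOP;
        }
        uint32_t op = 2 + (uint32_t)(rand() % (int)(num_ops - 2));
        unsigned argc = op_argc(op);
        if (len + 1 + argc > cap) return 0;
        buf[len++] = op;
        for (unsigned a = 0; a < argc; a++)
            buf[len++] = (uint32_t)rand();   /* argument values are arbitrary */
    }
    if (len >= cap) return 0;
    buf[len++] = OP_END;
    return len;
}
```

Calling make_program(buf, cap, 40000, 512, 7, seed) would produce one data set for the 512-opcode table. A benchmark would then run its dispatch loop over that buffer 5000 times and time the whole run.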
A summary of results:
Full switches are right out, and will not be tested again. They were the
slowest of the constructs, and usually by a lot.
For Linux/x86, lookup was consistently faster without optimizations. With
optimizations, lookup was fastest with the smallest number of opcodes; as
more opcodes were added, some of the inlined partial switches became just as
efficient as lookup.
For Solaris/Sparc, the inlined-variant partial switches were fastest with
the smaller numbers of opcodes and case statements. As the number of opcodes
increased, lookup became slightly faster with the optimized code but remained
consistently slower with the debug code.
The complete results can be found at
Bryan C. Warnock