Front page | perl.perl6.internals |
Postings from December 2001
Re: JIT me some speed!
From:
Nicholas Clark
Date:
December 24, 2001 10:00
Subject:
Re: JIT me some speed!
Message ID:
20011224171429.A2650@Bagpuss.unfortu.net
On Fri, Dec 21, 2001 at 12:03:51AM +0000, Tom Hughes wrote:
> It looks like it is going to need some work before it can work for
> other instruction sets though, at least for RISC systems where the
> operands are typically encoded with the opcode as part of a single
> word and the range of immediate constants is often restricted.
>
> I'm thinking it will need some way of indicating field widths and
> shifts for the operands and opcode so they can be merged into an
> instruction word and also some way of handling a constant pool so
> that arbitrary addresses can be loaded using PC relative loads.
Another thing that struck me on reading it was:
=item C<B<&IR>>I<n>
Place the address of the C<INTVAL> register specified in the I<n>th argument.
RISC chips have lots of general purpose registers, so it's likely that there
will be enough spare that several can be mapped to parrot registers.
If, say, 4 are available, it would be useful to be able to say that an op
requires the values of rN and rM, and modifies rD. The JIT compiler would make
a sandwich: code to read N and M into two of the real CPU registers,
the op as the filling, and then some more code to write D back to memory.
However, if the JIT can see that N is already in a CPU register from the
previous op, or that D is going to be used and modified by the next op, it can
skip or defer some of the memory reads and writes.
[And provided the op descriptions are this helpful, the JIT doesn't have to do
it immediately. It becomes possible to write a better optimising JIT later, one
that makes sandwiches with multiple fillings or even Scooby Snacks, while the
initial JIT insists that the only recipe available is bread, 1 filling, bread.]
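As a rough illustration of the difference, here is a minimal Python sketch of
the two recipes. It is not how the Parrot JIT is written: the Op class, the
reads/writes fields and both function names are made up for this example, the
"add" op is just a second op to follow the "sub", and the sketch assumes
straight-line code (no branches) with enough spare CPU registers that nothing
ever has to be evicted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    """Hypothetical op description: the Parrot registers it reads and writes."""
    name: str
    reads: tuple
    writes: tuple


def naive_jit(ops):
    """Bread, one filling, bread: load all inputs, run the op, store all outputs."""
    out = []
    for op in ops:
        out += [f"load {r}" for r in op.reads]
        out.append(op.name)
        out += [f"store {w}" for w in op.writes]
    return out


def caching_jit(ops):
    """Skip loads of values already in a CPU register; defer stores to the end.

    Assumption: straight-line code and enough CPU registers, so no eviction.
    """
    in_cpu = set()  # Parrot registers currently held in CPU registers
    dirty = []      # modified registers still to be written back, in order
    out = []
    for op in ops:
        for r in op.reads:
            if r not in in_cpu:
                out.append(f"load {r}")
                in_cpu.add(r)
        out.append(op.name)
        for w in op.writes:
            in_cpu.add(w)
            if w not in dirty:
                dirty.append(w)
    out += [f"store {w}" for w in dirty]
    return out


block = [
    Op("sub I4, I4, I3", reads=("I4", "I3"), writes=("I4",)),
    Op("add I5, I4, I3", reads=("I4", "I3"), writes=("I5",)),
]
naive = naive_jit(block)    # 4 loads, 2 stores
cached = caching_jit(block)  # 2 loads, 2 stores
```

The only information the second version needs beyond the first is the list of
registers each op reads and writes, which is the point of asking for it in the
op descriptions.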
mops will be fast if

    REDO:   sub     I4, I4, I3
            if      I4, REDO

maps to

    REDO:
            load I4 from memory (which will be in the L1 cache)
            load I3 from memory
            I4 = I4 - I3
            store I4 to memory
            load I4 from memory
            is it non-zero?
            goto REDO if so

it will be slightly faster if it maps to

    REDO:
            load I4 from memory (which will be in the L1 cache)
            load I3 from memory
            I4 = I4 - I3
            store I4 to memory
            # I4 still in a CPU register
            is it non-zero?
            goto REDO if so

and faster still if the JIT can see how to push things out of the loop:

            load I4 from memory
            load I3 from memory
    REDO:
            I4 = I4 - I3
            is it non-zero?
            goto REDO if so
            store I4 to memory
(does threading mess this idea up?)
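To put rough numbers on the three mappings, here is a small Python model that
counts memory accesses for each. It is only a sketch: the Memory class and the
three function names are invented for this example, the starting values
(I4 = 1000, I3 = 1) are arbitrary, and it counts loads and stores rather than
measuring real machine code.

```python
class Memory:
    """Parrot register file held in memory, counting every access."""

    def __init__(self, **regs):
        self.regs = dict(regs)
        self.loads = 0
        self.stores = 0

    def load(self, name):
        self.loads += 1
        return self.regs[name]

    def store(self, name, value):
        self.stores += 1
        self.regs[name] = value


def reload_every_op(mem):
    """First mapping: each op loads its operands and stores its result."""
    while True:
        i4 = mem.load("I4")
        i3 = mem.load("I3")
        i4 = i4 - i3
        mem.store("I4", i4)
        i4 = mem.load("I4")  # the branch op reloads its operand
        if i4 == 0:          # "if I4, REDO" falls through once I4 is zero
            break


def keep_in_register(mem):
    """Second mapping: the branch reuses I4 from the CPU register."""
    while True:
        i4 = mem.load("I4")
        i3 = mem.load("I3")
        i4 = i4 - i3
        mem.store("I4", i4)
        if i4 == 0:
            break


def hoisted(mem):
    """Third mapping: loads before the loop, a single store after it."""
    i4 = mem.load("I4")
    i3 = mem.load("I3")
    while True:
        i4 = i4 - i3
        if i4 == 0:
            break
    mem.store("I4", i4)


def measure(strategy, start=1000, step=1):
    """Run one mapping; return (final I4, loads, stores).

    start must be a positive multiple of step or the loop never ends.
    """
    mem = Memory(I4=start, I3=step)
    strategy(mem)
    return mem.regs["I4"], mem.loads, mem.stores


results = {f.__name__: measure(f)
           for f in (reload_every_op, keep_in_register, hoisted)}
```

For 1000 iterations the model gives 3000 loads and 1000 stores for the first
mapping, 2000 loads and 1000 stores for the second, and 2 loads and 1 store
for the third. All three leave I4 at 0 in memory, but only at the end of the
loop in the hoisted case, which is where the threading question bites.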
Nicholas Clark