develooper Front page | perl.perl5.porters | Postings from December 2007

Re: optimising opcodes

Thread Previous | Thread Next
Jim Cromie
December 29, 2007 10:41
Re: optimising opcodes
Message ID:
Nicholas Clark wrote:
> I was thinking about something Jim Cromie mailed me privately, and wondered:
> "How many null ops do we have, and how many cache misses do they cause?"
> For example:
> $ perl -MO=Concise -e 'print foreach @ARGV'
> lots of ex-ops. Pining for the fjords. Taking up space.
4/21 ops are (in principle) removable.

> So I wondered:
> 0: How hard would it be to write some code to take an optree, and copy out all
>    the live ops into a new optree, skipping any nulled out ops?

I once had a patch to peep that did this for 1 case (ex-rv2sv).  Aug 16 
2004.  No patch attached :-(
At any rate, DaveM thought I got lucky avoiding dragons.

> 1: Would that break the runtime?
> 2: How much would it break B::Deparse?
> 3: If you configured perl to use a slab allocator, and happened to do two
>    parses (first to determine the total size of ops, second to copy them in)
>    and thereby allocated a slab of the exact size, would that help more?
> 4: If you copied them in execution order, would that help more?
perf ( exec-order ) >= perf (random-order-in-a-slab)

assuming data-prefetch is operating ( instruction prefetch is better, 
but our ops are data)
I dont think it would be worse.  Whether its measurable is something else.
cachegrind doesnt model the microarchitectural things (ref 
iiwsc2006.ppt, see

If one were to write a repack_ops routine, where would one call it from ?
It has to be *after* Perl_peep in order to get the exec-order optimizations.

> $ perl -MO=Concise,-exec -e 'print foreach @ARGV'
> 1  <0> enter 
> 2  <;> nextstate(main 1 -e:1) v
> 3  <;> nextstate(main 1 -e:1) v
> 4  <0> pushmark sM
> 5  <$> gv(*ARGV) s
> 6  <1> rv2av[t1] sKRM/1
> 7  <$> gv(*_) s
> 8  <{> enteriter(next->c last->f redo->9) lKS
> d  <0> iter s
> e  <|> and(other->9) vK/1
> 9      <0> pushmark s
> a      <$> gvsv(*_) s
> b      <@> print vK
> c      <0> unstack v
>            goto d
> f  <2> leaveloop vK/2
> g  <@> leave[1 ref] vKP/REFC
> [note, by doing things with a slab allocator, I believe that even though the
> transformed ops are contiguous in memory, they are also indistinguishable from
> regular ops created by the perl compiler]
> 5: How do you spot code executed only once, and avoid doing this?
>    (main program, eval blocks, top level code from do/require/use,
>     BEGIN/UNITCHECK/CHECK/INIT/END - ie basically anything not a "real"
>     subroutine definition)
Why do you think it matters ? 
Is it just avoiding cpu cycles doing stuff that has low/no payoff ?
or is is caution/paranoia, or a specific reason ?

The repack_ops()  should be little more than 1 more malloc,
and 1 more tree traversal, with memcopies, filling some ptr-table hashes
to map old-to-new pointers, then iterating over them to make ptr updates.

Knowing the op-size for the memcopies is more difficult than it should be.
I sucessfully reordered the ops (ie tests didnt break) here.
My goal was to reduce the size determination down to a binary search on 

aside: it is unfortunate that op_type 1= type of op (forex BINOP)
I guess op_kind is a usable pseudo-synonym

> 6: Is the op_opt flag bit free at this point? Does having a bit help?
I know I always like to get a bit more ;-)

> and for bonus points - why are there two nextstate ops in that example, and
> can we optimise one out? How often does it happen?
> Nicholas Clark

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About