perl.perl5.porters | Postings from December 2014

OP_MULTIDEREF now in blead

Dave Mitchell
December 7, 2014 09:39
I've just merged my OP_MULTIDEREF work into blead. Here's the commit
message; I think it's fairly self-explanatory.

commit fedf30e1c349130b23648c022f5f3cb4ad7928f3
Author:     David Mitchell <>
AuthorDate: Fri Oct 24 16:26:38 2014 +0100
Commit:     David Mitchell <>
CommitDate: Sun Dec 7 09:24:55 2014 +0000

    This op is an optimisation for any series of one or more array or hash
    lookups and dereferences, where the key/index is a simple constant or
    package/lexical variable. If the first-level lookup is of a simple
    array/hash variable or scalar ref, then that is included in the op too.
    So all of the following are replaced with a single op:
        local $a[0][$i]
        exists $a[$i]{$k}
        delete $h{foo}
    while these aren't:
        $a[0]       already handled by OP_AELEMFAST
        $a[$x+1]    not a simple index
    and these are partially replaced:
        (expr)->[0]{$k}   the bit following (expr) is replaced
        $h{foo}[$x+1][0]  the first and third lookups are each done with
                          a multideref op, while the $x+1 expression and
                          middle lookup are done by the existing add, aelem,
                          etc. ops
    Up until now, aggregate dereferencing has been very heavyweight in ops; for
    example, $r->[0]{$x} is compiled as:
        gv[*r] s
        rv2sv sKM/DREFAV,1
        rv2av[t2] sKR/1
        const[IV 0] s
        aelem sKM/DREFHV,2
        rv2hv sKR/1
        gvsv[*x] s
        helem vK/2
    When executing this, in addition to the actual calls to av_fetch() and
    hv_fetch(), there is a lot of overhead of pushing SVs on and off the
    stack, and calling lots of little pp() functions from the runops loop
    (each with its potential indirect branch miss).
    The multideref op avoids that by running all the code in a loop in a
    switch statement. It makes use of the new UNOP_AUX type to hold an array
    of:
        typedef union  {
            PADOFFSET pad_offset;
            SV        *sv;
            IV        iv;
            UV        uv;
        } UNOP_AUX_item;
    In something like $a[7][$i]{foo}, the GVs or pad offsets for @a and $i are
    stored as items in the array, along with a pointer to a const SV holding
    'foo', and the UV 7 is stored directly. Along with this, some UVs are used
    to store a sequence of actions (several actions are squeezed into a single
    UV).
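
As a rough illustration of the layout described above, here is a minimal C sketch of how such an aux array might be filled for $a[7][$i]{foo} with @a and $i lexical. The typedefs at the top are stand-ins so that the fragment compiles outside the perl source tree. The 8-bit action width, the ACT_* names and the pack_actions(), next_action() and build_aux() helpers are all made up for this example; they are not the encoding used in blead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for perl's own typedefs, so the sketch compiles on its own. */
typedef uintptr_t  UV;
typedef intptr_t   IV;
typedef size_t     PADOFFSET;
typedef struct sv  SV;   /* opaque: only pointers to it are stored */

typedef union {
    PADOFFSET pad_offset;
    SV        *sv;
    IV        iv;
    UV        uv;
} UNOP_AUX_item;

/* Hypothetical encoding, not perl's real one: 8 bits per action. */
enum { ACTION_BITS = 8, ACTION_MASK = 0xff };
enum {
    ACT_END = 0,            /* no more actions */
    ACT_PADAV_AELEM_CONST,  /* lexical array, constant index */
    ACT_AELEM_PADSV,        /* array element, index held in a lexical */
    ACT_HELEM_CONST         /* hash element, constant key */
};

/* Squeeze up to sizeof(UV) actions into one UV, first action lowest. */
static UV pack_actions(const unsigned *acts, size_t n)
{
    UV packed = 0;
    assert(n <= sizeof(UV));
    for (size_t i = 0; i < n; i++)
        packed |= (UV)(acts[i] & ACTION_MASK) << (i * ACTION_BITS);
    return packed;
}

/* Pop the next action off the low end of the packed word. */
static unsigned next_action(UV *packed)
{
    unsigned act = (unsigned)(*packed & ACTION_MASK);
    *packed >>= ACTION_BITS;
    return act;
}

/* Fill aux[] for something shaped like $a[7][$i]{foo}. The pad offsets
 * and the key SV are supplied by the caller; returns the items used. */
static size_t build_aux(UNOP_AUX_item *aux, PADOFFSET a_off,
                        PADOFFSET i_off, SV *foo_key)
{
    const unsigned acts[] = {
        ACT_PADAV_AELEM_CONST, ACT_AELEM_PADSV, ACT_HELEM_CONST
    };
    size_t n = 0;
    aux[n++].uv         = pack_actions(acts, 3);
    aux[n++].pad_offset = a_off;    /* where @a lives */
    aux[n++].uv         = 7;        /* constant index stored directly */
    aux[n++].pad_offset = i_off;    /* where $i lives */
    aux[n++].sv         = foo_key;  /* const SV holding 'foo' */
    return n;
}
```

Because ACT_END is zero, a reader that keeps shifting the packed word sees the end marker once the real actions are used up, with no separate length field.
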
    Then the main body of pp_multideref is a big while loop round a switch,
    which reads actions and values from the AUX array. The two big branches in
    the switch are ones that are effectively unrolled (/DREFAV, rv2av, aelem)
    and (/DREFHV, rv2hv, helem) triplets. The other branches are various entry
    points that handle retrieving the different types of initial value; for
    example 'my %h; $h{foo}' needs to get %h from the pad, while '(expr)->{foo}'
    needs to pop expr off the stack.
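
The loop-round-a-switch structure can be modelled in a few lines of standalone C. This is only a toy: the node type, the aux_item union, the ACT_* codes and toy_multideref() are hypothetical, only array lookups are handled (a hash branch would have the same shape), and none of it is taken from pp_multideref itself.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value: either a plain integer or an array of further values. */
typedef struct node node;
struct node {
    int     is_array;
    long    ival;       /* used when !is_array */
    node  **elems;      /* used when is_array */
    size_t  len;
};

/* One slot of the auxiliary array (loosely modelled on UNOP_AUX_item). */
typedef union {
    unsigned long  actions;    /* packed action codes */
    node          *root;       /* container known at compile time */
    size_t         index;      /* constant index */
    const size_t  *index_var;  /* variable holding the index */
} aux_item;

enum { ACT_BITS = 4, ACT_MASK = 0xf };
enum {
    ACT_END = 0,
    ACT_ROOT_AUX,     /* entry point: container named in aux[] */
    ACT_ROOT_STACK,   /* entry point: value the caller already computed */
    ACT_AELEM_CONST,  /* index is a constant in aux[] */
    ACT_AELEM_VAR     /* index is read from a variable */
};

/* Run every lookup in one function: a loop round a switch that pulls
 * actions and their operands from aux[]. Returns NULL on a bad lookup. */
static node *toy_multideref(const aux_item *aux, node *stack_top)
{
    assert(aux != NULL);
    unsigned long actions = (aux++)->actions;
    node *cur = NULL;

    for (;;) {
        size_t idx;
        switch (actions & ACT_MASK) {
        case ACT_END:
            return cur;
        case ACT_ROOT_AUX:
            cur = (aux++)->root;
            break;
        case ACT_ROOT_STACK:
            cur = stack_top;
            break;
        case ACT_AELEM_CONST:
            idx = (aux++)->index;
            goto do_aelem;
        case ACT_AELEM_VAR:
            idx = *(aux++)->index_var;
        do_aelem:
            if (!cur || !cur->is_array || idx >= cur->len)
                return NULL;
            cur = cur->elems[idx];
            break;
        default:
            return NULL;
        }
        actions >>= ACT_BITS;   /* move on to the next packed action */
    }
}
```

The two ACT_ROOT_* cases stand in for the different entry points mentioned above: one takes its starting container from the aux array, the other from a value handed in by the caller, much as '(expr)->{foo}' takes expr from the stack.
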
    Note that there is a slight complication with /DEREF; in the example above
    of $r->[0]{$x}, the aelem op is actually
        aelem sKM/DREFHV,2
    which means that the aelem, after having retrieved a (possibly undef)
    value from the array, is responsible for autovivifying it into a hash,
    ready for the next op. Similarly, the rv2sv that retrieves $r from the
    typeglob is responsible for autovivifying it into an AV. This action
    of doing the next op's work for it complicates matters somewhat. Within
    pp_multideref, the autovivification action is instead included as the
    first step of the current action.
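
To make the ordering concrete, here is a small standalone C sketch in which each lookup step first vivifies the container it was handed and only then fetches from it, rather than relying on the previous step to have done so. The node type and the vivify_array(), array_fetch_lvalue() and step_aelem() helpers are hypothetical and much simpler than perl's real SV handling.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { N_UNDEF = 0, N_INT, N_ARRAY } node_kind;

typedef struct node node;
struct node {
    node_kind  kind;
    long       ival;    /* used when kind == N_INT */
    node     **elems;   /* used when kind == N_ARRAY */
    size_t     len;
};

/* Turn an undefined slot into an empty array; leave anything else alone. */
static void vivify_array(node *n)
{
    if (n->kind == N_UNDEF) {
        n->kind  = N_ARRAY;
        n->elems = NULL;
        n->len   = 0;
    }
}

/* Fetch element idx for writing, extending the array with fresh
 * undefined slots as needed. Returns NULL if allocation fails. */
static node *array_fetch_lvalue(node *arr, size_t idx)
{
    assert(arr->kind == N_ARRAY);
    while (arr->len <= idx) {
        node **grown = realloc(arr->elems, (arr->len + 1) * sizeof *grown);
        if (!grown)
            return NULL;
        arr->elems = grown;
        node *slot = calloc(1, sizeof *slot);   /* kind == N_UNDEF */
        if (!slot)
            return NULL;
        arr->elems[arr->len++] = slot;
    }
    return arr->elems[idx];
}

/* One array-lookup step in the style described above: vivification of
 * the incoming value is the first thing this step does. */
static node *step_aelem(node *cur, size_t idx)
{
    vivify_array(cur);
    if (cur->kind != N_ARRAY)
        return NULL;            /* e.g. a plain integer */
    return array_fetch_lvalue(cur, idx);
}
```

Chaining two calls, step_aelem(step_aelem(&root, 2), 0), leaves both the outer and the inner container vivified, and no step needs to know what kind of lookup comes after it.
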
    In terms of benchmarking with Porting/, a simple lexical
    $a[$i][$j] shows a reduction of approx 40% in the number of instructions
    executed, while $r->[0][0][0] uses 54% fewer. The speed-up for hash
    accesses is relatively more modest, since the actual hash lookup (i.e.
    hv_fetch()) is more expensive than an array lookup. A lexical $h{foo}
    uses 10% fewer, while $r->{foo}{bar}{baz} uses 34% fewer instructions.
    Overall, a run with
        --tests='/expr::(array|hash)/' ...
    gives:
                  PRE   POST
               ------ ------
            Ir 100.00 145.00
            Dr 100.00 165.30
            Dw 100.00 175.74
          COND 100.00 132.02
           IND 100.00 171.11
        COND_m 100.00 127.65
         IND_m 100.00 203.90
    with cache misses unchanged at 100%.
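
When reading the table, it may help to convert a score back into a reduction in the raw count. The sketch below assumes the scores are percentages relative to PRE in which higher is better, so that halving a count gives 200. percent_fewer() is a hypothetical helper written for this note, not part of any perl tool.

```c
#include <assert.h>

/* Convert a relative score (baseline = 100, higher is better) into the
 * percentage by which the underlying count has fallen. */
static double percent_fewer(double score)
{
    assert(score > 0.0);
    return 100.0 * (1.0 - 100.0 / score);
}
```

Under that assumption, the Ir score of 145.00 corresponds to roughly 31% fewer instructions across the selected tests.
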
    In general, the more lookups done, the bigger the proportionate saving.

Fire extinguisher (n) a device for holding open fire doors.
