perl.perl5.porters | Postings from March 2015


Dave Mitchell
March 3, 2015 21:30
On Tue, Mar 03, 2015 at 10:16:19AM -0300, Vincent Pit (VPIT) wrote:
> By the way, do you have an idea why you get a performance gain with
> OP_SIGNATURE compared to the previous padrange optimization which, if I
> remember correctly, can already handle "my (...) = @_" in one op?
> (disclaimer: I'm neither for nor against the proposal. I'm just curious about
> what could cause such a dramatic improvement when padrange seems at first
> glance conceptually much simpler).

pp_padrange() just pushes a mark, a consecutive range of pad SVs, and
optionally another mark and the elements of @_ onto the stack. The
actual assignment is done by pp_aassign() as normal.
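For comparison, here's a deliberately simplified toy model in C of what the padrange + aassign pair does. It is not Perl's API. toy_stack, toy_padrange() and toy_aassign() are hypothetical names, a "value" is just a long, and 0 stands in for undef. The point it illustrates is that every value makes a round trip through the stack, delimited by marks. A generic loop then has to rediscover the two lists and pair them up.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, not Perl's real data structures: a "value" is a long,
 * the stack holds pointers to values, and a mark is a saved stack height. */
#define STACK_MAX 64
#define MARKS_MAX 8

typedef struct {
    long  *items[STACK_MAX];
    size_t top;                 /* number of entries in items[] */
    size_t marks[MARKS_MAX];
    size_t nmarks;
} toy_stack;

/* Hypothetical analogue of pp_padrange for "my (...) = @_": push a mark and
 * the caller's args (the RHS), then a mark and `count` consecutive pad
 * slots starting at `base` (the LHS). No assignment happens here. */
static void toy_padrange(toy_stack *st, long *args, size_t nargs,
                         long *pad, size_t base, size_t count)
{
    st->marks[st->nmarks++] = st->top;
    for (size_t i = 0; i < nargs; i++)
        st->items[st->top++] = &args[i];
    st->marks[st->nmarks++] = st->top;
    for (size_t i = 0; i < count; i++)
        st->items[st->top++] = &pad[base + i];
}

/* Hypothetical analogue of pp_aassign: a generic list assignment that
 * recovers both lists from the marks, copies RHS values into LHS slots
 * (0 stands in for undef when the args run out), then pops everything. */
static void toy_aassign(toy_stack *st)
{
    size_t lhs_start = st->marks[--st->nmarks];
    size_t rhs_start = st->marks[--st->nmarks];
    size_t nrhs = lhs_start - rhs_start;
    size_t nlhs = st->top - lhs_start;

    for (size_t i = 0; i < nlhs; i++) {
        long *dst = st->items[lhs_start + i];
        *dst = (i < nrhs) ? *st->items[rhs_start + i] : 0;
    }
    st->top = rhs_start;
}
```

The real ops do much more (real SVs, magic, and the checks needed when the same variable appears on both sides). This sketch ignores all of that.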

pp_signature() on the other hand:

* doesn't push the param vars and @_ contents onto the stack, nor does
it push marks;

* where appropriate, it checks that the caller supplied an acceptable
number of args, and croaks (reporting the error at the caller) if not;

* it handles the processing of default expressions, e.g.
     sub f ($a, $b=$a, $c=$a+1)
It handles simple expressions (consts and vars) directly, and processes
complex ones by returning a 'next' pointer that jumps over the correct
number of the default-assignment expressions that follow the op (so, in
particular, it doesn't execute a series of false conditional assignments,
each saying 'do this assign if @_ > N', for some N);

* handles placeholder params, e.g. sub f($a, $, $b) and equivalently
my ($a, undef, $b) = @_;

* handles non-contiguous pad ranges; for example in
    sub f($a, $b = $x++, $c=1),
$b and $c will have non-contiguous targets, since a padtmp will have been
allocated for postinc.

* The main assignment loop is more efficient than pp_aassign() because
it can take advantage of the specific circumstances: for example, in the
branch where there are no args left and the default value is an int, it
calls sv_setiv() rather than sv_setsv(), and sometimes it can avoid
calling sv_setiv() altogether. Here's a chunk of the code in pp_signature():

            case SIGNATURE_arg_default_0:
                i = 0;
                goto setiv;

            case SIGNATURE_arg_default_1:
                i = 1;
                goto setiv;

            case SIGNATURE_arg_default_iv:
                i = items->iv;
              setiv:
                /* do $varsv = i.
                 * NB it's likely that on subsequent calls the cleared
                 * lexical will have formerly been SVt_IV; if this
                 * is the case, we can do a short-cut */
                if (LIKELY(SvTYPE(varsv) == SVt_IV)) {
                    /* a freshly-cleared lexical: no value, no magic */
                    assert(!SvOK(varsv));
                    assert(!SvMAGICAL(varsv));
                    (void)SvIOK_only(varsv);
                    SvIV_set(varsv, i);
                }
                else
                    sv_setiv(varsv, i);
                break;

That is also an example of where we take advantage of the fact that the
new lexical is guaranteed to be empty; if that guarantee were removed,
those asserts would instead have to become extra conditions on the if().
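By way of contrast, here's a toy sketch in C of the descriptor-driven approach the points above describe. A single pass binds args straight into pad slots and checks the arg count up front. It skips placeholders and writes simple integer defaults in-line. It also reports how many of the following default-expression blocks can be jumped over. Everything here is hypothetical: toy_sig_item, toy_signature() and the TOY_* kinds are illustrative names, not the real SIGNATURE_* actions or pp_signature()'s data layout. Returning -1 stands in for croaking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor kinds; not Perl's real SIGNATURE_* actions. */
typedef enum {
    TOY_ARG,          /* mandatory positional param: $a */
    TOY_PLACEHOLDER,  /* mandatory unnamed param: $ (arg consumed, not stored) */
    TOY_DEFAULT_IV,   /* optional param with an integer constant default */
    TOY_DEFAULT_EXPR  /* optional param whose default needs separate code */
} toy_kind;

typedef struct {
    toy_kind kind;
    size_t   padix;   /* target pad slot; slots need not be contiguous */
    long     iv;      /* default value, used only by TOY_DEFAULT_IV */
} toy_sig_item;

/* Binds args straight into pad slots in one pass, with no stack traffic.
 * Returns how many of the trailing default-expression blocks to skip
 * (their params were supplied by the caller), or -1 if the arg count is
 * out of range, which is where the real op would croak. */
static int toy_signature(const toy_sig_item *items, size_t nitems,
                         const long *args, size_t nargs, long *pad)
{
    size_t min_args = 0, skip = 0;

    /* every arg up to and including the last mandatory item is required */
    for (size_t i = 0; i < nitems; i++)
        if (items[i].kind == TOY_ARG || items[i].kind == TOY_PLACEHOLDER)
            min_args = i + 1;

    if (nargs < min_args || nargs > nitems)
        return -1;

    for (size_t i = 0; i < nitems; i++) {
        int have_arg = i < nargs;

        switch (items[i].kind) {
        case TOY_PLACEHOLDER:
            break;                          /* nothing to store */
        case TOY_ARG:
            pad[items[i].padix] = args[i];  /* arity check guarantees an arg */
            break;
        case TOY_DEFAULT_IV:
            /* simple default applied in-line: no extra code runs */
            pad[items[i].padix] = have_arg ? args[i] : items[i].iv;
            break;
        case TOY_DEFAULT_EXPR:
            if (have_arg) {
                pad[items[i].padix] = args[i];
                skip++;                     /* its default block is not needed */
            }
            break;
        }
    }
    return (int)skip;
}
```

Because the args always cover a prefix of the params, the supplied default-expression params are always the first ones. A caller can therefore start running default blocks at the returned index. For a sub shaped like sub f($a, $, $b = 5, $c = $a + 1), with $a and $b in pad slots 1 and 3 and $c in slot 4 (slot 2 standing in for a padtmp), two args leave the $c default to run, while four args skip it.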

This is why, for this code (as I mentioned in my original announcement):

    sub f { my ($a, $b, $c) = @_;1 }
    my $self = {};
    f($self,1,2) for 1..N;

comparing vanilla perl with a perl compiled with -DPERL_FAKE_SIGNATURE,
for 1 complete call of that sub, including:
    pushing a mark, $self, 1, 2, *f onto the stack,
    calling pp_entersub,
    calling pp_nextstate,
    calling pp_padrange+pp_aassign or pp_signature,
    another pp_nextstate;

the number of x86_64 CPU instructions executed with pp_padrange+pp_aassign
is 1492, while with pp_signature it drops to 1119, a saving of 25%.
And that's for the whole function-call overhead, not just for executing
pp_signature versus pp_padrange+pp_aassign.
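The 25% figure follows directly from the two instruction counts quoted above:

```latex
\frac{1492 - 1119}{1492} = \frac{373}{1492} = 0.25
```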

On Tue, Mar 03, 2015 at 04:31:52PM -0000, Father Chrysostomos wrote:
> I think Zefram has a point.  If you were to create signature ops in
> the same way you create multideref ops, it would be a more general-
> purpose optimisation that would automatically speed up any similar
> code occurring in existing subs.  I would be all for implementing
> it that way.

Well, if someone can suggest one or more ops that (a) are generic, and
(b) can give performance anywhere approaching the above (bearing in mind
the description above of all the things that pp_signature() does, and some
of the optimisations it does), then I'm prepared to consider it. I will be
very surprised if someone *can* come up with such a suggestion, though.

Please note that ash-trays are provided for the use of smokers,
whereas the floor is provided for the use of all patrons.
    -- Bill Royston
