TL;DR: benchmarks demonstrate no performance gain is possible.

On Mon, 17 Jan 2022 09:55:34 -0500
Felipe Gasper <felipe@felipegasper.com> wrote:

> > 1) leaving @_ untouched when calling a signatured-sub (i.e. it is
> > still the @_ of the caller).
> >
> > This will have a significant performance boost, especially when
> > calling small stub functions like accessors. At the moment perl has
> > to do the equivalent of
...
> The points heretofore raised in response to this seem to be:
>
> 1) There is no viable branch currently that implements leaving @_
> untouched.
>
> 2) The performance gain has yet to be shown.
>
> I’d love to help in either of these regards, but I lack the knowledge
> to assist with #1, and #2 can’t happen without the former.

Taking a slightly-edited version of rjbs's example code from elsewhere
I get the following benchmarks on my machine. Each test was performed
several times and I tried to ignore ones that showed weird timing skew
(probably from background noise of my laptop doing other things at the
time), and have tried to select a "typical" example.

The three test functions are:

  sub full { die "arity" unless @_ == 1; my ($x) = @_; return $x * $x }

  sub bare { my ($x) = @_; return $x * $x }

  sub sigs ($x) { return $x * $x }

First, a 5.34.0 release:

  $ perl5.34.0 benchmark-entersub.pl
  full: 1.6000s
  bare: 1.3186s (speedup x1.21)
  sigs: 1.4231s (speedup x1.12)

The signatured version is about 12% faster while performing the same
behaviour. The bare version is 21% faster than full, though lacks the
arity check.
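
For anyone reading without the attachment, a timing harness along the
following lines would print output in the same shape. This is only a
sketch and not the attached benchmark-entersub.pl: the iteration count,
the time_calls helper and the choice of Time::HiRes are assumptions made
purely for illustration.

```perl
use v5.20;
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';
use Time::HiRes ();

# The three subs under test, as given in the message.
sub full { die "arity" unless @_ == 1; my ($x) = @_; return $x * $x }
sub bare { my ($x) = @_; return $x * $x }
sub sigs ($x) { return $x * $x }

# Hypothetical iteration count; raise it for steadier timings.
my $COUNT = 1_000_000;

# Call the given sub $COUNT times with one argument and
# return the elapsed wallclock time in seconds.
sub time_calls {
    my ($code) = @_;
    my $start = Time::HiRes::time();
    $code->(5) for 1 .. $COUNT;
    return Time::HiRes::time() - $start;
}

my %subs = (full => \&full, bare => \&bare, sigs => \&sigs);
my %elapsed;
$elapsed{$_} = time_calls($subs{$_}) for qw(full bare sigs);

# "speedup" is the time taken by full divided by the time taken here.
printf "full: %.4fs\n", $elapsed{full};
for my $name (qw(bare sigs)) {
    printf "%s: %.4fs (speedup x%.2f)\n",
        $name, $elapsed{$name}, $elapsed{full} / $elapsed{$name};
}
```

The speedup column is simply the ratio of the two timings, so for the
5.34.0 run above 1.6000 / 1.3186 gives the x1.21 shown for bare.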

Next up, a perl built from my discourage-defav-in-sigsub branch (this
is significantly slower than the release perl above in absolute terms,
because it's an unoptimised debug build; but ignore that):

  $ ./perl -Ilib benchmark-entersub.pl
  full: 9.0413s
  bare: 7.0131s (speedup x1.29)
  sigs: 6.4964s (speedup x1.39)

That's more in line with rjbs's original observations - bare is faster
than full (by about 29%) but signatures easily win out here, coming in
at 39% faster (and also being faster than the bare version).

Next up, an edit of a point partway on my "no-snails" branch. At this
point, I've edited the various pp_arg* functions to look in the AV
found in PAD_SVl(0) instead of GvAV(PL_defgv), and I skip the
assignment to GvAV(PL_defgv) in this case. The actual code being
skipped is tiny[1] - as far as I can tell basically a single pointer
assignment; since in order to make pp_arg* work at all we still have to
copy the args to the AV found in PAD_SVl(0). As perhaps expected, this
change makes no observable difference to timing:

  $ ./perl -Ilib benchmark-entersub.pl
  full: 8.7698s
  bare: 6.9522s (speedup x1.26)
  sigs: 6.3569s (speedup x1.38)

Finally, by noticing that the example code we're benchmarking doesn't
really depend on the values it returns, I decided to break perl by
doing *even less work* than would actually be required to make the args
give the right answers, just to get an upper bound on the highest
possible speedup that could be achieved.

In this broken version, I don't set up GvAV(PL_defgv), nor do I set up
the AV in PAD_SVl(0). I don't copy the arguments anywhere at all.
OP_ARGELEM now can't find them and will just return undef. I even had
to stub out the contents of pp_argcheck so it doesn't even perform an
arity check.
To be clear: this version of perl is totally useless, but should be
even faster than it is possible to achieve for real, because any real
perl would have to do more work than this version:

  $ ./perl -Ilib benchmark-entersub.pl
  full: 8.7818s
  bare: 6.9137s (speedup x1.27)
  sigs: 6.7048s (speedup x1.31)

I find this result the most difficult to understand as it is very
surprising. I've made pp_entersub slower for everyone (I suspect now
because it has to make an extra conditional jump on CvSIGNATURE(cv))
but what's worse is that calling the signatured subs is only 31% faster
than the speed of the full ones (it used to be 38% faster; see above).
And all this for a broken implementation which doesn't even make the
arguments visible or do any arity checking. Adding those things back
would necessarily involve adding more code to what I currently have,
and thus slow it down further.

In conclusion:

  As they stand in current bleadperl, signatured subs are already
  faster to call (by a measurable > 30%) than pureperl code that
  performs the same work by a snail-unpack - either with or without an
  additional manually-coded arity check. This is true even considering
  that perl is creating the snail (GvAV(PL_defgv)) and pad-zero
  (PAD_SVl(0)) AV and copying the argument values into it. (The same AV
  is shared by both places).

  An edited version of perl that conditionally does not attempt to set
  up the snail or pad-zero array for signatured subs does not perform
  any faster than this (and indeed runs slower), even before one
  attempts to add in any code that might implement passing the actual
  argument values into a signatured sub.

I do not believe that it is possible to gain any performance benefit by
skipping the snail-array setup that is performed by non-signatured subs
in "legacy" perl mode.
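
As a small illustration of the snail setup discussed above: on the
unpatched perls in this thread, a signatured sub can still see its
arguments through @_. The sub name peek is made up for this example,
and the blanket "no warnings" is only there to silence the experimental
warnings that various perl versions emit around signatures and @_.

```perl
use v5.20;
use feature 'signatures';
no warnings;    # hypothetical blanket silence; see the note above

# A signatured sub that reports what it can still see in @_.
sub peek ($x) {
    return scalar(@_) . " arg(s) in \@_, first is $_[0]";
}

say peek(7);
```

On 5.34.0 this should print "1 arg(s) in @_, first is 7", showing that
the argument was copied into the snail array as well as into $x.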

In case folks want to attempt to replicate or extend these tests for
themselves, I have attached

  benchmark-entersub.pl
    - the script used to print these numbers

  0001-No-setup-snail-array-or-PADSVlzero.diff
    - the full set of changes from current blead, to the (broken) perl
      that I used for the final benchmark

-----

Footnotes:

[1]: The code to skip assigning to GvAV(PL_defgv):

diff --git a/pp_hot.c b/pp_hot.c
index 477cdd48b8..e596615743 100644
--- a/pp_hot.c
+++ b/pp_hot.c
@@ -5246,7 +5246,10 @@ PP(pp_entersub)
             defavp = &GvAV(PL_defgv);
             cx->blk_sub.savearray = *defavp;
-            *defavp = MUTABLE_AV(SvREFCNT_inc_simple_NN(av));
+            if(!CvSIGNATURE(cv))
+                *defavp = MUTABLE_AV(SvREFCNT_inc_simple_NN(av));
+            else
+                SvREFCNT_inc_simple_void_NN(*defavp);
             /* it's the responsibility of whoever leaves a sub to ensure
              * that a clean, empty AV is left in pad[0]. This is normally

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/