perl.perl5.porters | Postings from January 2022

Re: Benchmarking a 'no-snails' world (was: Re: PSC #049 2022-01-07)

Dave Mitchell
January 19, 2022 12:10
On Mon, Jan 17, 2022 at 10:15:33PM +0000, Paul "LeoNerd" Evans wrote:
[lots of benchmarking stuff]

I'm going to have to respectfully disagree with your benchmarking results
and conclusions for populating @_ :-).

First off, I think (if I am reading your diffs correctly) you have
missed cutting out the @_ teardown at sub exit that is done in
Perl_cx_popsub_args().

But more generally, note that perl already has a mechanism for calling a
sub without populating @_: the &foo; calling convention. This is already
special-cased in pp_entersub and elsewhere with the hasargs and CxHASARGS
flags. So it should be possible (in theory) to exploit the existing non-@_
entry and exit code paths without adding extra overhead.
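
For anyone not familiar with the &foo; form, here is a minimal sketch of the
difference in behaviour. The sub names count_args and caller_sub are made up
for illustration; the point is only that &foo; creates no new @_ and the
callee sees the caller's @_ instead, while foo() gets a fresh, empty one.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many elements it sees in @_.
sub count_args { return scalar @_ }

sub caller_sub {
    my $with_amp    = &count_args;     # no new @_: callee sees our @_
    my $with_parens = count_args();    # a fresh, empty @_ is set up
    return ($with_amp, $with_parens);
}

my ($amp, $parens) = caller_sub('a', 'b', 'c');
print "ampersand saw $amp args, parens saw $parens args\n";
```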

Since this pathway already exists, it's possible to benchmark it without
hacking the perl interpreter itself:

    use Benchmark ':all';

    use feature 'signatures';
    no  warnings 'experimental';

    sub foo0 { }
    sub foo2 ($x,$y) { }

    cmpthese(-3, {
        ampersand => sub { &foo0;     },
        args0     => sub { foo0();    },
        args2     => sub { foo2(1,2); },
    });

with that I get:

                    Rate     args2     args0 ampersand
    args2     15642116/s        --      -64%      -69%
    args0     43208035/s      176%        --      -14%
    ampersand 50372768/s      222%       17%        --

Converting that into microseconds per call (1e6 / rate), I get:

    0.01985199622145  ampersand
    0.02314384350040  args0
    0.06392996957700  args2

That seems to me to show that about 0.0033 us is spent per call just setting
up and tearing down @_ itself, even in the absence of any arguments. That
represents about 5% of the total overhead of calling a 2-arg signature sub.
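
The arithmetic can be rechecked with a short script. This is only a sketch:
the rates are copied from the cmpthese output above, and the variable names
(%rate, %us, $at_cost, $share) are mine. Time per call is the reciprocal of
the rate, and the @_ cost is taken as the difference between the args0 and
ampersand cases.

```perl
use strict;
use warnings;

# Rates (calls per second) copied from the cmpthese output above.
my %rate = (
    ampersand => 50372768,
    args0     => 43208035,
    args2     => 15642116,
);

# Time per call in microseconds is 1e6 / rate.
my %us = map { $_ => 1e6 / $rate{$_} } keys %rate;

printf "%-10s %.7f us/call\n", $_, $us{$_}
    for sort { $us{$a} <=> $us{$b} } keys %us;

# Cost attributed to @_ setup and teardown: args0 minus ampersand.
my $at_cost = $us{args0} - $us{ampersand};

# The same cost as a share of a full 2-arg signature call.
my $share = 100 * $at_cost / $us{args2};

printf "\@_ setup/teardown: %.7f us/call, %.1f%% of an args2 call\n",
    $at_cost, $share;
```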

But it is important to note that signature sub arg processing is NOT yet
optimised. It's always been my long-term plan that the current arrangement
of an OP_ARGELEM (plus nextstate) per arg will be replaced (by the
peephole optimiser) by a single OP_SIGNATURE op which implements a simple
FSM to populate all args. This will be much faster than the current
arrangement. When that comes to pass, the overhead of populating @_ will
then represent considerably more than 5% of the total sub calling overhead.
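
To see the current per-arg ops for yourself, something like the following
should work. It is a sketch that assumes a perl new enough to have
OP_ARGELEM (5.26 or later, as far as I recall) and uses B::Concise to dump
the ops of foo2 into a string. I would expect one argelem per parameter, but
the exact listing varies between perl versions.

```perl
use strict;
use warnings;
use feature 'signatures';
no  warnings 'experimental';
use B::Concise ();

sub foo2 ($x, $y) { }

# Send the op listing to a string rather than STDOUT.
my $listing = '';
B::Concise::walk_output(\$listing);

# Render the ops of foo2 in execution order.
B::Concise::compile('-exec', \&foo2)->();

# Count the argelem ops in the listing.
my $argelems = () = $listing =~ /\bargelem\b/g;

print $listing;
print "argelem ops: $argelems\n";
```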

Music lesson: a symbiotic relationship whereby a pupil's embellishments
concerning the amount of practice performed since the last lesson are
rewarded with embellishments from the teacher concerning the pupil's
progress over the corresponding period.
