develooper Front page | perl.perl5.porters | Postings from February 2015


Thread Previous | Thread Next
Dave Mitchell
February 25, 2015 18:24
Message ID:
On Tue, Feb 24, 2015 at 04:05:34PM +0000, Zefram wrote:
> Dave Mitchell wrote:
> >The branch smoke-me/davem/op_signature, currently being smoked, adds
> >a new op, OP_SIGNATURE to the core. The principal purpose of this
> >op is to handle (most of) the work of assigning args to params on
> >signature subs (e.g. sub foo ($a, $b, $c = 1) {...}).
> While I like the performance win, I dislike this op type being so closely
> tied to signatures.  Specifically, I dislike (a) that its behaviour
> is very specific and a priori arbitrary, being precisely tailored to
> what signatures do;

What's the matter with that? Assigning args to params is a fundamental
operation in many programming languages, and it seems entirely logical
to have an op devoted to this. Just like we have a special op,
OP_ENTERSUB, that is precisely tailored to performing a perl-specific
subroutine call.

> (b) that it relies on only being called at the very beginning of a sub;

Again, what's the matter with that?

> (c) that it is only generated by syntactic signature parsing (and the
> non-default "my(...)=@_" special case);

So it's an op that's only generated when args are unambiguously being
assigned to params. What's the matter with that?

> (d) that the patch makes syntactic signature parsing reliant on this
> special op;

Again, what's the matter with that?

> and (e) that deparsing signature syntax is tied to the special op.

Again, what's the matter with that?

> On the whole it's violating the principle that syntactic signatures are
> only *syntax*, and should not be semantically special.

OP_SIGNATURE is just an internal implementation detail and imparts no
constraints on the syntax or semantics of signatures. It just happens to
be the thing that will need modifying if someone wants to alter the syntax
or semantics of signatures. In the same way that changes to the syntax
and semantics of open() (and there have been many over the years)
require changes in pp_open() etc.

> Having signature parsing rely on generating only an OP_SIGNATURE causes
> two obvious problems.  Firstly, it is likely to complicate whatever
> plugin interface eventually gets added to signatures (to allow modules
> to supply new kinds of item that can appear in a signature).

While I'm not always opposed in principle to things being pluggable, I
would attach a much higher priority to reducing the overhead of function
calls in perl as much as possible. This is an area where perl is
notoriously slow. If I had to choose just one or the other, I would
choose speed without a flicker of hesitation. After all, if people want a
infinitely pluggable parser, they know where to find perl6.

But as far as I can see, there's nothing in my implementation that stops
making signatures pluggable. Just add at the start of
Perl_parse_subsignature() something like:

    if (PL_signature_hook) {
        return (the op subtree created above);

and then something at the perl or XS level can parse the signature itself
and return whatever ops it wants instead of OP_SIGNATURE.

> Secondly,
> a limitation of the op implementation, on the number of parameters that
> it can handle, translates into an inability of the signature parser to
> handle more parameters, rather than merely limiting which signatures
> get the full speed boost.

That's an implementation detail. It could be easily changed to support
2^31 or even 2^64 params if people felt it necessary, at the expense of
making the op_aux stricture bigger for each sub and the sub call overhead
slightly more expensive (but it would be just a 10 line patch). I went
with the more efficient restriction because I couldn't conceive that
anyone would ever require more than 32767 named params. Note that we
already have much more onerous limitations in the regex engine, with
/FOO{1,N}/ not working correctly for various classes of FOO when N>32757.
I'm actually much more bothered by the regex engine limitations than the
signatures one.

> What I'd like to see is op types that have much wider applicability,
> can be used in many circumstances, and are generated automatically (and
> unconditionally) by the checkers of existing ordinary op types and by
> the peephole optimiser.  The signature parser should continue to generate
> the many ops that it currently does, blissfully unaware that the checkers
> are collapsing some of the structures to optimised single ops as it goes.
> For example, the op sequence corresponding to "@_ >= N+1 ? $_[N] :" is
> frequently generated by signature code, and could be collapsed into a
> single op (with the default-value op tree as a child) by newCONDOP().
> This would benefit not just signatures but also anything else that
> processes a subroutine argument in the same manner.  Collapsing a sequence
> of argument assignments into one padrange-like op could be done by the
> peephole optimiser.

Such an approach would leave signatures always significantly slower
than a comparable my(...)=@_. You would end up executing one or more ops
for each param, and (for example) each op execution would have to
individually retrieve @_ from the *_ typeglob, then extract out the length
and AvARRAY pointer and do stuff with them. The next op would do this all
over again.

Rolling up a list of similar operations into a single op is usually the
best way of getting performance gains, hence things like OP_AASSIGN, or
more recently my OP_PADRANGE and OP_MULTIDEREF optimisations.

You also have things like (using Deparse on the old implementation for

    die sprintf("Too many arguments for subroutine at %s line %d.\n", (caller)[1
, 2]) unless @_ <= 3;
    die sprintf("Too few arguments for subroutine at %s line %d.\n", (caller)[1,

If you only do generic optimisations, then *every* sub has to carry around
with it a set of of ops for dying in the caller's namespace with a "too
few/many args" string. If instead you add a specific optimisation that has
those checks and messages built into it, then you're halfway to

As it stands, the existing implementation is woefully not ready for
production use. For people to start using signatures it needs to have a
performance at least comparable with my(...)=@_. But (for those reading
along who aren't aware), sub f { my ($a,$b,$c) = @_ } compiles to:

    1  <;> nextstate(main 2 -e:1) v
    2  <0> padrange[$a:2,3; $b:2,3; $c:2,3] */LVINTRO,3
    3  <2> aassign[t6] KS
    4  <1> leavesub[1 ref] K/REFC,1

while sub f ($a,$b,$c) {} compiles to:

    1  <;> nextstate(main 80 -e:1) v:%,469762048
    2  <#> gv[*_] s
    3  <1> rv2av[t19] sK/1
    4  <$> const[IV 3] s
    5  <2> le sK/2
    6  <|> or(other->7) vK/1
    7      <0> pushmark s
    8      <0> pushmark sM
    9      <$> const[PV "Too many arguments for subroutine at %s line %d.\n"] sM
    a      <0> pushmark s
    b      <$> const[IV 1] s
    c      <$> const[IV 2] s
    d      <0> pushmark s
    e      <0> caller[t15] l
    f      <2> lslice lKM/2
    g      <@> sprintf[t16] sK/2
    h      <@> die[t17] vK/1
    i  <;> nextstate(main 80 -e:1) v:%,469762048
    j  <#> gv[*_] s
    k  <1> rv2av[t14] sK/1
    l  <$> const[IV 3] s
    m  <2> ge sK/2
    n  <|> or(other->o) vK/1
    o      <0> pushmark s
    p      <0> pushmark sM
    q      <$> const[PV "Too few arguments for subroutine at %s line %d.\n"] sM
    r      <0> pushmark s
    s      <$> const[IV 1] s
    t      <$> const[IV 2] s
    u      <0> pushmark s
    v      <0> caller[t10] l
    w      <2> lslice lKM/2
    x      <@> sprintf[t11] sK/2
    y      <@> die[t12] vK/1
    z  <;> nextstate(main 77 -e:1) v:%,469762048
    10 <#> aelemfast[*_] s
    11 <0> padsv[$a:77,80] sRM*/LVINTRO
    12 <2> sassign vKS/2
    13 <;> nextstate(main 78 -e:1) v:%,469762048
    14 <#> aelemfast[*_] s/1
    15 <0> padsv[$b:78,80] sRM*/LVINTRO
    16 <2> sassign vKS/2
    17 <;> nextstate(main 79 -e:1) v:%,469762048
    18 <#> aelemfast[*_] s/2
    19 <0> padsv[$c:79,80] sRM*/LVINTRO
    1a <2> sassign vKS/2
    1b <;> nextstate(main 80 -e:1) v:%,469762048
    1c <0> stub P
    1d <1> leavesub[1 ref] K/REFC,1

that just isn't production-ready. Which isn't a criticism as such; there's
nothing wrong with the initial proof-of-concept implementation being
slow, as long is the intention is to not leave it there.

Replacing all that above with

    1  <;> nextstate(main 77 -e:1) v:%,469762048
    2  <+> signature($a, $b, $c) v
    3  <;> nextstate(main 80 -e:1) :%,469762048
    4  <1> leavesub[1 ref] K/REFC,1

seems sensible to me.

Furthermore at its heart, the op_aux array of the OP_SIGNATURE is just a
compact representation of the sub's signature, making it relatively easy
for other code at the C or Perl level to extract out that information and
do whatever it likes with it. I would expect that to be a lot easier than
for code to try to deduce or manipulate signatures when just presented
with a (possibly partially optimised) optree like the one above. My C
function Perl_signature_stringify() is 120 lines and my
deparse_signature() Perl function in is just 100 lines.
Conversely, the code in op.c to recognise the relatively trivial op-tree
pattern corresponding to my(...)=@_ and convert it into an OP_SIGNATURE,
is about 200 lines.

> The decision to deparse using signature syntax should be based on whether
> the signature feature is lexically enabled at that point in the output
> and the -x level, rather than purely on the op type.  The deparser on
> your branch relies, for correct output, on the presence of a signature
> op always matching up with whether the deparser has lexically enabled
> signatures in its output, but I believe it will still actually be too late
> in emitting the "use feature 'signatures';" line.

In my implementation, the way a sub is deparsed matches how it was
written. If it was compiled as 'sub f($a,$b)' then it will be output
as 'sub f($a,$b)'. If a whole file is being deparsed, then Deparse will
(modulo bugs) have already emitted the correct 'use feature "signatures".
Note that with my changes, "TEST -deparse op/signatures.t" now passes.

On the other hand, if a sub is being deparsed standalone (e.g. by a
serialiser), then cases could be made that either
a) the serialiser should be allowed to introspect the CV to see whether
it has a signature, and if so wrap suitable 'feature' directives around
the sub when deserialising, or
b) the serialiser should be allowed to request what format the sub is
deparsed as, either as a signature or as a series of arg-processing

I personally never use Deparse apart from while fixing deparse issues in
blead, so I don't have a strong feeling as to what approach(es) are best.
But the existing implementation, where sub f ($a,$b,$c) {} deparses as:

    sub f {
        use feature 'signatures';
        die sprintf("Too many arguments for subroutine at %s line %d.\n", (caller)[1, 2]) unless @_ <= 3;
        die sprintf("Too few arguments for subroutine at %s line %d.\n", (caller)[1, 2]) unless @_ >= 3;
        my $a = $_[0];
        my $b = $_[1];
        my $c = $_[2];

would, I think, surprise many people.

> It's also incorrect
> to tie the change in prototype syntax (short syntax vs attribute) to the
> use of signature syntax: that too should be tied to whether the signature
> feature has been enabled, because it is that feature flag that disables
> the short prototype syntax.  (Mind you, the existing deparser *always*
> uses short prototype syntax, so is also faulty.)

This is subordinate to the previous issue: if for whatever reason
Deparse  (or Deparse's caller) has chosen to display the sub using the
signature syntax, then it *has* to change the prototype output from (...)
to :proto(...).

> The CvHASSIG flag is a bad idea: it's exposing an internal implementation
> detail, which has no legitimate uses as sub metadata, and when it's made
> this easy there are going to be people who will misuse it.  The sub's
> behaviour, including that derived from the signature, is already
> adequately represented as the op tree.

I think you'll find that virtually *every* Cvf_* flag is exposing an
internal implementation detail. While I'm not particularly bothered either
way, I'm surprised at your vehemence against it. I can see a legitimate
use for it in, for example a serialiser, which can decide to what extent
it wants to deparse a sub as closely as possible to the original form while
taking into account considerations such as the target perl version(s)
and the environment in which the deparsed sub will be either human-viewed
and/or recompiled.

> The signature op is also not sufficiently internal for compatibility
> concerns to be ignored.  Its addition means that code walkers need to
> understand it, and non-core modules will have the option to generate it.
> We shouldn't impose the former requirement without being satisfied that
> we're going to stick with the signature op in pretty much this form.
> We shouldn't offer the latter feature without either such satisfaction
> or an "experimental" notice.

This is something that annoys me. Taken to its logical conclusion, the
entirety of the OP_FOO and pp_foo() system could be regarded as public
API, and we should never add a new OP, or a new private flag to an
existing op, or change the compile-time or run-time behaviour of any op
in any way. Special biological word then springs to mind.

At the moment, OP_SIGNATURE *is* experimental; it only as appears as the
implementation of the experimental signatures feature. At some point it
will become non-experimental; after that point I would still feel free to
change it - for example to add support for 'is rw' style params; just as I
would still feel free to modify OP_PADRANGE or OP_MULTIDEREF (or OP_OPEN)
if some useful facility or optimisation occurred to me.

Look, I think there's a real mental impedance mismatch here. My main
interest in perl at the moment is to keep it in roughly its current form
while fixing bugs and making it go faster. I have little interest in, and
little knowledge of, high-falutin' stuff like pluggable syntax parsers or
whatever, so I can't really "tune in" to where you're coming from. Thus
while I respect that your detailed comments are sincere and well thought
out, I don't really "get" them. To me, OP_SIGNATURE still seems like the
obvious and correct approach.

Wesley Crusher gets beaten up by his classmates for being a smarmy git,
and consequently has a go at making some friends of his own age for a
    -- Things That Never Happen in "Star Trek" #18

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About