
Re: OP_SIGNATURE

From:
Dave Mitchell
Date:
March 3, 2015 11:57
Subject:
Re: OP_SIGNATURE
Message ID:
20150303115737.GR28599@iabyn.com
On Wed, Feb 25, 2015 at 09:00:08PM +0000, Zefram wrote:
> Dave Mitchell wrote:
> >On Tue, Feb 24, 2015 at 04:05:34PM +0000, Zefram wrote:
> >>                      Specifically, I dislike (a) that its behaviour
> >> is very specific and a priori arbitrary, being precisely tailored to
> >> what signatures do;
> >
> >What's the matter with that?
> 
> It's a messy design.  The op types are, for the most part, reasonable
> programming primitives, with simple specifications.  The signature op
> type is entirely contrary to that.  Sure, we make some exceptions for
> performance, such as padrange, but we have to strike a balance between
> performance and API cleanliness.  padrange is a decent tradeoff: a good
> bit of performance win for a small bit of API mess.  The signature op
> is way too far at the vomit-over-the-API end of the spectrum.

This is a subjective matter on which we'll have to disagree.
NB - do you also object to my recent introduction of OP_MULTIDEREF?

> >                               Just like we have a special op,
> >OP_ENTERSUB, that is precisely tailored to performing a perl-specific
> >subroutine call.
> 
> entersub has a lot of Perl-specific internals, but its specification
> (little more than "call referenced subroutine with arguments from the
> stack") is straightforward and mostly an inevitable programming primitive.
> It's the interface that matters, not the implementation.

The specification of OP_SIGNATURE is 'assign the passed stack items to
the signature variables with default handling'.
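
To make that specification concrete, here is a hedged plain-Perl sketch of
what the single op does, mirroring the long-hand expansion shown further
down this mail (f and g are illustrative names, not anything in the
implementation):

```perl
use feature 'signatures';
no warnings 'experimental::signatures';

# A signatured sub (the case handled by the single op)...
sub f ($a, $b = 10) { "$a/$b" }

# ...has the same observable behaviour as this long-hand version:
sub g {
    die "Too many arguments for subroutine" if @_ > 2;
    die "Too few arguments for subroutine"  if @_ < 1;
    my $a = $_[0];
    my $b = @_ >= 2 ? $_[1] : 10;
    "$a/$b";
}
```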

> >> (b) that it relies on only being called at the very beginning of a sub;
> >
> >Again, what's the matter with that?
> 
> It would limit the manipulability of the op.  The op tree structure
> generally permits us to shift ops around quite freely, because ops work
> largely independent of context.  (They're specific to the particular sub
> being compiled, but can be moved freely within the sub.)  Your signature
> op would be an ugly exception.  It wouldn't be quite so bad if op mungers
> could readily turn the signature op into a context-independent form,
> but your design provides no such escape hatch short of the op munger
> fully understanding the guts of the signature op and expanding it into
> the large number of ops that signatures currently generate.

Currently OP_SIGNATURE just assumes that the lexical variables it
is initialising are properly "fresh" (undef, no magic, not tied etc).
As long as that constraint holds, it doesn't have to be the first op in
the sub. If a consensus were reached that that constraint is unreasonable,
then OP_SIGNATURE could easily be tweaked to handle that, at the cost of
reduced performance.

> >> (c) that it is only generated by syntactic signature parsing (and the
> >> non-default "my(...)=@_" special case);
> >
> >So it's an op that's only generated when args are unambiguously being
> >assigned to params. What's the matter with that?
> 
> Arguments are unambiguously assigned to lexical variables by plenty of
> statements other than signatures.  (Your argument here accords syntactic
> signatures an unwarranted semantic significance.)  I'm all for having
> ops optimised to help with argument handling, but they should apply to
> argument handling expressed either way.  They should be aimed at the
> semantic activity of extracting arguments, not at one particular syntax
> that can be used to express that activity.

OP_SIGNATURE *is* aimed at the "semantic activity of extracting
arguments", and is not tied to one particular syntax. Part of the proof of
that is that it can be used to implement my(...) = @_ syntax in addition
to handling the current signature syntax. It can be used anywhere that
the contents of @_ need to be assigned to a list of my() variables
(modulo the optimisation which requires fresh lexicals), with optional
arg count error checking and default value assignment.
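
As a sketch of the two surface syntaxes in question (names are
illustrative; note that the traditional idiom performs no argument-count
checking, whereas the signature form does):

```perl
use feature 'signatures';
no warnings 'experimental::signatures';

# Signature syntax:
sub sig_form ($x, $y) { "$x,$y" }

# The long-standing idiom the same op can also cover:
sub my_form { my ($x, $y) = @_; "$x,$y" }
```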

> Having the new op only generated in such specific cases is also another
> blow against its general usability, and making it such a special case
> invites bugs based on assuming too much about its specificity.  To avoid
> bugs, new op types should be as unspecial as possible.

This is an op that is likely to be invoked at the start of most function
calls within perl (at least those that start with my(...) = @_), so it
will get very widely exercised very quickly. In fact, one of my
motivations for adding the optional "my(...) = @_" optimisation was to
greatly increase the number of places where OP_SIGNATURE is called, and
thus have it tested by the whole perl test suite rather than just
t/op/signature.t.

> >OP_SIGNATURE is just an internal implementation detail and imparts no
> >constraints on the syntax or semantics of signatures. It just happens to
> >be the thing that will need modifying if someone wants to alter the syntax
> >or semantics of signatures.
> 
> I disagree about it being purely internal, an issue which is discussed
> more below.  The op type is somewhat semantically visible.  It's not
> correct that it imparts no semantic constraints, as you implemented that
> 2^15 limit and imposed that on signatures per se, though as you pointed
> out that's fixable.  But that such an issue arises, and particularly the
> inflexibility shown by your approach of applying the implementation limit
> to signatures, shows that this is coupling the syntax and the semantics
> too tightly.  Ops are a semantic matter.  Putting arguments into lexical
> variables is a semantic matter.  But signatures are syntactic.

I think you are conflating two different issues: whether OP_SIGNATURE is
in principle wrong and partially semantically visible, and whether
specific optimisations within the current implementation of OP_SIGNATURE
(i.e. the 32K var limit, and needing to be the first op in the sub) are
suitable tradeoffs. I could cook up a patch that removes both those
constraints in about 15 minutes flat if need be.

So, ignoring specific optimisations which could be easily reverted,
I don't think OP_SIGNATURE is semantically visible.

> The issue around applying the limits of the signature op type to
> signature syntax also shows that the signature op is, in semantic space,
> uncomfortably distant from the other op types.  When the signature op's
> limits preclude some small semantic change, it really needs to be easy
> to fall back to other op types that grant one the freedom to implement
> the desired semantic directly by composing simple ops.  Sure, when the
> change we want to make is expressed by the core signature syntax we
> can always add the new feature to the core signature op, but that's
> not the kind of situation I'm thinking of.  I'm concerned about ops
> being generated by non-core modules, which are by nature constrained by
> the semantics of the op types supported by whatever version of the core
> they're running on.  This is a concern for signature plugins of all kinds
> (also discussed below).

This I don't understand. There is nothing to stop a hypothetical plugin
either emitting a series of "plain" ops instead of an OP_SIGNATURE, or
later replacing the OP_SIGNATURE (depending on what point in the
compilation process it is called at).

> >While I'm not always opposed in principle to things being pluggable, I
> >would attach a much higher priority to reducing the overhead of function
> >calls in perl as much as possible. This is an area where perl is
> >notoriously slow.
> 
> I disagree.  Extensibility is vitally important to the future of the
> language, because it is the feature that makes all other features
> possible.  Performance is certainly a legitimate concern, but it's not
> an enabling issue in the way that extensibility is.  Sub call overhead
> certainly is a problem for Perl, but reducing it shouldn't come at the
> expense of making some pluggability effectively impossible.  Whatever
> level of overhead is a problem today is only a temporary problem anyway,
> as ever-cheaper hardware erodes the expense.

If people aren't concerned with performance (thanks to that wonderful
ever-cheaper hardware, which has been stuck at around 3GHz for several
years now), while wanting maximum flexibility, pluggability etc., then
again I suggest that we point them to perl6.

> >But as far as I can see, there's nothing in my implementation that stops
> >making signatures pluggable. Just add at the start of
> >Perl_parse_subsignature() something like:
> >
> >    if (PL_signature_hook) {
> >        call_sv(....);
> >        return (the op subtree created above);
> >    } 
> 
> This is not the kind of pluggability that I have in mind.  A single hook
> to replace the entirety of signature parsing would be quite insipid;
> we can already get pretty much that effect by the kinds of modules
> that are already on CPAN providing their own signature facilities.
> The interesting kind of pluggability is to provide custom parsing of
> a signature *item*, integrated with other items parsed by standard
> and custom means, in a signature that at the top level is managed by
> the core signature parser.  For example, core signatures notably lack
> type constraints, because the core language has no real type system,
> so we'd like modules that provide their own type systems (such as Moose)
> to be able to plug in type constraint signature items, like:
> 
> 	use feature "signatures";
> 	use MooseX::Types::Signature qw(type);
> 	sub frobnicate (
> 		$self,
> 		type Int $how_hard,
> 		type Bool $with_lube = 0,
> 	) { ... }
> 
> This kind of extensibility requires that interfaces between signature
> items be reasonably clean; each item parser needs to be able to supply
> its part of the signature behaviour in a self-contained manner.  With the
> present system of each item generating ops separately, it is easy to
> have a plugged-in signature item parser return ops for a single item.
> (There's some additional data flow required to manage parameter indices,
> and argument count constraints, but that doesn't require much more
> protocol.)  If items are instead expected to contribute parts of a single
> signature op, that's a much messier protocol right away.

There's nothing that says an OP_SIGNATURE has to be generated. That is
just a current implementation detail.

Perl_parse_subsignature() could do something as simple as: at the
beginning, check whether hooks are enabled, and if so, generate a "Zefram"
optree, while calling those hooks at appropriate points;  if not, then
call the default parsing code which generates an OP_SIGNATURE.

Think of it conceptually as Perl_parse_subsignature() producing an
optree, which under certain (but very common) circumstances can then be
optimised into a single OP_SIGNATURE op. It just so happens that the
current implementation, knowing that at the moment the optree can
*always* be reduced, skips the optree generation and creates the
OP_SIGNATURE directly, because from both programmer-effort and
compile-resource perspectives that's a lot easier to do.

But this is all academic until someone actually comes forward with a
signature pluggability proposal.

> >Such an approach would leave signatures always significantly slower
> >than a comparable my(...)=@_. You would end up executing one or more ops
> >for each param,
> 
> No, that doesn't follow.  My approach only requires that you would
> *build* ops separately for each parameter, not that they remain separate
> for execution.  The very part of my message that you quoted to give
> this response to referred to the possibility of combining multiple
> argument assignments into one op.  padrange is both directly applicable
> to such situations and a good example of how multi-op structures can be
> opportunistically collapsed into single ops.

Well, the bit that I quoted,

    For example, the op sequence corresponding to "@_ >= N+1 ? $_[N] :" is
    frequently generated by signature code, and could be collapsed into a
    single op

is an example of an op that *can't* combine multiple argument assignments.
Assuming such an op existed (let's call it push_arg, say, and assume
that N is stored in the op itself), then sub f ($a = 10, $b = $a) {}
would have to be compiled to something like

    ... elided ops to check arg size limits ...
    pushmark
    push_arg[1](other->A)
    const[IV 10]
 A: push_arg[2](other->B)
    padsv[$a]
 B: padrange[$a; $b]
    aassign

which is going to be a *lot* slower than a single OP_SIGNATURE op.

If you can give me a concrete example of a set of one or more hypothetical
new ops and how an arg assignment like the one above would be compiled
using those ops then I will reconsider this issue, but for now I will have
to assume that OP_SIGNATURE is *much* more efficient.

Also, in terms of the criticism that OP_SIGNATURE isn't general purpose,
I think you'll find that the "push_arg" op would be highly specific too;
I think it would be fairly rare to find perl code of the form
'@_ >= N+1 ? $_[N] : ...' in the wild outside of code generated by your
version of parse_subsignature(). Indeed, http://grep.cpan.me/ shows no
matches for either of these:

    @_\s*>\s*\d+\s*\?\s*\$_\[\d+\]\s*:
    @_\s*>=\s*\d+\s*\?\s*\$_\[\d+\]\s*:

> >Furthermore at its heart, the op_aux array of the OP_SIGNATURE is just a
> >compact representation of the sub's signature, making it relatively easy
> >for other code at the C or Perl level to extract out that information and
> >do whatever it likes with it.
> 
> Once again, this is treating "the sub's signature" as some kind of
> meaningful metadata, which it's not.

No, I wasn't referring to introspection; I was referring to things like
plugins that want to mess with the optree. Such a plugin could
hypothetically use the op_aux data to recreate a set of "normal" ops to
replace or augment the OP_SIGNATURE if required.

> >In my implementation, the way a sub is deparsed matches how it was
> >written.
> 
> That is not a legitimate goal of a deparser.

Yes it is!  While the first priority is that the output must be
functionally correct, Deparse still tries, where possible, to make the
output pretty and human-readable, and to roughly match the input (where
enough information is left in the op tree to make the latter possible).

Also there is the issue of not losing performance on a round trip. If this:

    sub f ($a, $b, $c = 1) {}

gets deparsed to:

    die sprintf("Too many arguments for subroutine at %s line %d.\n", (caller)[1, 2]) unless @_ <= 3;
    die sprintf("Too few arguments for subroutine at %s line %d.\n", (caller)[1, 2]) unless @_ >= 2;
    my $a = $_[0];
    my $b = $_[1];
    my $c = @_ >= 3 ? $_[2] : 1;
    ();

then when it is re-compiled, the new sub will be a lot slower than the old
one.
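
For what it's worth, that slowdown is directly measurable; here is a
rough Benchmark sketch (Benchmark is core; fast/slow are illustrative
names). On a perl without the OP_SIGNATURE optimisation the two forms
should come out roughly comparable, which is rather the point:

```perl
use feature 'signatures';
no warnings 'experimental::signatures';
use Benchmark qw(cmpthese);

# Compiled to a single OP_SIGNATURE (under the proposed scheme):
sub fast ($a, $b, $c = 1) { }

# The deparsed long-hand equivalent, executed op by op:
sub slow {
    die "Too many arguments for subroutine" unless @_ <= 3;
    die "Too few arguments for subroutine"  unless @_ >= 2;
    my $a = $_[0];
    my $b = $_[1];
    my $c = @_ >= 3 ? $_[2] : 1;
    ();
}

# Compare call rates over a fixed number of iterations:
cmpthese(20000, {
    signature => sub { fast(1, 2) },
    longhand  => sub { slow(1, 2) },
});
```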

Or to put it another way: if the Deparser *can* (correctly) deparse
something as a signature rather than as lots of random statements, then
it *should*.

The catch is of course the word 'correct', where I will concede my
current Deparse.pm mods are inadequate. See below.

> >                   If a whole file is being deparsed, then Deparse will
> >(modulo bugs) have already emitted the correct 'use feature "signatures".
> 
> "Modulo bugs" is most definitely operative here:
> 
> 	$ perl5.20.0 -MO=Deparse -e 'use feature "signatures"; sub foo { 123 }'
> 	sub foo {
> 	    use feature 'signatures';
> 	    123;
> 	}
> 	-e syntax OK
> 
> With no state ops before the sub, it does emit the pragma too late.
> This is, of course, fixable.

Yes, this needs fixing (and I intend to do so), otherwise a sub that's
output with a signature won't be compilable.

I'm minded to deparse individual subs when use feature "signatures" is in
scope as:

    {
        use feature "signatures";
        sub (...) { }
    }

i.e. to wrap each sub in an extra scope, so that code (like Data::Dumper)
that deparses individual subs rather than whole files will see the right
environment.
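
As a concrete instance of that kind of per-sub consumer, a minimal
sketch (Data::Dumper and B::Deparse are both core modules; the exact
output text varies by perl version, so only its general shape is
assumed here):

```perl
use Data::Dumper;
use B::Deparse;

# Data::Dumper can deparse any code refs it encounters...
$Data::Dumper::Deparse = 1;
print Dumper(sub { my ($x) = @_; $x + 1 });

# ...using B::Deparse underneath, which sees only the optree of the
# individual sub, not the file it came from -- hence the need for a
# wrapping scope to carry the lexical 'use feature' state:
print B::Deparse->new->coderef2text(sub { my ($x) = @_; $x + 1 }), "\n";
```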

When deparsing a whole file, it will try to avoid the extra scope where
possible.

Can anyone think why that wouldn't work?

> In your version, the coupling is all wrong: you've coupled the use of
> signature syntax in the output to its use in the parser input, when it
> really needs to be coupled to the lexical state of the output.

Agreed.

> >> It's also incorrect
> >> to tie the change in prototype syntax (short syntax vs attribute) to the
> >> use of signature syntax:
> >
> >This is subordinate to the previous issue: if for whatever reason
> >Deparse  (or Deparse's caller) has chosen to display the sub using the
> >signature syntax, then it *has* to change the prototype output from (...)
> >to :proto(...).
> 
> No, you're making exactly the same mistake again here.  This is the wrong
> condition on the use of :proto.  Short prototype syntax isn't disabled
> by the presence of a signature on a particular sub, it's disabled by the
> signature feature flag being on in the lexical context.  The deparser has
> to show the prototype in :proto form *if its output has enabled signature
> syntax*, even if its output is not actually using signature syntax.

Agreed.

> >                                                 I can see a legitimate
> >use for it in, for example a serialiser, which can decide to what extent
> >it wants to deparse a sub as closely as possible to the original form
> 
> A deparser is looking at the ops anyway, so will see the real
> implementation of the signature.  I would have much less (but still
> not no) objection to a flag on an op advising that it's derived from
> a signature.  A CVf is making the information readily visible to code
> that *isn't* walking the ops, which is where the problem really lies.

As I said before, I'm not particularly bothered, so I'll remove it for
now. But I'll have no compunction about re-adding it later if it turns
out it makes Deparse.pm's life easier.

> >This is something that annoys me. Taken to its logical conclusion, the
> >entirety of the OP_FOO and pp_foo() system could be regarded as public
> >API, and we should never add a new OP,
> 
> I do not take this extreme straw-man position, for precisely the
> practical reasons you envision.  I did not advocate that we should never
> add optimised op types.  I said that we should be confident that the
> things we add are the right way to go.  We need a reasonable degree of
> API stability, not total stagnation.

Ok, agreed - that was a strawman.

But in this case, I think the performance benefit of OP_SIGNATURE is
worth the price of violating API stability.

In conclusion:

* I concede that Deparse needs fixing (but it was already broken :-).
* I intend for subs to deparse with a signature and :proto() IFF
  feature 'signatures' is in scope.
* I still think that OP_SIGNATURE is a very important optimisation that
  should be added to core.
* I think that OP_SIGNATURE is an internal implementation detail and
  anywhere in my code that violates that assumption (chiefly Deparse)
  needs fixing.
* SVf_HASSIG will go.
* A specific optimisation requires that no ops be executed prior to
  pp_signature() being called which could initialise the sig params, or
  leave them uncleared from a previous call. I think this is a reasonable
  constraint on sub params and am minded to leave this optimisation in. If
  at some point someone can provide a persuasive use case that would
  violate this constraint, then the optimisation could then be dropped.
* Another specific optimisation restricts the number of params a sub can
  have to 32767. I can easily increase this limit to 2**31-1 at the
  expense of requiring an extra U32 of storage in the op_aux of each
  OP_SIGNATURE on 32-bit platforms. I am minded to do this.





-- 
No matter how many dust sheets you use, you will get paint on the carpet.


