February 25, 2015 21:00
Dave Mitchell wrote:
>On Tue, Feb 24, 2015 at 04:05:34PM +0000, Zefram wrote:
>>                      Specifically, I dislike (a) that its behaviour
>> is very specific and a priori arbitrary, being precisely tailored to
>> what signatures do;
>What's the matter with that?

It's a messy design.  The op types are, for the most part, reasonable
programming primitives, with simple specifications.  The signature op
type is entirely contrary to that.  Sure, we make some exceptions for
performance, such as padrange, but we have to strike a balance between
performance and API cleanliness.  padrange is a decent tradeoff: a good
bit of performance win for a small bit of API mess.  The signature op
is way too far at the vomit-over-the-API end of the spectrum.

>                               Just like we have a special op,
>OP_ENTERSUB, that is precisely tailored to performing a perl-specific
>subroutine call.

entersub has a lot of Perl-specific internals, but its specification
(little more than "call referenced subroutine with arguments from the
stack") is straightforward and mostly an inevitable programming primitive.
It's the interface that matters, not the implementation.

>> (b) that it relies on only being called at the very beginning of a sub;
>Again, what's the matter with that?

It would limit the manipulability of the op.  The op tree structure
generally permits us to shift ops around quite freely, because ops work
largely independently of context.  (They're specific to the particular sub
being compiled, but can be moved freely within the sub.)  Your signature
op would be an ugly exception.  It wouldn't be quite so bad if op mungers
could readily turn the signature op into a context-independent form,
but your design provides no such escape hatch short of the op munger
fully understanding the guts of the signature op and expanding it into
the large number of ops that signatures currently generate.

>> (c) that it is only generated by syntactic signature parsing (and the
>> non-default "my(...)=@_" special case);
>So it's an op that's only generated when args are unambiguously being
>assigned to params. What's the matter with that?

Arguments are unambiguously assigned to lexical variables by plenty of
statements other than signatures.  (Your argument here accords syntactic
signatures an unwarranted semantic significance.)  I'm all for having
ops optimised to help with argument handling, but they should apply to
argument handling expressed either way.  They should be aimed at the
semantic activity of extracting arguments, not at one particular syntax
that can be used to express that activity.
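
As a small illustration (the sub names are made up for the example), here
are three subs that extract the same two arguments into lexical variables.
Only the first uses signature syntax.  This sketch assumes perl 5.20 or
later, where the signatures feature exists.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

# Signature syntax.
sub area_sig ($w, $h) { return $w * $h }

# List assignment from @_.
sub area_list { my ($w, $h) = @_; return $w * $h }

# Successive shifts.
sub area_shift { my $w = shift; my $h = shift; return $w * $h }
```

The signature version additionally checks the argument count, which the
other two do not, but the activity of getting arguments into lexicals is
the same in all three.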

Having the new op only generated in such specific cases is also another
blow against its general usability, and making it such a special case
invites bugs based on assuming too much about its specificity.  To avoid
bugs, new op types should be as unspecial as possible.

>> (d) that the patch makes syntactic signature parsing reliant on this
>> special op;
>Again, what's the matter with that?
>> and (e) that deparsing signature syntax is tied to the special op.
>Again, what's the matter with that?

I did expand on these points in my message, and I'll address your
responses to my expansion below.

>OP_SIGNATURE is just an internal implementation detail and imparts no
>constraints on the syntax or semantics of signatures. It just happens to
>be the thing that will need modifying if someone wants to alter the syntax
>or semantics of signatures.

I disagree about it being purely internal, an issue which is discussed
more below.  The op type is somewhat semantically visible.  It's not
correct that it imparts no semantic constraints, as you implemented that
2^15 limit and imposed that on signatures per se, though as you pointed
out that's fixable.  But that such an issue arises, and particularly the
inflexibility shown by your approach of applying the implementation limit
to signatures, shows that this is coupling the syntax and the semantics
too tightly.  Ops are a semantic matter.  Putting arguments into lexical
variables is a semantic matter.  But signatures are syntactic.

The issue around applying the limits of the signature op type to
signature syntax also shows that the signature op is, in semantic space,
uncomfortably distant from the other op types.  When the signature op's
limits preclude some small semantic change, it really needs to be easy
to fall back to other op types that grant one the freedom to implement
the desired semantics directly by composing simple ops.  Sure, when the
change we want to make is expressed by the core signature syntax we
can always add the new feature to the core signature op, but that's
not the kind of situation I'm thinking of.  I'm concerned about ops
being generated by non-core modules, which are by nature constrained by
the semantics of the op types supported by whatever version of the core
they're running on.  This is a concern for signature plugins of all kinds
(also discussed below).

>                            In the same way that changes to the syntax
>and semantics of open() (and there have been many over the years)
>require changes in pp_open() etc.

Not a good analogy.  open() acts much as a function, which has been
compatibly extended over time by accepting more arguments and allowing
arguments to take on more forms than before.  The syntax by which those
arguments are specified in Perl code hasn't changed at all, and the op
likewise is just accepting more arguments on the stack.

>While I'm not always opposed in principle to things being pluggable, I
>would attach a much higher priority to reducing the overhead of function
>calls in perl as much as possible. This is an area where perl is
>notoriously slow.

I disagree.  Extensibility is vitally important to the future of the
language, because it is the feature that makes all other features
possible.  Performance is certainly a legitimate concern, but it's not
an enabling issue in the way that extensibility is.  Sub call overhead
certainly is a problem for Perl, but reducing it shouldn't come at the
expense of making some pluggability effectively impossible.  Whatever
level of overhead is a problem today is only a temporary problem anyway,
as ever-cheaper hardware erodes the expense.

>But as far as I can see, there's nothing in my implementation that stops
>making signatures pluggable. Just add at the start of
>Perl_parse_subsignature() something like:
>    if (PL_signature_hook) {
>        call_sv(....);
>        return (the op subtree created above);
>    } 

This is not the kind of pluggability that I have in mind.  A single hook
to replace the entirety of signature parsing would be quite insipid;
we can already get pretty much that effect by the kinds of modules
that are already on CPAN providing their own signature facilities.
The interesting kind of pluggability is to provide custom parsing of
a signature *item*, integrated with other items parsed by standard
and custom means, in a signature that at the top level is managed by
the core signature parser.  For example, core signatures notably lack
type constraints, because the core language has no real type system,
so we'd like modules that provide their own type systems (such as Moose)
to be able to plug in type constraint signature items, like:

	use feature "signatures";
	use MooseX::Types::Signature qw(type);
	sub frobnicate (
		type Int $how_hard,
		type Bool $with_lube = 0,
	) { ... }

This kind of extensibility requires that interfaces between signature
items be reasonably clean; each item parser needs to be able to supply
its part of the signature behaviour in a self-contained manner.  With the
present system of each item generating ops separately, it is easy to
have a plugged-in signature item parser return ops for a single item.
(There's some additional data flow required to manage parameter indices,
and argument count constraints, but that doesn't require much more
protocol.)  If items are instead expected to contribute parts of a single
signature op, that's a much messier protocol right away.
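
A toy model of the per-item protocol may make this concrete.  Everything
here is hypothetical: the function names, the hash keys, and the use of
strings to stand in for ops are invented for the sketch and are not
anything in the core.  Each item parser is handed the next parameter index
and returns the ops for its own item, plus the small amount of data the
top-level parser needs for index and argument count bookkeeping.

```perl
use strict;
use warnings;

# Hypothetical protocol: an item parser receives the index of the next
# positional argument and returns the "ops" for that one item, how many
# arguments it consumes, and whether it is mandatory.
sub plain_item {
    my ($var) = @_;
    return sub {
        my ($index) = @_;
        return { ops => ["$var = arg[$index]"], consumed => 1, mandatory => 1 };
    };
}

# Stands in for an item supplied by a plugin, such as a type constraint.
sub typed_item {
    my ($type, $var) = @_;
    return sub {
        my ($index) = @_;
        return {
            ops       => ["$var = arg[$index]", "check $var isa $type"],
            consumed  => 1,
            mandatory => 1,
        };
    };
}

# The top-level parser only concatenates ops and tracks the counts.
sub assemble_signature {
    my @item_parsers = @_;
    my ($index, $min_args, @ops) = (0, 0);
    for my $parser (@item_parsers) {
        my $item = $parser->($index);
        push @ops, @{ $item->{ops} };
        $index    += $item->{consumed};
        $min_args += $item->{consumed} if $item->{mandatory};
    }
    return { ops => \@ops, min_args => $min_args, max_args => $index };
}
```

The top level never needs to understand what a plugged-in item does,
which is the property that a single shared signature op would make hard
to keep.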

>                                                                I went
>with the more efficient restriction because I couldn't conceive that
>anyone would ever require more than 32767 named params.

I can imagine generated code hitting this limit.

>                                                        Note that we
>already have much more onerous limitations in the regex engine, with
>/FOO{1,N}/ not working correctly for various classes of FOO when N>32757.

Yes, that's a problem too, and needs to be fixed.  Not an excuse to
introduce another limitation of that nature.

>Such an approach would leave signatures always significantly slower
>than a comparable my(...)=@_. You would end up executing one or more ops
>for each param,

No, that doesn't follow.  My approach only requires that you would
*build* ops separately for each parameter, not that they remain separate
for execution.  The very passage of my message that you quoted in giving
this response referred to the possibility of combining multiple
argument assignments into one op.  padrange is both directly applicable
to such situations and a good example of how multi-op structures can be
opportunistically collapsed into single ops.
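
The build-separately, collapse-later idea can be sketched with a toy
peephole pass.  This is only a model: ops are plain hash refs, and the op
names "arg_to_lexical" and "arg_range" are invented for the example rather
than being real op types.  The real padrange optimisation has further
conditions, such as the pad slots being adjacent.

```perl
use strict;
use warnings;

# Toy model only: fold runs of adjacent single-argument ops into one
# range op, leaving all other ops untouched.
sub collapse_arg_ops {
    my @ops = @_;
    my @out;
    for my $op (@ops) {
        my $prev = $out[-1];
        if ($op->{type} eq 'arg_to_lexical'
            && $prev
            && $prev->{type} =~ /\Aarg_(?:to_lexical|range)\z/
            && $prev->{first} + $prev->{count} == $op->{first})
        {
            # Next argument index follows on directly: extend the range.
            $out[-1] = {
                type  => 'arg_range',
                first => $prev->{first},
                count => $prev->{count} + 1,
            };
            next;
        }
        push @out, { %$op };
    }
    return @out;
}
```

Anything that generates the simple per-parameter ops benefits from such
a pass, regardless of the syntax that produced them.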

>Furthermore at its heart, the op_aux array of the OP_SIGNATURE is just a
>compact representation of the sub's signature, making it relatively easy
>for other code at the C or Perl level to extract out that information and
>do whatever it likes with it.

Once again, this is treating "the sub's signature" as some kind of
meaningful metadata, which it's not.

>In my implementation, the way a sub is deparsed matches how it was

That is not a legitimate goal of a deparser.

>                   If a whole file is being deparsed, then Deparse will
>(modulo bugs) have already emitted the correct 'use feature "signatures"'.

"Modulo bugs" is most definitely operative here:

	$ perl5.20.0 -MO=Deparse -e 'use feature "signatures"; sub foo { 123 }'
	sub foo {
	    use feature 'signatures';
	-e syntax OK

With no state ops before the sub, it does emit the pragma too late.
This is, of course, fixable.

That's just an example of a wider principle.  In general, as a matter
of robustness, it is more important that the deparser's output be
consistent with itself than that it match the parser's original input.
In your version, the coupling is all wrong: you've coupled the use of
signature syntax in the output to its use in the parser input, when it
really needs to be coupled to the lexical state of the output.

>> It's also incorrect
>> to tie the change in prototype syntax (short syntax vs attribute) to the
>> use of signature syntax:
>This is subordinate to the previous issue: if for whatever reason
>Deparse  (or Deparse's caller) has chosen to display the sub using the
>signature syntax, then it *has* to change the prototype output from (...)
>to :proto(...).

No, you're making exactly the same mistake again here.  This is the wrong
condition on the use of :proto.  Short prototype syntax isn't disabled
by the presence of a signature on a particular sub, it's disabled by the
signature feature flag being on in the lexical context.  The deparser has
to show the prototype in :proto form *if its output has enabled signature
syntax*, even if its output is not actually using signature syntax.
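
The rule can be written down as a small function.  This is a sketch of
the decision only, not B::Deparse's actual code; the function name and its
parameters are hypothetical.  The attribute is spelled :prototype here,
which is its full name in Perl source.

```perl
use strict;
use warnings;

# $signatures_enabled describes the lexical state of the deparser's own
# output at this point.  Whether this particular sub has a signature is
# deliberately not an input.
sub sub_header {
    my ($name, $proto, $signatures_enabled) = @_;
    return "sub $name" unless defined $proto;
    return $signatures_enabled
        ? "sub $name :prototype($proto)"
        : "sub $name($proto)";
}
```

For example, sub_header('foo', '$$', 1) yields the attribute form even
though no signature is involved, because a parenthesised list after the
sub name would be parsed as a signature in that lexical context.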

>I think you'll find that virtually *every* Cvf_* flag is exposing an
>internal implementation detail.

They're a mixture.  Some are the storage for internal data that callers
shouldn't be looking at (ISXSUB, CLONED): they're justified because the
data has to be stored somewhere.  Some are relevant to callers (LVALUE,
CONST).  Some have some external relevance other than to ordinary callers
(NODEBUG, AUTOLOAD).  None so far is advertising an internal detail
that's actually determined elsewhere.

>                                                 I can see a legitimate
>use for it in, for example a serialiser, which can decide to what extent
>it wants to deparse a sub as closely as possible to the original form

A deparser is looking at the ops anyway, so will see the real
implementation of the signature.  I would have much less (but still
not no) objection to a flag on an op advising that it's derived from
a signature.  A CVf is making the information readily visible to code
that *isn't* walking the ops, which is where the problem really lies.

>This is something that annoys me. Taken to its logical conclusion, the
>entirety of the OP_FOO and pp_foo() system could be regarded as public
>API, and we should never add a new OP,

I do not take this extreme straw-man position, for precisely the
practical reasons you envision.  I did not advocate that we should never
add optimised op types.  I said that we should be confident that the
things we add are the right way to go.  We need a reasonable degree of
API stability, not total stagnation.

