develooper Front page | perl.perl5.porters | Postings from November 2019

Aliasing and Read-only variables

From: Dave Mitchell
Date: November 28, 2019 17:04
Subject: Aliasing and Read-only variables
Message ID: 20191128170401.GE3573@iabyn.com

=head2 Synopsis

    sub foo (
        # the lexical vars $a, @b and %c are aliased to the things pointed
        # to by the reference values passed as the first three arguments:

        \$a, \@b, \%c,

        # $d is aliased to the fourth arg:

        *$d,

        # $e is aliased to the fifth arg, but any use of $e in lvalue
        # context within this lexical scope is a compile-time error:

        *$e :ro,

        # the elements of @f are aliased to any remaining args,
        # i.e. a slurpy with aliasing ...:

        *@f

        # ... or, instead of *@f (a signature can have only one slurpy),
        # a slurpy hash; every second remaining arg is aliased to the
        # hash's values:

        *%g
    ) { ... }


=head2 :ro

Before discussing the general issues around aliasing of signature
parameters, I want to make a specific proposal concerning read-only (RO)
variables. This can apply to all lexical variables, not just signature
parameters, although it would be particularly useful for the latter. This
can then inform the aliasing proposals which follow.

In Perl 5, it's hard to efficiently create a read-only alias of another
SV.  You can't mark the original itself as read-only (SVf_READONLY), as
the original should still be writeable. Instead you have to create some
sort of read-only proxy SV, which is slow and expensive.

I propose a much more restrictive (but hopefully still useful) RO regime;
rather than making the variable itself RO, instead at compile time, ban
lvalue *uses* of the variable in any code within its lexical scope. More
precisely, for the $x in this example:

    {
        my $x :ro = ...;
        ...;
    }

then within the lexical scope of $x (from the point it's been introduced
onwards), any lvalue-context usage of that variable is a compile-time
error. Note that as well as forbidding all the obvious stuff like $x++,
all the following would also be compile-time errors:

    foo($x);
    for ($x, ...) { ...}
    $y = \$x;
    sub foo :lvalue { ...; return $x }

since they all use $x in (potentially) lvalue context. This is very
restrictive, but at least since the check is at compile time you will find
this out immediately, rather than discovering at run-time months later.
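For contrast, here is a sketch in plain Perl 5 (not the proposed syntax) of each construct listed above actually writing through to $x at run time; `bump` and `lv` are hypothetical helper names. The last lines show the run-time failure mode the :ro check is meant to replace: passing a read-only literal compiles without complaint and only fails when the modifying code runs.

```perl
use strict;
use warnings;

my $x = 0;

sub bump { $_[0]++ }     # elements of @_ alias the caller's arguments
sub lv :lvalue { $x }    # lvalue sub: hands back $x itself, not a copy

bump($x);                # like foo($x): the callee writes through @_
$_++ for ($x);           # like for ($x, ...): the loop variable aliases $x
my $y = \$x;
$$y++;                   # like $y = \$x: a write via the reference
lv() = $x + 1;           # like an lvalue sub returning $x

print "x = $x\n";

# Passing a read-only literal compiles fine; the error appears only at run time.
my $ok = eval { bump(1); 1 };
print $ok ? "no error\n" : "run-time error: $@";
```

Each of the four lines bumps $x by one, so it ends up as 4, while the bump(1) call dies with Perl's "Modification of a read-only value attempted" error.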

For lexical arrays and hashes, both container and element use are
forbidden in lvalue context:

    my @a :ro = ...;
    my %h :ro = ...;

    # all these are illegal:
    \@a;
    \%h;
    $a[0] = 1;
    foo($h{bar});
    for ($a[$i], $h{$j}) { ... }

Note that a :ro variable could still be modified if, for example, it is
tied and its FETCH() modifies the variable itself (think of a tied fetch
counter). But STORE() should never be called.
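A minimal sketch of such a tied fetch counter in current Perl, assuming a hypothetical class name FetchCounter: reading the variable changes its internal state even though nothing ever assigns to it, which is exactly the kind of change a compile-time :ro check cannot rule out. STORE() is the one method such a variable would never see.

```perl
use strict;
use warnings;

package FetchCounter;

sub TIESCALAR {
    my ($class, $value) = @_;
    return bless { value => $value, fetches => 0 }, $class;
}

# Every read updates the object's own state.
sub FETCH {
    my $self = shift;
    $self->{fetches}++;
    return $self->{value};
}

# Under a :ro regime this should never be reached.
sub STORE { die "STORE called\n" }

package main;

tie my $x, 'FetchCounter', 42;
my $first   = $x;                    # read (triggers FETCH)
my $second  = $x;                    # read again
my $fetches = (tied $x)->{fetches};  # inspect the counter without fetching
print "value=$first fetches=$fetches\n";
```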

Note that a :ro variable will still be subject to the usual upgrades that
SVs go through:

    sub f(*$x :ro) { $x + 1 } # $x is an alias - see below
    my $s = "123";            # $s is an SvPV
    f($s);                    # $s is now an SvPVIV
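The upgrade can be observed in today's Perl, where @_ already aliases the argument. In this sketch the non-signature sub f stands in for f(*$x :ro), and the core B module reports the class corresponding to the SV's internal type before and after the numeric use. The exact types are an implementation detail, so this is illustrative rather than a guarantee; on the perls I'd expect, the class should change from B::PV to a numeric-capable type such as B::PVIV.

```perl
use strict;
use warnings;
use B ();

# Plain Perl 5 stand-in for f(*$x :ro): $_[0] aliases the caller's $s
sub f { $_[0] + 1 }

my $s      = "123";
my $before = ref(B::svref_2object(\$s));   # B class for the SV's current type
my $r      = f($s);                         # numeric use through the alias
my $after  = ref(B::svref_2object(\$s));
print "$before -> $after (f returned $r)\n";
```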

In terms of implementation, the basic principle is that setting the
OPf_MOD flag on an OP_PADSV op (or similar) for such a lexical variable
will croak at compile time.

=head2 Aliasing

It would be nice sometimes for a parameter to be an alias of the argument
rather than a copy, for convenience and/or performance: for example,
aliasing an array parameter to a passed array ref argument or, less
commonly, providing access to the caller's value so that it can be altered.
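As a baseline, a sketch in current Perl (5.20 or later, for the signatures feature) of the status quo: elements of @_ already alias the caller's arguments, whereas signature parameters are copies. The sub names are hypothetical.

```perl
use strict;
use warnings;
use feature 'signatures';
no warnings 'experimental::signatures';

sub via_at_underscore { $_[0] = 'changed' }   # @_ elements alias the arguments
sub via_signature ($x) { $x = 'changed' }      # signature parameters are copies

my ($p, $q) = ('orig', 'orig');
via_at_underscore($p);
via_signature($q);
print "p=$p q=$q\n";
```

Running it prints p=changed q=orig: only the @_ version reaches back into the caller.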

First, let's look at how Perl 6 does it. It uses traits, which allow
the following permutations for parameter variables:

     $x is readonly # the default - an alias, but not modifiable
     $x is rw       # direct alias: modifying $x modifies the argument
     $x is copy     # like current Perl 5 signature parameters
     $x is raw      # (I don't understand this one)

     Note that rw on a slurpy parameter is "reserved for future use by
     language designers": (*@a is rw).

I'm not proposing that we use this syntax, but it gives an idea of what
we need to provide in another way.

Note that read-onlyness is orthogonal to aliasing/copying; in principle
you can have all four of these permutations:

    copy   rw  # like P6's 'is copy'
    copy   ro  # no P6 equivalent, and not very useful?
    alias  rw  # like P6's 'is rw'
    alias  ro  # like P6's 'is readonly'

I propose that we use the new :ro attribute (which I discussed above) to
indicate readonly-ness, and use the syntax discussed below to enable
aliasing rather than copying. The two can be selected independently of
each other.

There have been two significant RT tickets that discussed aliasing for
signature parameters.

The first was from May 2016:

    RT #128242: Aliasing via sub signature

and the second by Zefram in Nov 2017:

    RT #132472: aliasing in signatures

which was really a summation of the ideas from the first ticket firmed up
into a coherent proposal.

I am in full agreement with the analysis provided by Zefram, and think we
should use his proposal as-is bar some bike-shedding about the actual
syntax.

Zefram pointed out that there are really two distinct types of aliasing
we might wish to do.

=head3 Reference Aliasing

The first type, which I will call "reference aliasing", expects the
argument to be a I<reference> to something, and the signature processing
code first dereferences that argument (with dereference overloading
honoured) and aliases the parameter to the resulting container - croaking
if it's not a suitable reference. For example:

    sub foo(\$x, \@a, \%h, $other, $stuff) { ... }

    foo(\$X, [], \%H, 1, 2);

Then within the body of foo(), $x is an alias for $X, @a for the anonymous
array, and %h for %H. This type of aliasing is more useful for array and
hash references, but scalars are supported for completeness. Note that @a
and %h are *not* slurpy parameters; they consume a single argument, and more
parameters can follow them.
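Current Perl (5.22 and later) can approximate reference aliasing by hand with the experimental refaliasing feature. This is only a sketch of the semantics, not the proposed implementation: the arguments are unpacked explicitly, and I wouldn't assume plain refaliasing honours dereference overloading the way the proposal requires.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Hand-written approximation of: sub foo(\$x, \@a, \%h, $other, $stuff)
sub foo {
    \my $x = $_[0];    # $x now *is* the scalar that $_[0] refers to
    \my @a = $_[1];    # croaks unless $_[1] is an ARRAY reference
    \my %h = $_[2];    # croaks unless $_[2] is a HASH reference
    my ($other, $stuff) = @_[3, 4];   # ordinary copied parameters follow

    $x = 'set';
    push @a, $other;
    $h{stuff} = $stuff;
    return;
}

my $X = 'orig';
my %H;
my $A = [];
foo(\$X, $A, \%H, 1, 2);
print "$X @$A $H{stuff}\n";

my $ok = eval { foo(\$X, {}, \%H, 1, 2); 1 };   # hash ref where an array ref is expected
print $ok ? "no error\n" : "croaked: $@";
```

The first call modifies $X, the anonymous array and %H in place; the second croaks with Perl's "Assigned value is not an ARRAY reference" error.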

Any default value expression must return a reference to a suitable
container type, or the call croaks; e.g.:

    sub foo(\$x = \1, \@y = [], \%z = \%::Defaults) { ... }
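The defaults above can be emulated the same way in current Perl. In this sketch (refaliasing again, so 5.22 or later), each default expression is itself a reference, chosen only when the argument is missing; %Defaults is a hypothetical package hash standing in for %::Defaults.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

our %Defaults = (colour => 'red');

# Hand-written approximation of:
#   sub foo(\$x = \1, \@y = [], \%z = \%::Defaults)
sub foo {
    \my $x = @_ > 0 ? $_[0] : \1;              # default: alias to a constant 1
    \my @y = @_ > 1 ? $_[1] : [];              # default: a fresh empty array
    \my %z = @_ > 2 ? $_[2] : \%::Defaults;    # default: the package hash
    return ($x, scalar(@y), $z{colour});
}

my @got  = foo();                                      # all defaults used
my @more = foo(\5, [1, 2, 3], { colour => 'blue' });   # no defaults used
print "@got | @more\n";
```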

As an aside, in the "Miscellaneous suggestions" thread, I propose a
'default default' for optional parameters, where for example ($x?) is
short for ($x=undef), \@a? is short for \@a=[] and \%h? is short for
\%h={}.

The use of \ to prefix the parameter name seems uncontroversial, since
it mimics the existing lexical variable aliasing syntax:

    my \$x = \$X;
    my \@a = \@A;

Placeholder parameters would check that the argument is a suitable
reference, then throw it away:

    sub foo(\@, \%, ...)

Similarly, default expressions for placeholder parameters will still be
evaluated and checked before being thrown away. The optional placeholder
'\@=' checks the argument if present, then skips it.
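A hand-written sketch of what placeholder checking amounts to, using the core Scalar::Util and Carp modules: validate the reference type, then discard the argument. Unlike the proposal, reftype() inspects the underlying container and does not consult dereference overloading; the sub and its third parameter are hypothetical.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(reftype);

# Hand-written approximation of: sub foo(\@, \%, $n)
# The first two arguments are checked and then ignored.
sub foo {
    croak 'arg 1 must be an ARRAY reference' unless (reftype($_[0]) // '') eq 'ARRAY';
    croak 'arg 2 must be a HASH reference'   unless (reftype($_[1]) // '') eq 'HASH';
    my $n = $_[2];
    return $n * 2;
}

my $good = foo([], {}, 21);
my $ok   = eval { foo({}, {}, 21); 1 };   # wrong type for the \@ placeholder
print "good=$good ", ($ok ? "no error\n" : "croaked: $@");
```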

=head3 Direct Aliasing

The second form of aliasing which should be supported can be thought of as
'direct': it doesn't use references, and it aliases the *elements* of
arrays and hashes rather than the containers. Zefram proposed using a
trailing \ to indicate this, but I'm not keen on that; instead for now
I'll use a '*' prefix, which I'll try to justify later. Direct aliasing
can be applied to both scalar parameters and to slurpy parameters (of
which there can of course be only one, located at the end of each
signature). For example, given:

    sub foo(*$a, *$b, *@c     ) { ... }
        foo( $A,  $B,  $C, @D);

then within the body of foo(),

    $a    is an alias of $A;
    $b    is an alias of $B;
    $c[0] is an alias of $C;
    $c[1] is an alias of $D[0];
    $c[2] is an alias of $D[1];
    etc
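Since @_ already aliases arguments element by element, direct aliasing can be sketched in current Perl by giving those aliases names with refaliasing (5.22 or later). This is an approximation for illustration only; $p and $q replace $a and $b to avoid clashing with the sort variables.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Hand-written approximation of: sub foo(*$p, *$q, *@c)
sub foo {
    \my $p = \$_[0];                # $p aliases the first argument
    \my $q = \$_[1];                # $q aliases the second argument
    \(my @c) = \(@_[2 .. $#_]);     # $c[0] aliases $_[2], $c[1] aliases $_[3], ...
    $p = uc $p;
    $q = uc $q;
    $_ = uc($_) for @c;             # writes through to the caller's variables
    return;
}

my ($A, $B, $C, @D) = ('a', 'b', 'c', 'd1', 'd2');
foo($A, $B, $C, @D);
print "$A $B $C @D\n";
```

Every caller variable, including the elements of @D that were flattened into the argument list, ends up upper-cased.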

It works similarly for a *%h hash slurpy, except that only the hash's
*values* (and not keys) are aliased. Given:

    sub foo(*%h) { ... }
    foo($k1, $v1, $k2, $v2);

then within the body of foo(),

    $h{$k1} is an alias of $v1;
    $h{$k2} is an alias of $v2;

while the keys of the hash are just plain strings as usual.
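The hash case can be sketched the same way with element-level refaliasing (5.22 or later): each key is an ordinary string copy, while each value slot is bound to the caller's argument. This hypothetical emulation silently ignores a trailing odd argument, which a real signature would presumably reject.

```perl
use strict;
use warnings;
use feature 'refaliasing';
no warnings 'experimental::refaliasing';

# Hand-written approximation of: sub foo(*%h)
sub foo {
    my %h;
    for (my $i = 0; $i + 1 < @_; $i += 2) {
        \$h{ $_[$i] } = \$_[$i + 1];   # key copied; value aliased to the argument
    }
    $h{$_} .= '!' for keys %h;        # modifies the caller's value variables
    return;
}

my ($k1, $v1, $k2, $v2) = ('one', 'first', 'two', 'second');
foo($k1, $v1, $k2, $v2);
print "$v1 $v2\n";
```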

Placeholder direct alias parameters are forbidden. There's no (*$, *@).

Default values are allowed for scalar direct aliased parameters, but not for
direct aliased slurpies (since slurpies aren't allowed defaults).

Direct aliasing would be most useful with :ro. It allows the performance
gain of not copying, but with safety. So, for example,

    sub foo { $foo[$_[0]] }

becomes

    sub foo (*$i :ro) { $foo[$i] }

Now for the bikeshedding about what syntax to use for direct aliasing.
Zefram tentatively proposed using $x\, whereas I propose *$x.

Here's why I prefer my suggestion:

* By having a single syntactic slot to indicate aliasing, that slot can
  be populated with only one of two characters to indicate which type of
  aliasing is wanted (\$x and *$x). With two slots there's ambiguity:
  what does \$x\ mean?

* A trailing \ implies some sort of (de)referencing, but there isn't any
  for direct aliasing. (Conversely, \$x is good for reference aliasing
  because the \ correctly implies that a reference is involved.)

* A trailing \ could potentially interfere with any new syntax which
  follows the parameter name, or could look like it's trying to escape
  whatever follows.

I'm happy to consider other character candidates for the direct aliasing
syntactic slot, but I like '*' because:

* It has a loose mnemonic association with aliasing via typeglob
  assignment: *foo = ....;.

* It has a loose association with Perl 6's array flattening syntax, *@a,
  which flattens the elements of the array rather than referring to the
  container itself; by analogy, whereas \@a aliases the array's container,
  *@a aliases the elements of @a.
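For readers less familiar with the typeglob idiom behind the first mnemonic, a minimal plain Perl 5 sketch (variable names are hypothetical): assigning a scalar reference to a glob makes the package variable an alias of that scalar, not a copy.

```perl
use strict;
use warnings;

our $original = 'before';
our $alias;

# Typeglob assignment: $main::alias and $main::original are now the same scalar.
*alias = \$original;

$alias = 'after';
print "$original\n";
```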

The alias character should be considered syntax rather than being part of
the sigil, and in particular, whitespace should be allowed: (\ @x, * @a).

=head3 Default aliasing behaviour

There is the question of default behaviour. In Perl 5 currently the
default is to copy, while in Perl 6 the default is a read-only alias.
I think changing Perl 5 to match Perl 6 is probably too big a step. We'd
also have to introduce a new attribute, :copy say, which indicates that we
*don't* want :ro.

A second possible default is what to do in the explicit presence of :ro.
For parameters, this attribute is most useful for aliasing, so one
possibility is to make aliasing the default in the presence of :ro. This
means that rather than typing (*$x :ro) you could just type ($x :ro).
However, the problems with that are:

1) it's ambiguous what sort of aliasing is being automatically enabled
   (\ or *)
2) we'd need some extra syntax to be able to turn off the aliasing.

So all in all, I think we should leave things as they are:

    ($x)      read-write copy
    (*$x :ro) Perl 6-style read-only alias.
