
Type and Value Constraints and Coercions

From: Dave Mitchell
Date: November 28, 2019 17:05
Subject: Type and Value Constraints and Coercions
Message ID: 20191128170442.GF3573@iabyn.com

[
This proposal is the one where I am most outside of my comfort zone.
It attempts to supply useful features, and to allow expandable hooks so
that systems like Moose which have their own constraint systems can make
use of it, but I'm not intimately acquainted with such systems, and there
may be better ways to do this.

It's something I probably wouldn't use myself much, and were I to
implement it, it's probably the thing I would do last or nearly last.
]

=head2 Synopsis:

    sub f(
            $self isa Foo::Bar,         # croak unless $self->isa('Foo::Bar');
            $foo  isa Foo::Bar?,        # croak unless undef or of that class
            $a!,                        # croak unless $a is defined
            $b    is  Int,              # croak if $b not int-like
            $c    is  Int?,             # croak unless undefined or int-like
            $d    is PositiveInt,       # user-defined type
            $e    is Int where $_ >= 1, # multiple constraints
            $f    is \@,                # croak unless  array ref
            $aref as ref ? $_ : [ $_ ]  # coercions: maybe modify the param
    ) { ...};


=head2 Background

It seems that people express a desire to be able to write something like:

    sub f (Int $x) { ... }

as a shorthand for something like:

    sub f ($x)  {  croak unless
                             defined $x
                          && !ref($x)
                          && $x =~ /\A-?[0-9]+\Z/;
                    ....;
                }

Similarly people want

    sub f (Some::Arbitrary::Class $x) { ... }

as a shorthand for

    sub f ($x)  {  croak unless
                             defined $x
                          && ref($x)
                          && $x->isa('Some::Arbitrary::Class');
                    ...;
                }

Furthermore, there are also Perl 6 generic constraints:

    sub f ($x where * < 10*$y) { ... }

which (in Perl 6 at least) can be used either as-is, or as part of a
multi-method dispatch - which selects whichever version of f() best
matches its (constant) argument constraints.

In any such scheme, there would have to be support for built-in types
(such as Int) plus the ability to extend the type system with
user-defined (or pragma-defined) types (e.g. PositiveInt say).

So, what's the best way of giving the public what they want?

First, we should be very clear that what is contained within this
proposal is an argument-checking system, not a type system for variables.
It will guarantee that, at the start of execution of the main body of the
sub, the parameter variable has a value that meets certain constraints,
possibly via some initial modification; and if this is not possible, the
sub will instead have croaked at that point.

It *doesn't* mean that the sub's caller is in any way constrained at compile
time as to what arguments it can pass. Unlike prototypes, which affect
caller compilation, signatures are at heart just efficient syntactic sugar
for code at the start of the body of a sub which checks and manipulates
the contents of @_ while binding them to local lexical variables.

Similarly, a constraint doesn't constrain the value of a lexical parameter
variable later in the body of the sub. For example:

    sub f (Int $x) {
        ...;         # $x contains a valid integer value at this point
        $x = [];     # legal, even though not an Int value
    }

    f( [] );         # compiles ok; only croaks when f() is called.
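
As a rough illustration in today's Perl, the same behaviour can be written out by hand. This is only a sketch: the regex is one possible definition of "int-like", and the name f_emulated is invented for the example.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hand-written stand-in for the hypothetical: sub f (Int $x) { ... }
sub f_emulated {
    my ($x) = @_;

    # The constraint runs once, on entry to the sub.
    croak "argument is not int-like"
        unless defined $x && !ref($x) && $x =~ /\A-?[0-9]+\z/;

    $x = [];          # legal: nothing constrains $x after the entry check
    return ref $x;    # 'ARRAY'
}

my $ok  = f_emulated(42);                         # passes the entry check
my $err = eval { f_emulated([]); 1 } ? '' : $@;   # croaks only when called
```

The call with an array ref compiles without complaint; the failure only shows up when the sub is actually entered.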

This is in contrast to a "real" type system. For example, the existing
'my Dog $spot' syntax can be made (in conjunction with 'use fields') to
croak on invalid hash keys:

    package Dog;
    use fields qw(nose tail);
    my Dog $spot = {};
    $spot->{fins} = 1; # compile time error

It might also, in some hypothetical future version of perl, support:

    my Int $x = 1;
    $x = []; # error

A "real" type system might also allow optimisations: e.g. storing the
value of $x directly as an integer rather than an SV, and planting special
versions of arithmetic ops which deal directly with an int on the stack
rather than an SV.

So, given that a constraint type system and a "real" type system are two
separate things (unless someone smarter than me can suggest a way of
unifying them), I think that they should be kept syntactically separate.
In particular, we shouldn't use a type prefix before the variable name to
specify a constraint; that should be reserved for a hypothetical future
type system.

=head2 Main Proposal

Instead, what I propose is a special postfix '!' symbol, plus four
parameter postfix keywords (akin to Perl 6 traits): C<where>, C<as>, C<is>
and C<isa>. These can be applied in any order, and more than once, to each
parameter. Syntactically, they are similar to statement modifiers, except
that they can be stacked. For a given parameter, they come after all of
the parameter's other syntax (including default values).  They are
processed against the lexical parameter, after any binding of arguments or
default value. In detail:

=over

=item $param!

This is a special short cut for what I assume is a common requirement.
It is equivalent to

    $param where defined $_

Unlike the other constraints, the exclamation mark goes directly after
the parameter name:

    $param! :shared = 0

But like the other constraints, it is tested I<after> any default
value has been applied.

=item where <boolean expression>

This temporarily aliases $_ to the current parameter, and croaks if
the expression doesn't return a true value: e.g.

    sub f ($x = 0 where defined && $_ < 10, ...) { ... }

=item isa <Class::Name>

    sub f ($x isa Class::name)

is roughly shorthand for

    sub f ($x where    defined $_
                    && ref($_)
                    && $_->isa('Class::name'))

Although if Paul Evans' 'isa' infix keyword is accepted into core, then
the signature 'isa' trait should become exactly shorthand for:

    sub f ($x where $_ isa Class::Name) { ... }

and any rules regarding whether the class name is quoted and/or
has a trailing '::' should be the same.

Note that it uses perl package/class names, not constraint type names.

A class name followed by '?' indicates that an undefined value is
allowed:

    sub f ($x isa Class::name?)

is effectively shorthand for

    sub f ($x where    !defined $x
                    || (   ref($x)
                        && $x->isa('Class::name')))


=item as <coercion expression>

This temporarily aliases $_ to the current parameter, evaluates the
expression, and assigns the result to $_, which may cause the parameter
lexical variable to be updated: e.g.

    sub f ($array_ref as (ref ? $_ : [ $_ ]), ...) { ... }

So

    ($x as expr, ...)

can be thought of as shorthand for

    ($x where (($_ = expr), 1), ...)

In practice I would expect 'as' to be used mostly by pragma writers
to define custom types for use by 'is' as described below; 'as' itself
would appear less frequently in actual signatures.

=item is <constraint-type-name>

This implements the functionality desired by the hypothetical 'Int $x'
example above to check whether the parameter's value satisfies the
named constraint, possibly coercing it too. It supports using the
hints mechanism to allow pragmata to add new constraint types in
addition to those already built in. For example:

    # built-in type:
    sub foo ($x is Int ) { ... }
    # roughly equivalent to: die unless defined && /^-?[0-9]+$/

    # user-defined type:
    sub foo ($x is PositiveInt) { ... }
    # roughly equivalent to: ($x is Int where $x >= 0)

See below for details of how custom constraint types can be created.

Like 'isa', 'is' type names can be followed by '?', indicating that an
undefined value is also allowed. If the argument is undefined, the
type check is skipped. (So a bit like Moose's Maybe[Ref] etc.)

    sub foo ($x is Int?) { ... }

Type names as used by 'is' occupy a different namespace than perl
packages and classes, and in particular they can't include '::'
in their name, so they are less likely to be confused with typical
Foo::Bar package names.

Note that in an earlier draft of this proposal I used the 'isa'
trait to handle both 'isa' and 'is'; the idea being that the type name
would be first looked up as a built-in/custom type name, and if not
recognised, would fall back to an isa() check. But after some private
discussions, I think it's best to keep the two concepts (and name
spaces) entirely separate.

Note that there are some specific advantages of having the type as a
postfix trait, i.e. ($x is Foo, ...) rather than (Foo $x, ...): it makes
it consistent with the other constraint features (where/as/isa), and keeps
everything being processed in a strict left to right order; for example in

    sub f ($x :shared is Int where $x > 10)

all the constraints are processed *after* the 'shared' attribute code
is called; in

    sub f (Int $x :shared where $x > 10)

the order is all mixed up.

The built-in constraint types will also coerce the resultant parameter
to be a suitable type of SV. So for example,

    sub f($i is Int, $s is Str) { ...}

would do the rough equivalent of

    sub f { my ($i, $s) = (int($_[0]), "$_[1]"); ... }

So even if called as f("123"), $i won't have an initial string value
(internally it will be an SVt_IV, not an SVt_PVIV).

Similarly, even if the argument is overloaded, the resulting
parameter won't be - but may trigger calling the relevant overload
conversion method (int, "" or whatever) to get the plain value.

This means that (for example), if a Math::BigInt value is passed as
the argument for $i, the resulting parameter will just be a plain int
and any extra data or behaviour will have been lost.

On the occasions where this is unacceptable, the coder can of course
just not declare a constraint in the signature and do any checks
manually in the body of the function.

=back
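
The left-to-right processing of stacked 'where' and 'as' clauses, with $_ aliased to the parameter, can be approximated in current Perl. The sketch below is only an emulation under stated assumptions: apply_traits is a hypothetical helper, not part of the proposal, and the real thing would be compiled into the signature rather than called as a function.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper: run (kind => code) pairs left to right against one
# parameter, with $_ aliased to that parameter for the whole run.
sub apply_traits {
    my ($name, $param_ref, @traits) = @_;
    for ($$param_ref) {                       # aliases $_ to the parameter
        while (my ($kind, $code) = splice @traits, 0, 2) {
            if ($kind eq 'where') {
                $code->() or croak "$name failed a 'where' clause";
            }
            elsif ($kind eq 'as') {
                $_ = $code->();               # updates the parameter itself
            }
            else {
                croak "unknown trait '$kind'";
            }
        }
    }
    return;
}

# Stand-in for: sub f ($aref where defined $_ as (ref ? $_ : [ $_ ])) { ... }
sub f {
    my ($aref) = @_;
    apply_traits('$aref', \$aref,
        where => sub { defined $_ },
        as    => sub { ref($_) ? $_ : [ $_ ] },
    );
    return $aref;
}

my $wrapped   = f(5);                      # plain scalar gets wrapped
my $unchanged = f([ 1, 2 ]);               # existing reference passes through
my $failed    = !eval { f(undef); 1 };     # undef fails the 'where' clause
```

Because each clause sees the result of the previous one, swapping the two clauses in f() would change the outcome for undef: it would be wrapped first and then pass the defined check.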

So that's the basic idea. I think that most of the time end-users will
just use 'is Type', 'is CustomType' or 'isa Some::Class', while pragmata
writers will make more extensive use of 'where' and 'as' to create custom
constraint types such as 'CustomType'. So most code will use e.g. 'is
AlwaysArrayRef' and only behind the scenes is this defined fully as
something like 'where defined($_) as (ref ? $_ : [ $_ ])'.

Hopefully this proposal provides a general-enough framework such that the
implementers of systems like Moose can make use of it to make them run
the same (but faster) on newer releases of perl.

=head2 Some general rules for constraints

Constraints apart from '!' and 'isa' cannot be used on a parameter which
is a direct alias (e.g.  *$x), since this might trigger coercing the
passed argument and thus causing unexpected action at a distance.

The where/as/isa/is keywords will only be recognised as keywords at the
appropriate point(s) when lexing a signature parameter; elsewhere, they
are treated as normal barewords / function names as before.

At the start of constraint processing, $_ is aliased to the lexical
parameter variable, and any modification of $_ will modify the parameter,
with the change being visible to any further constraints.

The behaviour of $_ if it becomes unaliased from the lexical parameter
(e.g. via local *_ = \$x) is undefined for any further constraints in the
current parameter declaration which make explicit or implicit use of $_,
such as for $_->isa(...) and for the variable which the result of 'as' is
assigned to. The variable being used/modified might end up actually being
either $_ or the lexical parameter, and this might vary between perl
releases and levels of optimisation.

The complete collection of where/as/isa/is clauses are collectively
enclosed in their own logical single scope, in order that $_ can be
efficiently localised just once. This means that any lexical variables
declared inside will not be visible outside of those clauses. For example:

    my $foo;
    sub f ($x    is Int where (my $foo=2*$x) < 10,    $y = $foo) { $foo }

is treated kind of like:

    my $foo;
    sub f ($x {  is Int where (my $foo=2*$x) < 10  }, $y = $foo) { $foo }

in that the $y parameter and the body of the sub both see the outer $foo,
not the inner one.
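
The scoping rule can be mimicked today with a bare block. This is a hand-written sketch, assuming a simple regex for the Int check; it only shows which $foo is visible where.

```perl
use strict;
use warnings;
use Carp qw(croak);

my $foo = 'outer';

# Stand-in for:
#   sub f ($x is Int where (my $foo = 2*$x) < 10, $y = $foo) { $foo }
sub f {
    my ($x) = @_;
    {   # all constraint clauses of $x share this one scope
        croak "not int-like"
            unless defined $x && !ref($x) && $x =~ /\A-?[0-9]+\z/;
        (my $foo = 2 * $x) < 10 or croak "where clause failed";
    }   # the inner $foo no longer exists here

    my $y = @_ > 1 ? $_[1] : $foo;   # the default sees the outer $foo
    return ($y, $foo);               # and so does the body
}

my @result = f(3);   # both elements are 'outer'
```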

Note that in my proposal, constraints are applied to a parameter's value
*after* binding, regardless of whether that value was from an argument or
from a default expression. This is because in something like:

    ($x,  $y = $x is Int)

you have no say over what $x might contain. This does however mean that
you may get the inefficiency of applying constraints to e.g. constant
default values. It might be possible in this case to run the constraint
checker against the default value once at compile time, then skip the
check at run time if the default value is used.
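
A hand-written equivalent of ($x, $y = $x is Int) makes the ordering explicit: bind first, then check, regardless of where the value came from. The is_int helper and its regex are assumptions made for this sketch.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Assumed definition of "int-like" for this illustration only.
sub is_int {
    my ($v) = @_;
    return defined $v && !ref($v) && $v =~ /\A-?[0-9]+\z/;
}

# Stand-in for: sub f ($x, $y = $x is Int) { ... }
sub f {
    my ($x, $y) = @_;
    $y = $x if @_ < 2;                               # bind the default first
    croak "\$y is not int-like" unless is_int($y);   # then check the result
    return $y;
}

my $from_default  = f(7);                      # default $x happens to be valid
my $from_argument = f('abc', 2);               # explicit argument is checked
my $bad_default   = !eval { f('abc'); 1 };     # default inherits a bad $x
```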

Constraints can only be supplied to scalar parameters; in particular they
can't be applied to:

* Slurpy parameters like @array and %hash;

* Reference-aliased aggregate parameters like \@array and \%hash (but in
  these cases perl will already croak at runtime if the supplied arg isn't
  an array/hash ref);

* Query parameters apart from scalar, ?$x.

* Placeholder (nameless) parameters. In the very rare cases where you
  actually want to check the passed argument while throwing it away
  anyway, you can always fall back to using a named parameter:

    sub foo ($self, $       is Int where $_ > 0) { ... }   # illegal
    sub foo ($self, $unused is Int where $_ > 0) { ... }   # ok

  Imposing this restriction makes implementing and optimising constraints
  easier.

=head2 Constraint type names

I envisage that type names will be allowed the same set of characters as
normal identifiers such as variables, and that this set is extended as
expected when in the scope of 'use utf8'. But they aren't allowed ':' (and
specifically not '::') to avoid confusion with package/class names, which
are a separate namespace.

I have a further suggestion (which caused at least one porter to privately
recoil in horror).  I think that '+', '-' and '!' characters should also be
allowed as part of the type name (but not as the first character, and
possibly only as a trailing character). So just for example, either
perl itself or a pragma could define these additional types:

    $x is Int--   equivalent to:   $x is Int where $_ <  0
    $x is Int-    equivalent to:   $x is Int where $_ <= 0
    $x is Int+    equivalent to:   $x is Int where $_ >= 0
    $x is Int++   equivalent to:   $x is Int where $_ >  0

    $x is Str+    equivalent to:   $x is Str where length($_) >  0

which are easier to type and read than "PositiveInt", "StrictlyPositiveInt"
etc, say.
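
To make the intended meanings concrete, here is a sketch that maps those type names to plain predicates. The table, the passes() function and the is_int regex are all invented for illustration; nothing here is an existing API.

```perl
use strict;
use warnings;

# Assumed definition of "int-like" for this illustration only.
sub is_int {
    my ($v) = @_;
    return defined $v && !ref($v) && $v =~ /\A-?[0-9]+\z/;
}

# Hypothetical lookup table keyed by the suggested type names.
my %type_check = (
    'Int--' => sub { is_int($_[0]) && $_[0] <  0 },
    'Int-'  => sub { is_int($_[0]) && $_[0] <= 0 },
    'Int+'  => sub { is_int($_[0]) && $_[0] >= 0 },
    'Int++' => sub { is_int($_[0]) && $_[0] >  0 },
    'Str+'  => sub { defined $_[0] && !ref($_[0]) && length($_[0]) > 0 },
);

sub passes {
    my ($type, $value) = @_;
    my $check = $type_check{$type} or die "unknown type '$type'\n";
    return $check->($value) ? 1 : 0;
}
```

With this table, passes('Int+', 0) is true while passes('Int++', 0) is false, which is the whole difference between the single and doubled suffix.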

Similarly, a trailing '!' as part of the name might imply a stricter
version of a type. For example, "Int!" might croak if passed any value
which can't be losslessly converted to an integer; so 123.4 and "123.4"
would croak, while 123 and "123" would pass. Plain "Int" would accept all
four of those values, but would croak on "123abc".

I think we should also include a few built-in "symbol" constraint type
names, specifically:

    is \$   # must be a scalar ref
    is \@   # must be an array reference
    is \%   # must be a hash reference
    is \&   # must be a code ref
    is \*   # must be a glob ref

Which are less clunky than 'is ArrayRef' etc.  I think a plain ref is
better specified as 'is Ref' rather than 'is \ ' though.

(Note however that '$aref is \@' will often be easier to write as '\@a';
i.e. get perl itself to deref and alias the array, doing the check for
free. Ditto \%.)
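
For comparison, the symbol type names could be emulated with ref() roughly as follows. This sketch assumes that only unblessed references of exactly that kind pass, which the proposal does not actually specify; passes_symbol_type is a made-up name.

```perl
use strict;
use warnings;

# Hypothetical mapping from symbol type names to ref() results.
my %ref_kind = (
    '\$' => 'SCALAR',
    '\@' => 'ARRAY',
    '\%' => 'HASH',
    '\&' => 'CODE',
    '\*' => 'GLOB',
);

sub passes_symbol_type {
    my ($symbol, $value) = @_;
    my $want = $ref_kind{$symbol};
    die "unknown symbol type '$symbol'\n" unless defined $want;
    return ref($value) eq $want ? 1 : 0;   # assumption: blessed refs fail
}
```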

=head2 Details on 'is' built-in constraint types

We need to decide exactly what built-in types perl should support, and
what value(s) those built-ins (e.g. Int, Int!, Num, Str etc) should accept
and what coercions they perform. I think that these details are still up
for discussion and I don't have any strong feelings. For example in the
discussion above about Int and Int!, I'm assuming that perl will convert a
string containing a valid integer value into an integer rather than
croaking. Perhaps people would prefer instead that a string like "123"
should croak if being coerced to an Int. Or perhaps only a lower-case
variant, "int", should croak. Which of these count as Int:

        undef
        1.2
        ""
        "123"
        " 123 "
        "1.2"
        "0 but true"
        "0.0"
        "0abc"
        "0E0"

etc? The one thing I'm mostly certain of is that Int should *not* just be
a check that the argument has the SVf_IOK flag set.
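
To show how much the answer depends on the rule chosen, the sketch below runs the values listed above through one candidate definition: the strict regex from the Background section (using \z rather than \Z, so a trailing newline is rejected too). This is just one possible rule, not a recommendation.

```perl
use strict;
use warnings;

my @candidates = (
    undef, 1.2, "", "123", " 123 ", "1.2",
    "0 but true", "0.0", "0abc", "0E0",
);

# One candidate rule: optional minus sign followed by digits only.
sub int_like {
    my ($v) = @_;
    return (defined $v && !ref($v) && $v =~ /\A-?[0-9]+\z/) ? 1 : 0;
}

my @verdicts = map { int_like($_) } @candidates;
```

Under this particular rule only "123" is accepted; a looser rule based on numeric conversion would accept several of the others.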

Perhaps lower-case-only names (like int) should be reserved for perl
built-ins?

=head2 Custom constraint types

At compile time it will be possible for pragmata and similar to add
lexically-scoped type hook functions via the hints mechanism. These will
allow constraint type names to be looked up and handled according to the
pragma's wishes.

It is intended that the lexical scope of the hooks allows built-in types
to be overridden, e.g.

    sub f1($i is Int) {} # built-in Int
    {
        use Types::MakeIntMoreStrict;
        sub f2($i is Int) {} # Int as defined by Types::MakeIntMoreStrict
        {
            use Types::EvenStricter;
            sub f3($i is Int) {} # Int as defined by Types::EvenStricter
        }
    }
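
Lexically scoped hints already exist in Perl via %^H, so the scoping part of this can be tried out today. The sketch below is only an approximation: it looks the hint up at run time through caller() rather than at the sub's compile time as proposed, and the hint key and int_flavour() function are invented for the example.

```perl
use strict;
use warnings;

package Types::MakeIntMoreStrict {
    # Record in the lexical hints hash that 'Int' is overridden here.
    sub import { $^H{'Types::demo/Int'} = 'strict' }
}

# Report which flavour of Int is in effect at the call site.
sub int_flavour {
    my $hints = (caller(0))[10] || {};
    return $hints->{'Types::demo/Int'} // 'built-in';
}

my @seen;
push @seen, int_flavour();              # outside the pragma's scope
{
    # Equivalent to 'use Types::MakeIntMoreStrict;' for an inline package.
    BEGIN { Types::MakeIntMoreStrict->import }
    push @seen, int_flavour();          # inside the pragma's scope
}
push @seen, int_flavour();              # hint has gone out of scope again
```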

In the presence of hooks, the hook functions are called at the
subroutine's *compile* time to look up the constraint type name. The
return value of the hook can indicate either:

1) An error string.

2) Unrecognised: pass through to the next hook, or in the absence of
further hooks, treat as a built-in.

3) A returned checker sub ref which will be called at run-time each time
the parameter is processed. The sub ref takes a single argument, which is
the parameter being processed, and the return value(s) can indicate
either:

    * an error string;
    * the parameter is ok;
    * or a return value  which should be used in place of the
      parameter (this allows coercion).

Note that the hook sub ref itself can have a signature with constraints.
So the extra constraint processing done by the sub ref can be handled
either as explicit code in its body, or implicitly with its own signature
constraints.

Also, the sub ref can (if it chooses) modify its $_[0], which means it's
modifying $_, which is aliased to the parameter of the caller currently
being processed and possibly aliased to the checker sub's caller's
caller's argument too.  In fact arguably it should achieve coercion by
modifying $_[0] rather than returning a new value as was suggested above.

The sub should only croak on some sort of internal error; when detecting a
constraint violation, it should just return an error string; this allows
for the possibility of alternations (although I'm not keen on allowing
alternations).

4) Return a string containing a source code snippet to be inserted into
the source text at that point.

This option is in many ways the most interesting, as it effectively allows
pragmata to inject extra constraints into the source code. For example,
suppose there's a user-written pragma called Type::IntRanges; then with
this code:

    use Type::IntRanges;
    sub f ($x is PositiveInt) { ...}

At 'use' compile time the pragma registers itself in the lexically scoped
hints. Then when the signature is parsed and compiled, the pragma's hook
function is looked up in the hints, then called with the type name
'PositiveInt'; the hook returns the string

    'is Int where $_ >= 0'

which is injected into the source code stream as if the coder had instead
directly written:

    sub f ($x is Int where $_ >= 0) { ...}

Similarly, ($x is AlwaysRef) might be translated at compile time into
        ($x where defined($_) as ref ? $_ : [$_] )

(i.e. coerce into a ref, but croak if not defined).

This can be nested; the injected source code can also contain a custom
type name which will also trigger a source code injection.

[ Note: I have no idea how easy it will be to inject raw src text into the
input stream, especially if the lexer has already processed the token
following the type name and passed it to the parser as the lookahead
token. If not viable, then I may have to drop this option. ]

While powerful, this code injection has a couple of downsides.  First, you
may get compiler warnings or errors appearing to come from a place in your
source code where there is no such syntax. To avoid this, hook writers
should be encouraged to write hooks which only supply simple, well tested
code snippets which shouldn't produce warnings or compile errors (they can
of course cause constraint errors). Secondly, there's nothing to stop a
hook returning 'bobby tables'-like source code like 'is Int,
$extra_param'. The docs should state that doing anything other than
injecting extra constraints into the current parameter is undefined
behaviour.

Conversely, using a sub ref to process every parameter avoids the
confusion of code injection, but is slow: a sub call for every parameter
in the current sub call. Also, error messages may appear to come from the
hook sub ref buried somewhere in a pragma.pm module, rather than the
user's code.

5) This is a tentative suggestion that would replace options 3) and 4).
This would be for a constraint hook to be specified as an empty-bodied sub
with a single parameter. The constraint(s) specified for that parameter
become the custom constraints which that hook provides. In some fashion
the code previously compiled for that "prototype" sub's constraint is
copied and/or executed. This would be more efficient than calling a whole
sub for each parameter, and would be more constrained than injecting text
into the source code. It would of course be nestable; for example:

    hook 1: 'PositiveInt' maps to: sub ($x is Int where $_ >= 0) {}
    hook 2: 'OddPosInt'   maps to: sub ($x is PositiveInt where $_ % 2) {}

    sub foo($self, $arg is OddPosInt) { ... }

Most of these hooking methods may have issues with deparsing correctly, so
this needs careful implementation.
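
As a point of comparison, the checker-sub approach of option 3 can be sketched in plain Perl for the nested PositiveInt / OddPosInt example. The calling convention used here is an assumption, since the proposal leaves it open: each checker receives the parameter as $_[0], returns undef on success or an error string on failure, and coerces by modifying $_[0] in place.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical registry of checker subs, keyed by type name.
my %checker;
%checker = (
    Int => sub {
        return "not int-like"
            unless defined $_[0] && !ref($_[0]) && $_[0] =~ /\A-?[0-9]+\z/;
        $_[0] = int $_[0];          # coerce in place through the alias
        return;
    },
    PositiveInt => sub {
        my $err = $checker{Int}->($_[0]);
        return $err if defined $err;
        return $_[0] >= 0 ? undef : "negative";
    },
    OddPosInt => sub {
        my $err = $checker{PositiveInt}->($_[0]);
        return $err if defined $err;
        return $_[0] % 2 ? undef : "not odd";
    },
);

# Stand-in for: sub foo ($self, $arg is OddPosInt) { ... }
sub foo {
    my ($self, $arg) = @_;
    my $err = $checker{OddPosInt}->($arg);
    croak "\$arg: $err" if defined $err;   # only the caller croaks
    return $arg;
}

my $good = foo(undef, "7");
my $bad  = eval { foo(undef, 4); 1 } ? '' : $@;
```

Each level of nesting here costs an extra sub call per parameter, which is the overhead that option 5 is meant to avoid.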

=head2 Checking types outside of signatures

I propose that for each built-in constraint type there will be a
corresponding function in the 'is::' namespace which returns a boolean
indicating whether the argument passes that constraint. This would be
particularly useful where the constraint is too complex to be specified in
the signature, e.g.

    sub f ($n) { die unless is::Int($n) || is::Num($n); ... }

Note that these functions would only do the checking part of the type's
action, not the coercion part (if any).

The is:: namespace would behave similarly to utf8::, in that the functions
are always present without requiring 'use is'.
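
A rough stand-in for such functions can be written today by defining them in a package named 'is'. The definitions below are assumptions for illustration only: the Int regex is one candidate rule, and Num simply defers to Scalar::Util::looks_like_number.

```perl
use strict;
use warnings;
use Scalar::Util ();

# Hypothetical stand-ins for the proposed always-present is:: functions.
package is {
    sub Int {
        my ($v) = @_;
        return (defined $v && !ref($v) && $v =~ /\A-?[0-9]+\z/) ? 1 : 0;
    }
    sub Num {
        my ($v) = @_;
        return (defined $v && !ref($v)
                && Scalar::Util::looks_like_number($v)) ? 1 : 0;
    }
}

sub f {
    my ($n) = @_;
    die "argument is neither Int nor Num\n"
        unless is::Int($n) || is::Num($n);
    return $n + 0;
}
```

These only report whether the value passes; as described above, no coercion is performed by the checking functions themselves.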

I'm not sure whether a similar facility can be provided for custom types.
Perhaps have an is::is($x, 'Type') function which at runtime looks up
"Type" using the same lexical hints, to find the right hook. This would
require custom hooks to provide info for both the signature compilation
and a function to be called at runtime. This is a bit hand-wavey. It would
also be useful for built-ins having extra characters in them like I
suggested above, e.g. is::is($x, 'Int++') and is::is($aref, '\@').

=head2 Moosey extensions to 'is'

Moose supports aggregate and alternation / composite constraints; for
example, ArrayRef[Int] and [Int|Num].

Personally I think that we shouldn't support these; it will make things
far too complex. Also, the nested HashRef[ArrayRef[Int]] form quickly
becomes a performance nightmare, with every element of the HoA having to
be checked for Int-ness on every call to the function.
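
The cost is easy to see with a hand-written walker that counts element checks. This is a sketch with invented names and an assumed Int rule, not a model of how Moose itself implements the check.

```perl
use strict;
use warnings;

my $checks = 0;

# Assumed definition of "int-like"; counts how often it is invoked.
sub is_int {
    my ($v) = @_;
    $checks++;
    return defined $v && !ref($v) && $v =~ /\A-?[0-9]+\z/;
}

# Hand-written equivalent of a HashRef[ArrayRef[Int]] check.
sub check_hash_of_int_arrays {
    my ($h) = @_;
    return 0 unless ref($h) eq 'HASH';
    for my $aref (values %$h) {
        return 0 unless ref($aref) eq 'ARRAY';
        for my $elem (@$aref) {
            return 0 unless is_int($elem);
        }
    }
    return 1;
}

my %data = map { $_ => [ 1 .. 100 ] } 1 .. 10;    # 10 keys x 100 elements
my $all_ok = 1;
$all_ok &&= check_hash_of_int_arrays(\%data) for 1 .. 3;   # three "calls"
```

Three calls on a modest 10 x 100 structure already perform 3000 element checks, and the work grows with the data on every call.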


