Re: Type and Value Constraints and Coercions

From: Zefram via perl5-porters
Date: December 2, 2019 15:50
Subject: Re: Type and Value Constraints and Coercions
Message ID: 20191202155020.hrz7mdcgcjnxrnvl@fysh.org

Dave Mitchell wrote:
>            $self isa Foo::Bar,         # croak unless $self->isa('Foo::Bar');
>            $foo  isa Foo::Bar?,        # croak unless undef or of that class
>            $a!,                        # croak unless $a is defined
>            $b    is  Int,              # croak if $b not int-like
>            $c    is  Int?,             # croak unless undefined or int-like
>            $d    is PositiveInt,       # user-defined type
>            $e    is Int where $_ >= 1, # multiple constraints
>            $f    is \@,                # croak unless  array ref
>            $aref as ref ? $_ : [ $_ ]  # coercions: maybe modify the param

Yuck.  This is a huge amount of new syntax to add.  The new syntax doesn't
pull its weight, given that it can only be used in this one context.
If you're adding a bunch of syntax for type constraints, it should also
be available for type checking purposes outside signatures.

It's also rather too Perl6ish for Perl 5: all these consecutive barewords
will cause a bunch of new parsing ambiguities.  Thinking about how the
constraint syntax would be made available in general expression contexts
might help in coming up with less troublesome syntax.

>So, given that a constraint type system and a "real" type system are two
>separate things (unless someone smarter than me can suggest a way of
>unifying them), I think that they should be kept syntactically separate.

Yes, this is a good decision.

>processed against the lexical parameter, after any binding of arguments or
>default value.

In previous discussion, we were leaning towards exempting default values
from constraints.  Given that this is about constraining arguments,
rather than applying types to lexical variables, I still think exempting
defaults is advantageous.

As for aliasing, it seems to me that in a signature (\@foo), \@foo is
a scalar value capable of being constrained.  It makes perfect sense to
apply a constraint to an argument that is received by aliasing, and no,
the automatic constraint to it being an array reference isn't enough.

>    sub f ($x isa Class::name)

Would "($x isa $other_class)" be legal?  The stuff about postfix
"?" suggests that this syntax is too specific to permit the use of an
arbitrary expression.  But forbidding general expressions would be an
annoying limitation on the use of "isa".

>    sub foo ($x is Int ) { ... }

Although you say you're not creating a core type system, you are somewhat
doing exactly that here.  You're certainly inventing a namespace populated
with a bunch of type-like objects.  This is not to be done lightly,
and deciding what "Int" means is a substantial task.  You're very much
importing Perl 6 syntax that's tied to semantics that Perl 5 doesn't have.
Perl 6 already has a well defined thing called "Int", which knows which
values satisfy it and which don't, whereas Perl 5 has a semantic that
*anything* is an integer if you want to treat it that way.  We certainly
can come up with concepts of "integer" for Perl 5 that identify a proper
subset of values, but there are many possible concepts, and there's no
precedent for the Perl 5 core being concerned with any of them.
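
To illustrate how the possible concepts diverge, here is a rough sketch
of two candidate notions of "integer".  Both predicate names are made
up for this example; neither is proposed anywhere, and many other
definitions are equally defensible.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical concept 1: the string form is a plain decimal integer.
sub is_int_string {
    my ($v) = @_;
    return defined($v) && !ref($v) && $v =~ /\A-?[0-9]+\z/ ? 1 : 0;
}

# Hypothetical concept 2: the value is numeric with no fractional part.
sub is_int_numeric {
    my ($v) = @_;
    return defined($v) && !ref($v) && looks_like_number($v)
        && $v == int($v) ? 1 : 0;
}

# The two concepts agree on "42" and "abc" but disagree on "3.0" and "1e3".
for my $v ("42", "3.0", "1e3", "abc") {
    printf "%-5s string-int=%d numeric-int=%d\n",
        $v, is_int_string($v), is_int_numeric($v);
}
```

Whichever of these "Int" were to mean, some callers would want the
other one.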

>    sub foo ($x is PositiveInt) { ... }
>    # roughly equivalent to: ($x is Int where $x >= 0)

I hope you don't think that zero is a positive integer.

>Like 'isa', 'is' type names can be followed by '?', indicating that an
>undefined value is also allowed.

This again implies that general expressions won't be permitted on the
rhs of "is".  It's a bigger problem for "is" than for "isa".

>Type names as used by 'is' occupy a different namespace than perl
>packages and classes,

It's quite necessary to make this distinction, and particularly to
distinguish between "isa" and "is".  But I have issues with the new
namespace used by "is"; see below.

>The built-in constraint types will also coerce the resultant parameter

It's a bad idea to mix these separate concerns.  Constraint checking and
type coercion are different ideas that should remain distinct.  Also,
just as there are multiple ideas of what is an integer in Perl 5, there
are multiple ideas of what turning a value into a `purer' integer entails.
Remember, passing "is Int" implies that the supplied argument *is* an
integer (whatever that means), so coercing it *to* an integer should
be the identity operation.  If you're doing a non-identity coercion,
that means you've got a second, stricter, concept of integer in play.
Wanting to check that an argument satisfies one concept of integer
does not imply which stricter concept of integer you'd like it to be
converted to.

>Constraints apart from '!' and 'isa' cannot be used on a parameter which
>is a direct alias (e.g.  *$x), since this might trigger coercing the
>passed argument and thus causing unexpected action at a distance.

It seems essential, to me, that constraints should be applicable to
aliased parameters.  Constraint checking code should not have such bad
taste as to side-effect its parameter.  Coercion would also better be
seen as a function applied to the parameter to return a coerced value,
rather than mutating its input.  But if code is written such that it
does behave so badly, well, it's not the first nor even the fifth place
in Perl 5 that side effects can surprise distant code.
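
As a sketch of the difference, here is the array-ref coercion from the
quoted example written both ways in today's Perl.  The sub names are
invented for illustration; the second one shows the badly behaved style,
which relies on @_ aliasing the caller's variables.

```perl
use strict;
use warnings;

# Hypothetical coercion as a pure function: it returns the coerced value
# and leaves the caller's variable untouched.
sub to_aref {
    my ($v) = @_;                 # copies the argument
    return ref($v) ? $v : [ $v ];
}

# Badly behaved variant: assigning to $_[0] writes through the alias,
# so the caller's own variable changes.
sub to_aref_in_place {
    $_[0] = [ $_[0] ] unless ref($_[0]);
    return;
}

my $x = 5;
my $coerced = to_aref($x);        # $x is still 5; $coerced is [5]

my $y = 5;
to_aref_in_place($y);             # $y itself is now [5]
```

With the first style there is nothing for an aliased parameter to be
surprised by.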

>The complete collection of where/as/isa/is clauses are collectively
>enclosed in their own logical single scope,

This sits uneasily with the interleaving of these clauses with default
value expressions.  I'm not sure what the scope of lexical variables
introduced in a where clause *should* be, but I'm pretty sure it shouldn't
be visible in a later where clause without also being visible in an
intervening default value expression.  I think it's also difficult
to implement such selective visibility, given the way lexical scopes
are managed.

I'm concerned about the idea of this scope, whether in its lexical or
dynamic aspects, being at all visible to the programmer.  It has the
whiff of implementation leaking out.

>Constraints can only be supplied to scalar parameters; in particular they
>can't be applied to:
...
>* Placeholder (nameless) parameters.

Bad idea.  It should be possible to type check an argument that is
otherwise ignored.  If it's just a matter of the implementation wanting
a lexical variable to apply the constraint logic to, you can perfectly
well create a lexical variable (pad slot) without any name.

>    $x is Int+    equivalent to:   $x is Int where $_ >= 0
>    $x is Str+    equivalent to:   $x is Str where length($_) >  0

The "+" parts aren't analogous: "Int+" admits zero, but "Str+" excludes
zero length.  This suggests that this kind of name (for which there's
no precedent in Perl) would be fairly confusing.

>I think we should also include a few built-in "symbol" constraint type
>names,

Doesn't seem worth the irregularity.

>At compile time it will be possible for pragmata and similar to add
>lexically-scoped type hook functions via the hints mechanism.

It seems to me that Perl already has serviceable namespacing mechanisms,
and doesn't need a new kind of namespace just for type constraints.
It would be better for the rhs of "is" to take an arbitrary expression,
and use the value to which that expression evaluates as the type
constraint object.  This way we get to use all our existing mechanisms
to manage the names of type constraints.  One "use" declaration and
the programmer can have "Int" et al defined the way you imagine.
This would also avoid the core taking some arbitrary position on what
"Int" `really' means.

The ability to construct type constraints in a general expression can
easily subsume "isa" and "where".  There's no need for so much syntax.
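
A minimal sketch of what that could look like, using plain code refs as
the constraint values.  Every name here (isa_class, maybe, all_of) is
hypothetical and exists only for this example; the point is merely that
ordinary subs and variables are enough to name and combine constraints.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# A constraint is a code ref returning 1 or 0 for a single value.

sub isa_class {                   # covers "isa"; the class is any expression
    my ($class) = @_;
    return sub { blessed($_[0]) && $_[0]->isa($class) ? 1 : 0 };
}

sub maybe {                       # covers postfix "?": undef is also allowed
    my ($inner) = @_;
    return sub { !defined($_[0]) || $inner->($_[0]) ? 1 : 0 };
}

sub all_of {                      # covers "where": every part must hold
    my @parts = @_;
    return sub {
        my ($v) = @_;
        for my $c (@parts) { return 0 unless $c->($v) }
        return 1;
    };
}

{ package Foo::Bar; sub new { return bless {}, shift } }

my $other_class = 'Foo::Bar';
my $is_foo      = isa_class($other_class);
my $opt_foo     = maybe($is_foo);
my $int_like    = sub {
    defined($_[0]) && !ref($_[0]) && $_[0] =~ /\A-?[0-9]+\z/ ? 1 : 0;
};
my $int_over_3  = all_of($int_like, sub { $_[0] > 3 });
```

Because the constraints are ordinary values, they can be exported,
imported and lexically scoped like anything else.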

We also already have a serviceable mechanism for type constraint objects:
objects that overload the smartmatch operator.  No need to reinvent
the wheel.

>4) Return a string containing a source code snippet to be inserted into
>the source text at that point.

Yuck.  Terrible plugin mechanism; very vulnerable to lexical state
affecting the parsing.  Don't bring Devel::Declare crack into the core,
and don't encourage people to write fragile plugin code.  It's already
possible for a constraint checking sub to inline itself via call checker
magic.

>This would be for a constraint hook to be specified as an empty-bodied sub
>with a single parameter. The constraint(s) specified for that parameter
>become the custom constraints which that hook provides.

Nasty.

>I propose that for each built-in constraint type there will be a
>corresponding function in the 'is::' namespace which returns a boolean
>indicating whether the argument passes that constraint.

Too limited.  If there's special syntax on the rhs of "is", then the
whole thing, including "where" clauses, "?" decorations, and references
to user-defined type constraints, should be available in some kind of
expression context.  Essentially, "$x is Int where $_ > 3" should be a
truth-value expression.  Of course, this runs into the problem of the
bareword-based syntax not playing nicely with existing expression syntax;
the syntax would have to be redesigned to fix that.

>also be useful for built-ins having extra characters in them like I
>suggested above, e.g. is::is($x, 'Int++') and is::is($aref, '\@');

Wrong way to do it.  It would mean essentially implementing the type
constraint syntax twice: once in the actual parser, for signatures,
and a second time to handle the string argument to is::is().

>Moose supports aggregate and alternation / composite constraints; for
>example, ArrayRef[Int] and [Int|Num].
>
>Personally I think that we shouldn't support these; it will make things
>far too complex.

Semantically, things like ArrayRef[Int] and junctions are quite
frequently needed.  It should be easy to construct such type constraints.
Predeclaring and giving them monomial names as user-defined type
constraints seems rather cumbersome.  This is part of why I favour the
rhs of "is" being a general expression context.
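
For instance, with constraints represented as code refs, aggregate and
alternation constraints can be built inline from small combinators.
This is only a sketch: array_of, hash_of and any_of are names invented
here, and the "int-like" test is one arbitrary choice among many.

```perl
use strict;
use warnings;

# A constraint is a code ref returning 1 or 0 for a single value.
my $Int = sub {
    defined($_[0]) && !ref($_[0]) && $_[0] =~ /\A-?[0-9]+\z/ ? 1 : 0;
};

sub array_of {                    # like ArrayRef[...]
    my ($elem) = @_;
    return sub {
        my ($v) = @_;
        return 0 unless ref($v) eq 'ARRAY';
        for my $item (@$v) { return 0 unless $elem->($item) }
        return 1;
    };
}

sub hash_of {                     # like HashRef[...], checking the values
    my ($elem) = @_;
    return sub {
        my ($v) = @_;
        return 0 unless ref($v) eq 'HASH';
        for my $item (values %$v) { return 0 unless $elem->($item) }
        return 1;
    };
}

sub any_of {                      # alternation: at least one must hold
    my @alts = @_;
    return sub {
        my ($v) = @_;
        for my $c (@alts) { return 1 if $c->($v) }
        return 0;
    };
}

my $array_of_int = array_of($Int);                 # ArrayRef[Int]
my $nested       = hash_of(array_of($Int));        # HashRef[ArrayRef[Int]]
my $int_or_ints  = any_of($Int, array_of($Int));   # Int, or array of Int
```

No predeclared name is needed for any of the composite constraints; they
are written where they are used.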

>                 Also, the nested HashRef[ArrayRef[Int]] form quickly
>becomes a performance nightmare, with every element of the AoH having to
>be checked for Int-ness on every call to the function.

If that's the type checking that's actually required, then the cost
of checking must be borne.  It is a false economy to discourage the
programmer from making the proper checks.

-zefram


