perl.perl5.porters | Postings from February 2022

Thoughts Towards Type Assertions (was: Re: Pre-RFC: builtin::functions for detecting numbers vs strings)

From:       Paul "LeoNerd" Evans
Date:       February 24, 2022 16:31
Subject:    Thoughts Towards Type Assertions (was: Re: Pre-RFC: builtin::functions for detecting numbers vs strings)
Message ID: 20220224163102.770b6526@shy.leonerd.org.uk
On Thu, 24 Feb 2022 14:04:04 +0000
"Paul \"LeoNerd\" Evans" <leonerd@leonerd.org.uk> wrote:

> If it was limited to just some predicate test functions living in
> `builtin` this might not be too hard a problem, but there's a whole
> new layer lurking below, in the form of type assertions on
> lexicals, subroutine signature arguments, object fields; as well as
> other syntax like match/case, `multi sub`, typed catch, ... This is a
> far-reaching question with lots more complication to it than initially
> meets the eye...
> 
> ((I'll probably follow up that thought in a separate email))

For the remainder of this discussion, I'm going to have to begin with a
hypothetical. Let's imagine two facts that aren't quite true (yet):

  * Perl 5.10 never added the ~~ operator, and smartmatch was never a
    thing. The ~~ syntax remains free to use. This is purely
    because we don't have much syntax space available for new
    operators, and I need one below. ;)

  * We've finished the PV vs. IVNV split that PR #18958 began, so now
    we can perfectly remember "was originally a string" vs "was
    originally a number".

Let's now imagine that we're about to define a new ~~ operator, and a
whole new type of value (called a Type). The ~~ operator checks whether
the value on the LHS matches some Type object on the RHS; i.e.

  say "Var is a thingy" if $var ~~ $ThingyType;

Let's further imagine we have a few named type-checking objects. We can
probably imagine quite a set of these, but we'll start with:
  $var ~~ Defined   -- equivalent to  defined($var)
  $var ~~ Undefined -- equivalent to  not defined($var)
  $var ~~ Boolean   -- equivalent to  builtin::isbool($var)
  $var ~~ Object    -- equivalent to  defined builtin::blessed($var)

We can see these don't have to be distinct - they can overlap. Any
Boolean or Object value is already going to be Defined. But that's fine.

Perhaps (and I'll get back to why this is tricky) we could also imagine
things like Number and String:

  $var ~~ Number
  $var ~~ String

These type-checking objects are still just first-class values in the
language. Nothing stops us assigning one into a variable and using that
later:

  my $numtype = Number;
  if( $var ~~ $numtype ) { ... }
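
For experimenting with this in present-day Perl, here is a minimal sketch of type objects as first-class values. It assumes a hypothetical `Type` class that wraps a predicate, and a plain function `matches($var, $type)` standing in for the proposed `$var ~~ $type` (the real, existing smartmatch is deliberately not used). Only Defined, Undefined and Object are modelled: Boolean is left out because it relies on a builtin boolean test that older perls lack, and Number and String are left out because defining them is exactly the tricky part.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical Type class: a type object is just a named predicate.
{
    package Type;

    sub new {
        my ($class, $name, $pred) = @_;
        return bless { name => $name, pred => $pred }, $class;
    }

    # Returns 1 if $value satisfies this type's predicate, else 0.
    sub check {
        my ($self, $value) = @_;
        return $self->{pred}->($value) ? 1 : 0;
    }
}

# matches($var, $type) stands in for the proposed `$var ~~ $type`.
sub matches {
    my ($value, $type) = @_;
    return $type->check($value);
}

# The named checks, as constants that each return a Type object.
use constant {
    Defined   => Type->new(Defined   => sub { defined $_[0] }),
    Undefined => Type->new(Undefined => sub { !defined $_[0] }),
    Object    => Type->new(Object    => sub { defined blessed($_[0]) }),
};

# Type objects are ordinary values: store one now, use it later.
my $maybe_missing = Undefined;
print "undef matches Undefined\n" if matches(undef, $maybe_missing);
```

Because each named check is a constant returning an object, `my $maybe_missing = Undefined;` stores the type itself, just like `$numtype` above; and a blessed object passes both Object and Defined, showing the overlap.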

From this starting point, we can already imagine how match/case syntax
would easily interact with these:

  match($var : ~~) {
    case(Undefined) { say "Var is undef" }
    case(Boolean)   { say "Var is a boolean" }
    case(Number)    { say "Var is a plain number" }
    ...
  }
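
A rough emulation of the match/case form above, assuming that a type is simply a predicate coderef and that cases are tried in order with the first match winning. `match_type` and `describe` are hypothetical names, Boolean is omitted, and Number is only approximated with `Scalar::Util::looks_like_number`, which quietly takes a side on the string/number question discussed further down.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed looks_like_number);

# Simplification: each "type" is just a predicate coderef.
# Number is only a rough stand-in built on looks_like_number.
my %type = (
    Undefined => sub { !defined $_[0] },
    Object    => sub { defined blessed($_[0]) },
    Number    => sub { defined($_[0]) && !ref($_[0]) && looks_like_number($_[0]) },
);

# Hypothetical emulation of `match($var : ~~) { case(...) {...} ... }`:
# cases are [predicate, handler] pairs, tried in order; first match wins.
sub match_type {
    my ($value, @cases) = @_;
    for my $case (@cases) {
        my ($pred, $handler) = @$case;
        return $handler->($value) if $pred->($value);
    }
    return;    # no case matched
}

sub describe {
    my ($var) = @_;
    return match_type($var,
        [ $type{Undefined} => sub { "Var is undef" } ],
        [ $type{Object}    => sub { "Var is an object" } ],
        [ $type{Number}    => sub { "Var is a plain number" } ],
        [ sub { 1 }        => sub { "Var is something else" } ],
    );
}

for my $v (undef, 42, "hello") {
    print describe($v), "\n";
}
```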

We could imagine this interacting with `multi sub`. Let's now say that
in front of a `multi sub` parameter, we can optionally put a
constructor expression for one of these type objects (which would be
constructed once and stored statically somewhere). The dispatch logic
would then apply the ~~ test to each of the alternatives in turn and
take the first one to match, much as we could with match/case:

  multi sub jump()           { say "Jumping with nothing" }
  multi sub jump(Boolean $x) { say "Jumping with boolean $x" }
  multi sub jump(String $x)  { say "Jumping with stringy $x" }

and thus all of jump(), jump(true), jump("true") would behave
differently.
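
A sketch of that dispatch rule, under the assumption that each candidate carries one check per parameter and that the first candidate whose arity matches and whose checks all pass is called. `make_multi` is a hypothetical helper, not a proposed API. Since no Boolean check is assumed here, the demo dispatches on Object versus any Defined value rather than on Boolean versus String.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Stand-in type checks as predicate coderefs.
my $Defined = sub { defined $_[0] };
my $Object  = sub { defined blessed($_[0]) };

# Hypothetical stand-in for `multi sub`: each candidate is
# [ [ one predicate per parameter ], body ]. The first candidate whose
# arity matches and whose predicates all pass is called.
sub make_multi {
    my @candidates = @_;
    return sub {
        my @args = @_;
        CANDIDATE: for my $cand (@candidates) {
            my ($preds, $body) = @$cand;
            next CANDIDATE unless @$preds == @args;
            for my $i (0 .. $#args) {
                next CANDIDATE unless $preds->[$i]->($args[$i]);
            }
            return $body->(@args);
        }
        die "No candidate matches these " . scalar(@args) . " argument(s)\n";
    };
}

# Object is listed before Defined, so objects take the more specific branch.
my $jump = make_multi(
    [ [],         sub { "Jumping with nothing" } ],
    [ [$Object],  sub { "Jumping with object " . ref($_[0]) } ],
    [ [$Defined], sub { "Jumping with value $_[0]" } ],
);

print $jump->(), "\n";
print $jump->(bless({}, 'Frog')), "\n";
print $jump->("true"), "\n";
```

Ordering matters: an object would also pass the Defined check, so listing Object first is what sends it to the more specific branch, and `$jump->(undef)` dies because no candidate accepts it.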

For that matter, we could imagine these applying in a type-assertion
manner to lexicals, regular (non-multi) sub params or object fields:

  my String $message;

  sub resize(Number $width) { ... }

  object Person {
    has Number $age :param;
  }

where suddenly we're going to apply the ~~ test implicitly as part of
assigning the value. All of these would then fail:

  $message = undef;

  resize("hello");

  Person->new(age => "five");
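
These implicit checks can be approximated in today's Perl with a tied scalar whose STORE runs the check on every assignment, plus an explicit check at the top of a sub. The sketch assumes rough stand-ins for String (a defined non-reference) and Number (a defined non-reference that passes `looks_like_number`); `assert_type` and `TypedScalar` are hypothetical names, and object fields are left out for brevity.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(looks_like_number);

# Rough stand-ins (assumptions, not the proposal's definitions):
#   String = defined non-reference
#   Number = defined non-reference that looks_like_number
my %Type = (
    String => sub { defined($_[0]) && !ref($_[0]) },
    Number => sub { defined($_[0]) && !ref($_[0]) && looks_like_number($_[0]) },
);

# Die (reported from the caller's side) unless $value passes the named check.
sub assert_type {
    my ($type_name, $value, $what) = @_;
    $Type{$type_name}->($value)
        or croak "$what is not a $type_name";
    return $value;
}

# Approximates `my String $message;`: a tied scalar whose STORE checks
# every assignment before accepting it.
{
    package TypedScalar;
    sub TIESCALAR {
        my ($class, $type_name) = @_;
        return bless { type => $type_name, value => undef }, $class;
    }
    sub FETCH { return $_[0]{value} }
    sub STORE {
        my ($self, $value) = @_;
        main::assert_type($self->{type}, $value, 'assigned value');
        $self->{value} = $value;
    }
}

# Approximates `sub resize(Number $width)` with an explicit check.
sub resize {
    my ($width) = @_;
    assert_type(Number => $width, 'width');
    return "resized to $width";
}

tie my $message, 'TypedScalar', 'String';
$message = "hello";                                   # accepted
print "message is $message\n";
print "rejected: $@" unless eval { $message = undef; 1 };
print "rejected: $@" unless eval { resize("hello"); 1 };
```

With this stand-in Number, `resize("5")` would be accepted, which is precisely the question raised next.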


Now suddenly we hit upon our first problem. What happens here?

  resize("5");

I am going to imagine that about half of the people reading this mail
will confidently say this is totally fine, and the other half will
confidently say this is invalid and must not be allowed.

If you think that should be disallowed because "obviously nobody would
write that", well then what about this?

  resize($ARGV[0]);

Elements of @ARGV are "obviously" strings because they come from the
command line. I don't think users would be very happy if we said from
now on they have to coërce these by doing  resize(0+$ARGV[0])  if they
wanted to treat them as numbers instead. This cuts to the very heart of
Perl's string/number duality.

Instead, I think the very premise of us having these String and Number
type objects in the first place was wrong. It would be better if we
thought in terms of "can be treated as a string" or "can be treated as
a number", and named them in a more adjectivey way:

  $var ~~ Numerical
     true if defined, non-referential, and `0+$var` would not warn, or
          if an object that has overload '0+'
     false otherwise

  $var ~~ Textual
     true if defined and non-referential, or
          if an object that has overload '""'
     false otherwise

Note also a few corner cases of these rules: both 2 and "2" satisfy both
Numerical and Textual, as do both true and false booleans.
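
A minimal sketch of the two checks as specified above. "`0+$var` would not warn" is taken literally: the numification runs inside an `eval` with numeric warnings made fatal. `numerical`, `textual` and the `Temperature` class are illustrative names only, and `overload::Method` is used to detect the `0+` and `""` overloads.

```perl
use strict;
use warnings;
use overload ();            # loads overload.pm for overload::Method
use Scalar::Util qw(blessed);

# Numerical: defined non-reference for which `0 + $value` does not warn,
# or an object that overloads '0+'.
sub numerical {
    my ($value) = @_;       # a copy, so numifying it leaves the caller's scalar alone
    return 0 unless defined $value;
    if (ref $value) {
        return (blessed($value) && overload::Method($value, '0+')) ? 1 : 0;
    }
    return eval {
        use warnings FATAL => 'numeric';   # make "isn't numeric" die
        my $n = 0 + $value;
        1;
    } ? 1 : 0;
}

# Textual: defined non-reference, or an object that overloads '""'.
sub textual {
    my ($value) = @_;
    return 0 unless defined $value;
    return 1 unless ref $value;
    return (blessed($value) && overload::Method($value, '""')) ? 1 : 0;
}

# Illustrative class with both conversions overloaded.
{
    package Temperature;
    use overload
        '0+'     => sub { $_[0]{degrees} },
        '""'     => sub { "$_[0]{degrees} degrees" },
        fallback => 1;
    sub new { my ($class, $d) = @_; return bless { degrees => $d }, $class }
}

for my $v (2, "2", "hello", !!1, !!0, undef, [], Temperature->new(20)) {
    printf "%-12s numerical=%d textual=%d\n",
        defined $v ? "'$v'" : 'undef', numerical($v), textual($v);
}
```

The booleans pass Numerical because false carries an integer value alongside its empty string, so adding zero to it never has to parse the string.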

((actually, I think "Textual" is a bad name for this one, because it
  suggests "Unicode text". This may not be the case for pure bytes like
  we'd get from a sysread() et al. I didn't want to open /that/
  particular can of worms today though))

I think in practice, outside of those weird corner cases of JSON
encoders etc., these are the sorts of type assertions that most users
care about most of the time. Any value that passes a Numerical test
is meaningful and well-behaved in number-like expressions, e.g.

  say "Twice my number is ", $var * 2;

It doesn't matter if the value is a real SVt_IV or SVt_NV, or some
fancy object like Math::BigRat. It could even be a pure string that
happens to contain only ASCII digits such as "123". All that matters is
"Perl can treat it like a number".

Which all begins to suggest that the primary thing that we want to
expose to users and encourage them to use should be based on questions
around "what can I do with this value?" rather than "what is this value
fundamentally made from on the inside?".

At this point I suddenly don't even like the word "Type". But currently
I don't have a better one - words like "nature", "facet", "ability" or
"capability" all feel wrong somehow.


Coming back to the case of encoders for JSON et al., I think those may
actually have a justifiable use case for wanting a "this was originally
a number" vs "this was originally a string" test function. In that
case I think we can get away with some long awkward name that hints to
the user "!! Think Carefully Before Using !!" on the label:

  builtin::was_originally_number $x
  builtin::was_originally_string $x

These would only be true on defined non-references. They'd never be
true on refs - blessed or otherwise - even if those objects did have
overload magic on them. It's up to the encoder to decide how it might
handle objects or other refs, but it already has tools available to
check all of those.
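
A heuristic sketch of the two proposed functions, reading a scalar's internal flags through the core `B` module. Only the function names come from the proposal; the flag logic is an assumption. On perls where using a number as a string also sets its string flag, such a value will be reported as a string, which is the gap the hypothetical PV vs. IVNV split at the top of this mail would close.

```perl
use strict;
use warnings;
use B ();

# Heuristic approximations of the proposed builtin functions, reading
# the scalar's internal flags through the core B module.
sub _flags { return B::svref_2object(\$_[0])->FLAGS }

sub was_originally_number {
    my ($value) = @_;
    return 0 if !defined($value) || ref($value);
    my $flags = _flags($value);
    # has an integer or float value but no string value
    return (($flags & (B::SVf_IOK() | B::SVf_NOK())) && !($flags & B::SVf_POK())) ? 1 : 0;
}

sub was_originally_string {
    my ($value) = @_;
    return 0 if !defined($value) || ref($value);
    return (_flags($value) & B::SVf_POK()) ? 1 : 0;
}

# Labels are kept separate so the values are never stringified before testing.
for my $case (['5', 5], ['1.5', 1.5], ['"5"', "5"], ['"hello"', "hello"]) {
    my ($label, $v) = @$case;
    printf "%-8s number=%d string=%d\n",
        $label, was_originally_number($v), was_originally_string($v);
}
```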

This would be fine in a serialisation/language interop module, but would
jump out at you as "You shouldn't really be worrying about this" when
encountered elsewhere, such as in more general function argument sanity
checking.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/
