Re: PPC Elevator Pitch for Perl::Types
From:
Paul "LeoNerd" Evans
Date:
August 19, 2023 11:34
Subject:
Re: PPC Elevator Pitch for Perl::Types
Message ID:
20230819123352.3a030233@shy.leonerd.org.uk
On Wed, 16 Aug 2023 04:37:07 +0000
Oodler 577 via perl5-porters <perl5-porters@perl.org> wrote:
> # Proposed Perl Changes Elevator Pitch for Perl::Types
...
The following is my personal response, though it is shaped somewhat
along the direction that seems to agree with the PSC's feelings overall.
My short reaction to this: No.
My medium-sized reaction to this: This doesn't sound like the sort of
thing that is either a) a natural extension to the way that people
currently write Perl code, nor b) an end-goal direction that I would be
happy to see the language go in.
------
What now follows is a much longer essay explaining my reasoning for
saying the above. It's not necessary to read this simply to get my
answer.
## A Natural Extension
When trying to design new Perl features, I am often inspired by two
other languages whose history I have followed - C and Scheme. There are
some interesting parallels. I'm not referring to technical features of
those languages, but to some interesting points of history. Both
languages are defined by a formal specification, with multiple
different implementations and users all trying to share it. ANSI C and
then C89 were both specifications that wrote down and formalized what
had been existing common practice around compilers and implementations.
The same was true for Scheme specs up to R5RS. After that, both C99 and
R6RS did something weird. They both (independently) went off in a
direction where the language spec itself tried to invent new concepts,
before they were commonly implemented or commonly used by users of
those languages. I think it's fair to say that neither spec was
well-received by either its implementors or its users. Sure, both had
some good points, but in both cases the spec was found to have
over-reached, promising things that implementors didn't want to create
and users couldn't imagine they'd want. In time both were quietly
retconned. C11 mostly builds atop C89, offering as "optional" a bunch of
things that C99 offered but nobody liked. Scheme R7RS has now been
split into a "small" and a "large" model that basically builds on top
of R5RS and pretends R6 never happened.
(There's also some interesting parallel here with Perl: going up to 5,
trying to invent new things at 6, then splitting off to its own new
name while 5 continues in the hope of an eventual 7. But I digress ;) )
The point I'm trying to make here is that successful languages almost
always come out of observing what people are *actually* doing or
wanting to do but can't for lack of features, and formalizing those
behaviours into convenient concepts within the language so everyone can
operate on common terms.
Let's take a comparison here to Perl's subroutine signatures. At first,
Perl didn't have such a feature natively, but gave users the building
blocks (via the @_ array) to construct something of that nature
themselves. So eventually lots of people were writing code looking
something like
    sub f { my ($x, $y, @z) = @_; ... }
When signatures were added, they didn't really add fundamentally new
abilities; they just tidied up what had been existing practice up until
that point. They allowed people to write code in a shorter, neater,
less noisy fashion, that still behaved *as if* it was written in the
longer form.
    sub f ($x, $y = 123, @z) { ... }

    sub f {
        @_ >= 1 or die "Too few arguments";
        my $x = $_[0];
        my $y = @_ > 1 ? $_[1] : 123;
        my @z = @_[2..$#_];
        ...
    }
Lately we've added the //= and ||= operators to signature syntax as
well to permit people to write in signatures what they'd often write
before as stuff like `my $y = $_[1] // 123;`.
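The difference between the two kinds of default is easiest to see in the hand-written forms they replace. What follows is only a small sketch with made-up sub names, using plain @_ unpacking so that it does not depend on any particular signature support:

```perl
use strict;
use warnings;

# Hand-written form of a plain `=` default in a signature:
# the default applies only when the argument is missing entirely.
sub with_plain_default {
    my $x = $_[0];
    my $y = @_ > 1 ? $_[1] : 123;
    return $y;
}

# Hand-written form of a `//=` default:
# the default also applies when the caller passes undef explicitly.
sub with_defined_or_default {
    my $x = $_[0];
    my $y = $_[1] // 123;
    return $y;
}
```

Passing an explicit undef as the second argument is where the two differ: the first sub hands back undef, the second hands back 123.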
Signatures were added in order to provide a short neat concise way for
the programmer to give a standard notation for various behaviours
around creating lexicals and inspecting @_ that *they were already
doing*. Signatures have become popular and well-used precisely because
they are just a neater way to rewrite existing code.
What does this mean for a "type system"? Again, it would do well to
check what existing behaviours people are actually writing now. Almost
universally, the kinds of code people are trying to clean up and
replace are variations on a theme of "Do I like this value?" when being
passed into subroutines or methods, stored in object fields, stored in
regular lexicals, that kind of thing. In effect, where subroutine
signatures tidied up code of the form such as `my $x = shift @_`, so
too should a value constraint system tidy up code of the form
`ACCEPTABLE($_[0]) or die "Cannot accept this argument"`.
Keep that in mind in the following.
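As a concrete instance of that "check or die" pattern, here is a minimal sketch. The sub name and the accepted range are invented purely for illustration; only Scalar::Util from core is assumed:

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical setter: its first statement is the hand-written
# "do I like this value?" check that a constraint system would tidy away.
sub set_volume {
    my ($level) = @_;
    defined($level) && looks_like_number($level) && $level >= 0 && $level <= 11
        or die "Cannot accept this argument\n";
    return "volume set to $level";
}
```

Note that the check is about whether the value can be used as a number in the wanted range, so the string "7" is accepted just as readily as the number 7.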
## End-Goal Direction
This brings me on to my second point. Back in 2020, I wrote a blog post
that I turned into a presentation at a conference, "Perl in 2025",
where I imagined what kind of Perl code I might be writing five years
hence. Many of the points explained in that talk have set the direction
for things I have been actively working on since then. One I've not
really looked at until you bring it up here, is that "type system".
In my talk (https://www.youtube.com/watch?v=raSRNIoUYig), somewhere
around the 25m mark, I start classifying what various folks mean by
"type systems". There are three main categories:
  1. Static code checks
     I.e. can the compiler (perl -c) tell me "this program is bad"?
  2. Dynamic value assertions
     I.e. does the program abort if I give it the wrong data?
  3. Compiler hints
     I.e. can the runtime run faster because it knows certain
     conditions will be impossible?
These categories aren't mutually exclusive of course. For example, C's
type system is very much a mix of categories 1 and 3. TypeScript adds
a category 1 system to JavaScript, and Python also has some optional
typing stuff that again mostly fits category 1.
Perl is a very dynamic language - most of the things that you can do
would violate any sense of "static safety" that category 1 would
attempt to give. Perl isn't built for "performance above all else"
anyway, and category 3 would cut off many situations where we actually
like that flexibility - e.g. being able to use overloaded objects like
bigint, bigrat, String::Tagged, etc... in place of native values. Sure,
I agree that having to check for overloading makes code like `$x + $y`
run a bit slower than if you didn't have to check that, but being able
to transparently pass bigint or bigrat values into existing code and
operate on values larger than native platform IV or NV without losing
any precision at all is a great feature to have.
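To illustrate the flexibility at stake, here is a short sketch using the core Math::BigInt module. The `add` sub is a made-up stand-in for "existing code" that was written with native numbers in mind:

```perl
use strict;
use warnings;
use Math::BigInt;

# An ordinary sub with no knowledge of big numbers.
sub add {
    my ($x, $y) = @_;
    return $x + $y;
}

my $native = add(2, 3);

# 2**64 does not fit in a 64-bit IV, and an NV would lose the low bits,
# but the overloaded object passes through the same code unchanged.
my $big = add(Math::BigInt->new("18446744073709551616"), 1);

print "$native\n";
print "$big\n";
```

The second call works only because `+` checks for overloading at runtime, which is exactly the check a category 3 system would want to optimise away.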
For these reasons, I feel that really only category 2 is the sort of
thing that is something we can, or should, add to Perl. This is why I
bring attention to the idea that we want to automate away the kinds of
"die if I don't like this value" code that people are already writing.
That's basically what category 2 already is, just manually written out
by hand. Whatever Perl might add in future should be an automation and
standardization of that existing style of code.
Furthermore, because I feel sufficiently strongly that this style of
dynamic value constraints at runtime is the best approach to be taking,
I have tried hard to avoid the word "type" when writing about or
describing it. I've said repeatedly and I'll say again: the whole
thing is very much "do we like the look of this value?". The key words
here are "look" and "value". It's about dynamic values that the program
actually operates on. It's about what they look like, not what their
underlying in-memory bit-pattern representation actually is.
# A Critique of Perl::Types
With these points in mind, we are now more prepared to take a closer
look at what your Perl::Types proposal suggests. Overall it's hard to
see what real behaviour is being added by the proposal as the details
in the email are very short. The CPAN dist has almost no code, no
tests, no documentation, so it's very hard for me to guess. I therefore
can't say with 100% certainty that what you propose doesn't fit what
I'm about to explain, but from an initial read of it I find it doubtful.
Perhaps, as a starting point, someone could outline what *kind* of type
system you imagine here, with particular reference to the three
categories I outline above. To what extent does what you propose fit
into each of categories 1, 2 or 3?
## An over-sensitivity to internal data representation
Your email talks about the difference between IVs, NVs and PVs and
honestly that is a tiny internal detail of representation waaay down in
the weeds. Perl - as a language and a culture - has never really cared
about whether a number is really a number, or a string containing some
digit characters to represent it in decimal form. 10 and "10" are mostly
the same thing. Sure, the bits stored in memory are different, but what
you can do with them is the same.
Many people use this to great advantage. For example, if you write a
commandline script that takes some sort of "count" argument, you're
probably going to parse it out of the text contained in the @ARGV
array. By nature that will have arrived as text, but that's OK - it's
Perl. As long as the value looks like a number, we can use it as a
number. It would be most un-Perlish to say that text in @ARGV has to be
converted into a pure-number form just to accept commandline
processing. That kind of conversion would be more at home in languages
such as Python, which make much stronger distinctions about numbers vs.
strings.
I don't see anywhere in your proposal where you account for this; a way
to specify "This value should be usable as if it was a number". Don't
forget things like bigint and bigrat exist and can be used as numbers.
Also when thinking about strings, objects can have overloading on them
to behave like strings (not quite as well as we'd like currently in
Perl, but that's the subject of PPC0013).
What I would like to see is something like this; where we imagine (for
sake of current argument) that "Numerical" is something that exists -
its actual nature still to be determined.
    my Numerical $n;
    $n = 10;                    # permitted
    $n = "10";                  # also permitted
    $n = bigint->new("10");     # also permitted
    $n = "hello";               # not permitted; throws some kind of
                                # exception at runtime
and of course we can apply those to fields, subroutine parameters,
etc., in the hopefully-obvious way:
    class Point {
        field Numerical $x;
        method move(Numerical $dX) { ... }
    }
    sub add(Numerical $x, Numerical $y) { ... }
In all of these cases, the ability to put a value constraint on a field
or parameter is really just a shortcut to writing some "check or die"
code manually. As I tried to hammer in above - this is just an
automation of the way that people can currently write these.
    class Point { field Numerical $x; ... }
    class Point { field $x; ADJUST { is_Numerical($x) or die "Bad x" }
                  ... }

    sub add(Numerical $x) { ... }
    sub add($x) { is_Numerical($x) or die "Bad x"; ... }
Admittedly the one on plain lexicals isn't easy to write currently, but
it's possible via a little bit of Magic. It's something I may have a
hack at some time soon.
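For the sake of argument, the `is_Numerical` helper used in those hand-written forms might look something like the following. This is only one guess at its nature, not a proposal: it treats a value as numerical if it is a plain scalar that looks like a number, or an object that overloads addition or numeric conversion. It uses Math::BigInt directly rather than the bigint pragma, and classic @_ unpacking rather than signatures.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed looks_like_number);
use overload ();
use Math::BigInt;

# Hypothetical helper: "can this value be used as if it were a number?"
sub is_Numerical {
    my ($value) = @_;
    return 0 unless defined $value;

    # Objects count if they overload addition or numeric conversion.
    if (blessed $value) {
        return (overload::Method($value, '+') || overload::Method($value, '0+')) ? 1 : 0;
    }

    return 0 if ref $value;    # plain references are not numbers
    return looks_like_number($value) ? 1 : 0;
}

# The hand-written equivalent of `sub add(Numerical $x, Numerical $y)`.
sub add {
    my ($x, $y) = @_;
    is_Numerical($x) or die "Bad x\n";
    is_Numerical($y) or die "Bad y\n";
    return $x + $y;
}
```

With this definition 10, "10" and a Math::BigInt holding 10 are all accepted, while "hello" is rejected at runtime.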
## An under-sensitivity to particular values
The other thing that I feel is hugely missing from Perl::Types is
further constraints of values. Whereas I basically never care if a
numerical value is an IV, NV, PV, or some overloaded object or
whatever, what I do often care about is whether it is within some
numerical range. Perhaps it has to be positive. Or positive-or-zero. Or
maybe between two bounds - perhaps a percentage only goes up to 100, or
a mix ratio only goes up to 1.0.
And of course further constraints apply a lot more than just numbers.
Maybe for a string I'd constrain a certain length, or to match a
certain regexp pattern, or object references have to be within a
certain subclass, or have some matching value for one of their fields,
or... whatever. Whatever Perl code I could have written in my
"IS_ACCEPTABLE($param) or die ..." expression, I should be able to
express in a value constraint. I don't currently see anything like
that in your proposal. That strikes me as a big missing feature.
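A rough sketch of what "further constraints" look like when written by hand today: small constructors that each return a yes/no code ref. All of the names here are invented for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical constructors; each returns a code ref giving a yes/no answer.
sub in_range {
    my ($min, $max) = @_;
    return sub {
        my ($v) = @_;
        return defined($v) && looks_like_number($v) && $v >= $min && $v <= $max;
    };
}

sub matching {
    my ($pattern) = @_;
    return sub {
        my ($v) = @_;
        return defined($v) && !ref($v) && $v =~ $pattern;
    };
}

my $is_percentage = in_range(0, 100);
my $is_mix_ratio  = in_range(0, 1.0);
my $is_hex_colour = matching(qr/\A#[0-9a-fA-F]{6}\z/);

sub set_opacity {
    my ($pct) = @_;
    $is_percentage->($pct) or die "Opacity must be between 0 and 100\n";
    return $pct;
}
```

Anything that can be written as a Perl expression fits this shape, which is the property a value constraint system would need to preserve.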
# What Would I Suggest Instead?
I don't feel it very useful for me to write half a book's-worth of text
as a reply that basically says "no don't do that", without suggesting
something else more preferable instead. So I'm going to end this reply
with a quick look at something I'd much rather be considering.
The module Types::Standard seems to set a standard interface for
performing "constraint checks", that fulfils much of the behaviour I
describe above. It's well-known and, being part of the Type::Tiny
distribution, used by lots of people in lots of places.
As a general shape of model (i.e. true/false assertions on "do we like
the look of this value?") it seems far more aligned with the general
shape I outline above. I've even got as far as writing a little
Object::Pad field attribute module for using these check objects on
field assignments
https://metacpan.org/pod/Object::Pad::FieldAttr::Checked
    class Point {
        field $x :Checked(Num);
        ...
    }
At some point I may get around to writing a similar attribute for
regular lexical variables. I was going to allow it to apply equally to
subroutine signatures, only I discovered Perl doesn't support those
yet. Hrm. Perhaps something for us to add in 5.39...
    my $x :Checked(Num);
    # not yet supported
    sub add($x :Checked(Num)) { ... }
It's surely not perfect. For one, the syntax needs improving - these
kinds of constraints should be upfront syntax on the fields themselves:
    class Point {
        field Num $x;
    }
    my Num $x;
    sub add(Num $x) { ... }
Performance-wise it could also be made a lot better. Currently every
checker is stored as a CV pointer, so performing a constraint check
involves an entire call_sv() operation. Internally that'd need a lot of
fixing before we'd consider it for core.
There's also the question of what really is the nature of something
like "Num" in the examples above. As the code currently stands, those
are regular Perl expressions that yield checker object instances at
compiletime, to be used to invoke a `->check` method at runtime. If we
had a truly native constraint-check syntax we'd most likely want to
define what a "check" really is as something more fundamental and new
than just that. It would likely want to operate purely in the C level
in most common cases, avoiding entirely any need for evaluating Perl
expressions just to check the common standard constraints.
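To make the "checker object" shape concrete, here is a deliberately tiny stand-in written with core Perl only. It is not Type::Tiny's implementation and the class name is made up; it only mirrors the general idea of an object, built once up front, whose `->check` method gives a yes/no answer at runtime.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# A stand-in checker class (hypothetical, for illustration only).
package My::Check {
    sub new {
        my ($class, %args) = @_;
        return bless { name => $args{name}, test => $args{test} }, $class;
    }

    # The yes/no question: do we like the look of this value?
    sub check {
        my ($self, $value) = @_;
        return $self->{test}->($value) ? 1 : 0;
    }

    # The "check or die" wrapper around it.
    sub assert {
        my ($self, $value) = @_;
        $self->check($value)
            or die sprintf "Value %s does not pass %s\n",
                   defined($value) ? "'$value'" : 'undef', $self->{name};
        return $value;
    }
}

# Built once, up front...
my $Num = My::Check->new(
    name => 'Num',
    test => sub { defined($_[0]) && !ref($_[0]) && looks_like_number($_[0]) },
);

# ...then consulted on every assignment.
my $x = $Num->assert("42");
```

Every call to `->check` here goes through a Perl-level code ref, which is the cost that a native implementation would want to avoid for the common standard constraints.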
But those are relatively small details in the grand scheme of things.
Overall it's going in about the right direction, and any improvements
to it wouldn't fundamentally alter this shape; just make it better at
doing what it currently does.
What it does is very extensible to more parts of Perl, of course. Once
you have constraint checks that give you a "yes/no" answer and can use
them for accepting or rejecting values at assignment time into lexicals
or fields, or invocation time on function calls (see above examples),
you start to find you can go further than that.
A set of yes/no checks allows you to define multiple dispatch on
subroutines, for example:
    check Triangular  :is(Shape) where { $_->points == 3 };
    check Rectangular :is(Shape) where { $_->points == 4 };
    check Pentagonal  :is(Shape) where { $_->points == 5 };
    multi sub draw(Triangular $shape) { ... }
    multi sub draw(Rectangular $shape) { ... }
    multi sub draw(Pentagonal $shape) { ... }
Or perhaps to take part in match/case syntax
    match($shape : is) {
        case(Triangular)  { say "My shape has 3 sides" }
        case(Rectangular) { say "My shape has 4 sides" }
        ...
    }
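Neither `multi sub` nor `match`/`case` exists in this form today, but the underlying mechanism can be approximated by hand: an ordered list of (check, handler) pairs, tried in turn. The sketch below assumes shapes are plain hashes with a `points` entry, which is purely an illustrative choice, and it picks the first matching candidate rather than attempting any "most specific wins" rule.

```perl
use strict;
use warnings;

# Each candidate pairs a yes/no check with the code to run when it says yes.
my @draw_candidates = (
    [ sub { $_[0]{points} == 3 }, sub { "triangle" } ],
    [ sub { $_[0]{points} == 4 }, sub { "rectangle" } ],
    [ sub { $_[0]{points} == 5 }, sub { "pentagon" } ],
);

# Try each check in order and call the first handler whose check accepts.
sub draw {
    my ($shape) = @_;
    for my $candidate (@draw_candidates) {
        my ($check, $handler) = @$candidate;
        return $handler->($shape) if $check->($shape);
    }
    die "No draw() candidate accepts this shape\n";
}
```

The same table could equally drive a match/case construct; only the surface syntax differs.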
In other words: Once you come up with a good shape for this kind of
thing, you find it's applicable in a lot more places than just
parameters to functions, or field assignments in objects. It's this
sort of force-multiplier that makes good components of languages - one
idea that can be reüsed in lots of different situations.
If people are going to spend their time making "some kind of type
system", the sort of structure I explained here is much more the kind
of thing I'd like to see people making. It has great potential to reüse
existing components, it easily fits into and improves the kind of code
people are already writing and have written for years. Straight away
people will be able to see where - and why - to use this new system.
Great languages are made like Lego kits: a collection of individual
pieces that are each simple to describe, but can be combined in lots of
flexible combinations. We should aim to make good bricks.
--
Paul "LeoNerd" Evans
leonerd@leonerd.org.uk | https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/ | https://www.tindie.com/stores/leonerd/