
Re: PPC Elevator Pitch for Perl::Types

From: Oodler 577 via perl5-porters
Date: September 2, 2023 04:11
Subject: Re: PPC Elevator Pitch for Perl::Types
Message ID: ZPK11uGtE6qRBCTX@odin.sdf-eu.org

> From: Dave Mitchell
> Date: August 22, 2023 10:03
> Subject: Re: PPC Elevator Pitch for Perl::Types
> Message ID: ZOSHzZ2SS1lMJdZy@iabyn.com

> On Tue, Aug 22, 2023 at 04:19:04AM +0000, Oodler 577 via perl5-porters wrote:
> > Yes you are correct, the current mechanism for enabling type-checking
> > for subroutine calls is essentially a type of source filter.  As
> > mentioned in our first reply:
> > 
> > > > The Perl compiler currently supports type enforcement for subroutine calls, so that is our starting point for Perl::Types.
> > 
> > This source-filter-like functionality is currently contained in
> > the Perl compiler's file `Class.pm`

> So just for the avoidance of doubt, is the proposal that Perl::Types,
> when running in a perl (rather than RPerl) environment, will use perl's
> actual source filter mechanism, or that it will use some other
> "essentially a type of source filter" thing?

> If the latter, please expand.

It is not an actual source filter as documented in `perlfilter`,
although it is the same general concept.  As mentioned in the
previous response, _"this source-filter-like functionality is
currently contained in the Perl compiler's file `Class.pm` and is
triggered by including `use RPerl;` in a Perl source code file."_
You can see the filter-like implementation in the link and code
snippet from our last response:

https://metacpan.org/release/WBRASWELL/RPerl-7.000000/source/lib/RPerl/CompileUnit/Module/Class.pm#L673-715

```perl
if ( $CHECK eq 'ON' ) {
    my $i = 0;                    # integer
    foreach my $subroutine_argument ( @{$subroutine_arguments} ) {
        # only enable type-checking for arguments of supported type;
        # NEED UPGRADE: enable checking of user-defined Class types & all other remaining RPerl types
        if (exists $TYPES_SUPPORTED->{$subroutine_argument->[0]}) {
            $subroutine_arguments_check_code .= q{    rperltypes::} . $subroutine_argument->[0] . '_CHECK( $_[' . $i . '] );' . "\n";  # does work, hard-code all automatically-generated type-checking code to 'rperltypes::' namespace
        }
        $i++;
    }
 
    activate_subroutine_args_checking( $package_name, $subroutine_name, $subroutine_type, $subroutine_arguments_check_code, $module_filename_long );
    $inside_subroutine         = 0;
    $subroutine_arguments_line = q{};
}
```

As mentioned, _"you can see that both `ON` and `TRACE` call
`activate_subroutine_args_checking()`, which is where the type-checking
calls are inserted into a new subroutine, that wraps around the
original un-type-checked subroutine"_:

https://metacpan.org/release/WBRASWELL/RPerl-7.000000/source/lib/RPerl/CompileUnit/Module/Class.pm#L1016-1183

```perl
    # re-define subroutine call to include type checking code; new header style
    do
    {
        no strict;

        # create unchecked symbol table entry for original subroutine
        *{ $package_name . '::__UNCHECKED_' . $subroutine_name } = \&{ $package_name . '::' . $subroutine_name };  # short form, symbol table direct, not strict

        # delete original symtab entry, 
        undef *{ $package_name . '::' . $subroutine_name };

        # re-create new symtab entry pointing to checking code plus unchecked symtab entry
        $subroutine_definition_code .=
            '*' . $package_name . '::' . $subroutine_name . ' = sub { ' .
            $subroutine_definition_diag_code .
            ($subroutine_arguments_check_code or "\n") .
            '    return ' . $package_name . '::__UNCHECKED_' . $subroutine_name . '(@ARG);' . "\n" . '};';

        # create new checked symtab entries, for use by Exporter
        $check_code_subroutine_name = $package_name . '::__CHECK_CODE_' . $subroutine_name;
        $subroutine_definition_code .= "\n" . '*' . $package_name . '::__CHECKED_' . $subroutine_name . ' = \&' . $package_name . '::' . $subroutine_name . "\n" . ';';

        $subroutine_definition_code .= "\n" . '*' . $check_code_subroutine_name . ' = sub {' . "\n" . '    my $retval ' . q{ =<<'EOF';} . "\n" . $subroutine_arguments_check_code . "\n" . 'EOF' . "\n" . '};' . "\n";
    };

    eval($subroutine_definition_code) or (RPerl::diag('ERROR ECOPR02, PRE-PROCESSOR: Possible failure to enable type checking for subroutine ' . $package_name . '::' . $subroutine_name . '(),' . "\n" . $EVAL_ERROR . "\n" . 'not croaking'));
    if ($EVAL_ERROR) { croak 'ERROR ECOPR03, PRE-PROCESSOR: Failed to enable type checking for subroutine ' . $package_name . '::' . $subroutine_name . '(),' . "\n" . $EVAL_ERROR . "\n" . 'croaking'; }
```
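
As a rough illustration only, the same wrap-and-replace idea can be sketched in a few lines of plain Perl using a closure instead of a string `eval`. This is not the code from `Class.pm`: `wrap_with_checks()` and `%CHECK_FOR` are hypothetical names, and the pure-Perl checks merely stand in for the real `rperltypes::*_CHECK()` routines.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Scalar::Util qw(looks_like_number);

# Hypothetical pure-Perl stand-ins for the rperltypes::*_CHECK() routines.
my %CHECK_FOR = (
    integer => sub { defined $_[0] && $_[0] =~ /\A-?\d+\z/ },
    number  => sub { defined $_[0] && looks_like_number( $_[0] ) },
    string  => sub { defined $_[0] && !ref $_[0] },
);

# Replace Package::name with a closure that checks each argument and
# then calls the original, unchecked code reference.
sub wrap_with_checks {
    my ( $package_name, $subroutine_name, @argument_types ) = @_;
    my $full_name = $package_name . '::' . $subroutine_name;

    no strict 'refs';
    defined &{$full_name} or croak "no such subroutine $full_name()";
    my $unchecked = \&{$full_name};

    no warnings 'redefine';
    *{$full_name} = sub {
        for my $i ( 0 .. $#argument_types ) {
            my $type = $argument_types[$i];

            # arguments of unsupported type are passed through unchecked
            my $check = $CHECK_FOR{$type} or next;
            $check->( $_[$i] )
                or croak "$type value expected for argument $i of $full_name()";
        }
        return $unchecked->(@_);
    };
    return;
}

sub squared { my ($base) = @_; return $base**2; }
wrap_with_checks( 'main', 'squared', 'integer' );

print squared(23), "\n";
eval { squared('howdy'); 1 } or print "caught: $@";
```

Because the wrapper is installed through the symbol table at run time, no source text is ever parsed or rewritten; the argument types, however, still have to come from somewhere, which is the part `Class.pm` obtains by reading the source file.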

> If the former, note that we generally regard the perl source filter
> mechanism as deeply flawed, and wouldn't recommend its use in production
> code. The basic problem comes down to the old mantra that "only perl can
> parse Perl". A source filter may correctly parse the source 99.9% of the
> time, but on those 0.1% occasions, it will get something wrong. Maybe it
> will get the start of a // operator confused with the start of a pattern.
> Or maybe it will trip over some complex code embedded into a pattern:
> /...(?{  $x = '/'; }) .../. Or whatever. At that point, code will get
> injected at the wrong point into the source, and the end user gets a
> compile error from code that isn't even in their source, which is
> completely mystifying and hard to debug.

> Switch.pm used source filters, and it was the cause of an endless stream
> of tickets. It was eventually deprecated and removed from core in 5.14.0.

Yes we agree this could be a source of problems.

Rather than using a source filter (or similar mechanism as detailed
above), can you please suggest a more stable and reliable
implementation?

> > Regarding your `my number $x; $x = 'foo';` example, this will
> > require the use of a `tie` or similar mechanism

> Again, this is very vague. Do you actually mean perl's tie / magic
> mechanism? If not, what "similar mechanism" are you proposing to use?

As mentioned, _"in the past, we have always simply allowed the
C(++) compiler to provide this functionality for us; however, this
will have to be explicitly implemented when `Perl::Types` is
refactored into its own distribution."_

Thus, we are not committed to any specific implementation, as long
as we choose a mechanism that allows us to (eventually) handle
_"arbitrarily-nested data structures"_ by _"intercepting any
modifications to internal elements and calling `foo_CHECK()` or
`foo_CHECKTRACE()` for each change."_
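
For concreteness, here is a minimal sketch of what intercepting assignments to a single scalar could look like if Perl's own `tie` were the chosen mechanism. It is an assumption for illustration, not a committed design: the package name `TypedScalar::Number` is hypothetical, and `looks_like_number()` stands in for `number_CHECK()`.

```perl
use strict;
use warnings;

package TypedScalar::Number;    # hypothetical package name
use Carp qw(croak);
use Scalar::Util qw(looks_like_number);

sub TIESCALAR {
    my ( $class, $name ) = @_;
    return bless { name => $name, value => undef }, $class;
}

sub FETCH { return $_[0]{value} }

# Every assignment to the tied variable passes through STORE, so this is
# the single place where the type check has to live.
sub STORE {
    my ( $self, $new_value ) = @_;
    defined $new_value && looks_like_number($new_value)
        or croak "number value expected in variable $self->{name}";
    $self->{value} = $new_value;
    return;
}

package main;

tie my $x, 'TypedScalar::Number', '$x';
$x = 21.12;
my $ok = eval { $x = 'foo'; 1 };
print $ok ? "assignment accepted\n" : "assignment rejected: $@";
print "x is still $x\n";
```

Every read and write of `$x` now goes through a method call, which is the cost of this approach; nested data structures would need the same treatment applied to each array and hash level.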

> The problem with using actual ties, is that they make the variable very
> slow, even if the actual tie methods such as FETCH() are XS code. That's
> because a lot of code paths in the perl interpreter have optimisations for
> non-magic values. For example, look at pp_add() in pp_hot.c in the perl
> source. This implements perl's '+' operator. Obviously in general the two
> operands can be anything - ints, nums, strings, magic values, overloaded
> objects etc, and mixtures thereof. But in pp_add(), the first thing it
> does is check the flags of the two operands, and says:

>     If they are both non magical and non ref, and if both are either
>     simple ints or simple nums, then just add the two values, check they
>     haven't overflowed, set the return value and return. Otherwise go
>     through the complex 200-lines-of-code path which handles magic, mixed
>     types etc.

> So if, for example, an integer variable was marked with get magic, it
> would become a lot slower when being added. Similar considerations apply
> in many places.

> Also, once tie magic starts being applied to arrays and hashes, you start
> to hit edge cases. It's very hard to make a tied aggregate behave *exactly*
> like a plain array/hash in all circumstances, even ignoring the slowdown.

Yes we agree this could also be a source of problems, plus slowdowns.

Rather than using `tie`, can you please suggest a more performant
and reliable implementation?

> > So, as mentioned in the original Elevator Pitch, we can "utilize
> > Perl data types to achieve a number of benefits including but not
> > limited to":
> > 
> > * increased performance

> So how would you get increased performance? I've already pointed out that
> magic tends to slow things down, and the cost of calling a check routine
> will slow things down even further.

Adding data types to your Perl source code is the first step on
the path toward compiling your Perl source code, which can provide
anywhere from 10x to 400x (or more) runtime performance increase.
Yes you are correct that enabling type-checking will introduce
runtime overhead to interpreted Perl code, but that slowdown
disappears once your Perl code is fully compiled because the
type-checking is done at compile time rather than runtime.

For those who want type-checking for interpreted Perl code only,
we have done our best to minimize the runtime overhead by allowing
the developer to choose between `foo_CHECKTRACE()` and the slightly
faster `foo_CHECK()`.  As mentioned, _"type checking is currently
controlled on a per-file basis using the `TYPE_CHECKING` preprocessor
directive"_:

https://metacpan.org/release/WBRASWELL/RPerl-7.000000/source/lib/RPerl/Test/TypeCheckingTrace/AllTypes.pm#L1-2

```perl
# [[[ PREPROCESSOR ]]]
# <<< TYPE_CHECKING: TRACE >>>
```

_"The `TYPE_CHECKING` directive can have a value of `OFF` for
disabled, `ON` to call the `foo_CHECK()` macros/functions, and
`TRACE` to call the `foo_CHECKTRACE()` macros/functions.  The only
difference between `ON` and `TRACE` is the inclusion of the offending
subroutine and variable names for easier debugging."_
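
Reading such a directive needs only a single regular expression. The sketch below is illustrative and is not taken from the RPerl source: `type_checking_mode()` is a hypothetical helper, and the fallback used when no directive is present is left to the caller rather than assumed.

```perl
use strict;
use warnings;

# Return 'OFF', 'ON' or 'TRACE' for the first TYPE_CHECKING directive found
# in the given source text, or the caller's default when there is none.
sub type_checking_mode {
    my ( $source, $default ) = @_;
    if ( $source =~ /^\s*\#\s*<<<\s*TYPE_CHECKING:\s*(OFF|ON|TRACE)\s*>>>/m ) {
        return $1;
    }
    return $default;
}

my $source = <<'EOF';
# [[[ PREPROCESSOR ]]]
# <<< TYPE_CHECKING: TRACE >>>
package Foo;
EOF

print type_checking_mode( $source, 'OFF' ), "\n";    # prints "TRACE"
```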

Furthermore, the type-checking is performed by C macros for scalar
data types, and by C functions (which call those underlying C
macros) for array and hash data structures, which is the fastest
possible solution we could find.  We even re-implemented our original
scalar type-checking from C functions to C macros for increased
performance.

Here is an example of the fast (but now-deprecated) C functions
for checking the `number` type AKA `NV`:

https://metacpan.org/release/WBRASWELL/RPerl-7.000000/source/lib/RPerl/DataType/Number.cpp#L18-38

```c
// TYPE-CHECKING SUBROUTINES DEPRECATED IN FAVOR OF EQUIVALENT MACROS
void number_CHECK(SV* possible_number) {
    if (not(SvOK(possible_number))) {
        croak("\nERROR ENV00, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but undefined/null value found,\ncroaking");
    }
        if (not(SvNOKp(possible_number) || SvIOKp(possible_number))) {
        croak("\nERROR ENV01, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but non-number value found,\ncroaking");
    }
};
void number_CHECKTRACE(SV* possible_number, const char* variable_name, const char* subroutine_name) {
    if (not(SvOK(possible_number))) {
        croak("\nERROR ENV00, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but undefined/null value found,\nin variable %s from subroutine %s,\ncroaking",
                        variable_name, subroutine_name);
    }
        if (not(SvNOKp(possible_number) || SvIOKp(possible_number))) {
        croak("\nERROR ENV01, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but non-number value found,\nin variable %s from subroutine %s,\ncroaking",
                        variable_name, subroutine_name);
    }
};
```

Here is an example of the functionally-equivalent and even-faster
C macros, as mentioned in our first reply:

https://metacpan.org/release/WBRASWELL/RPerl-7.000000/source/lib/RPerl/DataType/Number.h#L137-149

```c
// [[[ TYPE-CHECKING MACROS ]]]
#define number_CHECK(possible_number) \
        (not(SvOK(possible_number)) ? \
                        croak("\nERROR ENV00, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but undefined/null value found,\ncroaking") : \
                        (not(SvNOKp(possible_number) || SvIOKp(possible_number)) ? \
                                        croak("\nERROR ENV01, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but non-number value found,\ncroaking") : \
                                        (void)0))
#define number_CHECKTRACE(possible_number, variable_name, subroutine_name) \
        (not(SvOK(possible_number)) ? \
                        croak("\nERROR ENV00, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but undefined/null value found,\nin variable %s from subroutine %s,\ncroaking", variable_name, subroutine_name) : \
                        (not(SvNOKp(possible_number) || SvIOKp(possible_number)) ? \
                                        croak("\nERROR ENV01, TYPE-CHECKING MISMATCH, CPPOPS_PERLTYPES & CPPOPS_CPPTYPES:\nnumber value expected but non-number value found,\nin variable %s from subroutine %s,\ncroaking", variable_name, subroutine_name) : \
                                        (void)0))
```
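
For readers who do not write XS, the test these macros perform can be approximated from pure Perl with the core `B` module, which exposes the same private flags. This is a sketch for explanation only, far slower than the C macros; `number_CHECK_pp()` is a hypothetical name, not part of RPerl.

```perl
use strict;
use warnings;
use B ();
use Carp qw(croak);

# Pure-Perl approximation of number_CHECK(): croak unless the scalar is
# defined and carries a private integer (pIOK) or float (pNOK) flag.
sub number_CHECK_pp {
    defined $_[0]
        or croak 'number value expected but undefined/null value found';
    my $flags = B::svref_2object( \$_[0] )->FLAGS;
    $flags & ( B::SVp_NOK | B::SVp_IOK )
        or croak 'number value expected but non-number value found';
    return;
}

for my $candidate ( 23, 21.12, 'howdy', undef ) {
    my $shown = defined $candidate ? $candidate : 'undef';
    my $ok    = eval { number_CHECK_pp($candidate); 1 };
    print "$shown: ", ( $ok ? 'ok' : 'rejected' ), "\n";
}
```

Note that this is a check of the scalar's internal flags, not of its contents: the string `'23'` is rejected until something uses it in numeric context, which is stricter than `Scalar::Util::looks_like_number()`.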

For those who want data types in their interpreted code as a form
of annotation only, they can set `TYPE_CHECKING` to `OFF` for even
better performance.  You can always turn it back on later.

```perl
# [[[ PREPROCESSOR ]]]
# <<< TYPE_CHECKING: OFF >>>
```

For those who don't want data types in their Perl code at all...
just don't `use Perl::Types;`.  It's that simple.  Your code won't
be affected in any way, and you can safely continue without any data
types of any kind, just like Perl has always been in the past.

> > * memory safety (bounds checking)

> Can you give an example of where currently perl isn't memory safe, but
> Perl::Types would make it so?

We are not currently aware of any such issues with the Perl
interpreter, and our goal is not to attempt to point out anything
wrong with the interpreter's memory management.  Rather, this is
again primarily related to the need for memory bounds checking for
those who wish to compile their Perl code.

That being said, in the future it may certainly be possible for
the Perl interpreter to utilize data type and data structure
information to implement optimizations in memory and/or performance.

For example, a Perl array can declare its own specific maximum
index (length minus 1) during creation, which is used by the Perl
compiler (and subsequently the C(++) compiler) for standard memory
management purposes.  In the following example, the declared
`arrayref` length is 4, with an `undef` being assigned to the
maximum index of 3 (`4 - 1`) in order to avoid `Useless use of
array element in void context` when `use warnings;` is enabled:

```perl
my integer::arrayref $foo->[4 - 1] = undef;
```

The current experimental syntax for declaring an `array` is very
similar, but without the need for `undef` and with the unfortunate
need for a temporary `$TYPED_foo` scalar:

```perl
my integer::array @foo = my $TYPED_foo->[4 - 1];
```

The Perl interpreter could potentially utilize the array's maximum
index to more effectively organize allocated memory, reduce calls
to garbage collection, increase efficiency of accessing contiguous
array elements, etc.

As with our previously-discussed pure-Perl `$RETURN_TYPE` syntax,
the current experimental syntax for declaring array lengths may
leave a bit to be desired, but it does not require any change to
the Perl parser and was the best we could come up with so far.
(Also, in the `arrayref` example the Perl interpreter actually
creates an anonymous array of length 4, while in the `array` example
it creates an array of length 1.)

Can you please suggest possible pure-Perl syntax alternatives for
specifying the various maximum indices of an arbitrarily-nested
`array` and `arrayref`, without requiring any changes to the existing
Perl internals?

> > * potential for polymorphism

> Please explain further.

There are many different styles or variations of polymorphism in
computer science.  For the sake of brevity, we will only address
one style here: function overloading AKA ad hoc polymorphism.
These same ideas can likely be extended to include other styles of
polymorphism as well.

For example, imagine the following expansion of our original
`squared()` subroutine example:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Perl::Types;

sub squared {
    { my integer $RETURN_TYPE };
    ( my integer $base ) = @ARG;
    return $base ** 2;
}

sub squared {
    { my number $RETURN_TYPE };
    ( my number $base ) = @ARG;
    return $base * $base;
}

sub squared {
    { my string $RETURN_TYPE };
    ( my string $base ) = @ARG;
    return $base x (length $base);
}

print squared(23), "\n";
print squared(21.12), "\n";
print squared('howdy'), "\n";
```

Wouldn't it be nice if we got this...

```
529
446.0544
howdyhowdyhowdyhowdyhowdy
```

... instead of this?

```
Subroutine squared redefined at ./sub_test.pl line 14.
Subroutine squared redefined at ./sub_test.pl line 20.
2323
21.1221.1221.1221.1221.12
howdyhowdyhowdyhowdyhowdy
```

What are your thoughts on this kind of polymorphism, function
overloading based on argument and/or return value data type(s)?
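
For comparison, the desired output can be roughly approximated in today's Perl with a hand-written dispatch table keyed on the argument's internal flags. This is only a sketch of the behaviour, not a proposed implementation: `type_of()` and `%SQUARED_FOR` are hypothetical names, and a scalar that carries both integer and float flags would simply be treated as an integer here.

```perl
use strict;
use warnings;
use B ();

# Classify a scalar as 'integer', 'number' or 'string' from its internal flags.
sub type_of {
    my $flags = B::svref_2object( \$_[0] )->FLAGS;
    return 'integer' if $flags & B::SVp_IOK;
    return 'number'  if $flags & B::SVp_NOK;
    return 'string';
}

# One implementation per argument type, mirroring the three squared() bodies.
my %SQUARED_FOR = (
    integer => sub { my ($base) = @_; return $base**2 },
    number  => sub { my ($base) = @_; return $base * $base },
    string  => sub { my ($base) = @_; return $base x ( length $base ) },
);

# Dispatch on the type of the first argument.
sub squared { return $SQUARED_FOR{ type_of( $_[0] ) }->(@_); }

print squared(23),      "\n";
print squared(21.12),   "\n";
print squared('howdy'), "\n";
```

The dispatch here happens on every call at run time; with declared types the same selection could in principle be made once, when the code is compiled.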

> Ok, that was the technical side of things. Now on to the policy side.
> 20 years or so ago, perl went through a phase of adding lots of 'useful'
> modules to the perl core. This was widely seen as a mistake.

> First, it made a particular module seem to be officially endorsed.
> Hypothetically, "among several XML parser modules, we have chosen this one as
> the best and the one you should use". When design flaws in XML::Parser
> (say) are discovered, and the general recommendation is to use
> XML::BetterParser instead, XML::Parser is still sitting in the perl core
> getting an unfair endorsement. And removing it from core is hard, because
> now lots of people used it, and they're using it because, well, we
> endorsed it!

Our goal is for Perl::Types to be the official data type system of
Perl, in the same exact way that Corinna is the official class
system of Perl.  If a so-called "dual life" distribution is the
wrong way to achieve this, then we're fine with that.

> Second, if modules stop being actively maintained by their author(s) for
> whatever reason, then suddenly we become responsible for maintaining it.
> It's enough work just maintaining the core perl interpreter, without
> taking on extra responsibilities.

Yes we agree, which is why we have formed the Perl::Types Committee
and are in the process of training new Perl::Types developers to
avoid a low bus factor.

> Third, in these days of package managers etc, it's relatively easy for
> people to install the CPAN modules they want. They will also get the
> newest and best version by default, rather than using the old buggy version
> bundled with perl. So there is less good reason to bundle a package with
> perl.

Our goal is for Perl::Types (or its descendant) to be included in
every future version of Perl, in the same exact way that Corinna
will presumably be included in every future version of Perl.  We
are fine with handling versioning in the same way as Corinna,
whatever that may be.

> So these days the policy is not to bundle CPAN modules with perl unless
> they are necessary, usually because they are needed as part of the tool
> chain. There have been very few new modules added to the perl core in the
> last 15 years or so that are capable of acting as standalone CPAN
> distributions instead.

> So the short answer is that even if Perl::Types really appealed to us, it
> is very unlikely that we would agree to bundle it. And I think we're 
> far from a consensus yet that it appeals.

Since you have made a strong case against a dual-life distribution
being included in the Perl core, can you instead please explain
the specific process which Corinna followed to get merged into the
Perl interpreter?

Perhaps Perl::Types can act like Object::Pad, as the experimental
distribution out of which we can derive the desired Perl interpreter
code in the same way that Corinna was derived from Object::Pad?

> On a purely individual note, I am finding Perl::Types very unappealing so
> far.

Can you please give us a specific list of all the things you find
unappealing?

On Behalf of the _Perl::Types Committee_,
Brett Estrade 

--
oodler@cpan.org
oodler577@sdf-eu.org
SDF-EU Public Access UNIX System - http://sdfeu.org
irc.perl.org #openmp #pdl #native


