perl.perl5.porters | Postings from October 2021

Pre-RFC: Optional Chaining

October 28, 2021 17:39
Hello, Porters!

I am super excited to bring you this pre-RFC feature request, and humbly
ask you to review and let me know whether it has what it takes to go
further up the RFC road.

I have tried my best to properly fill all fields from the RFC template even
though I understand this is just the very first stage. Still, I think it
would be a great addition to the language - I'd certainly use it everywhere.

Thank you!

# Optional Chaining

## Preamble

    Author:  Breno G. de Oliveira <>
    Status:  Draft

## Abstract

This RFC proposes a new operator, `?->`, to indicate optional dereference.

## Motivation

Chained dereferencing of nested data structures is quite common in Perl
programs. However, especially due to Perl's dynamic type system, there are
times when you must check whether the data exists, is defined and holds a
reference before you can dereference it; otherwise the call may trigger a
runtime exception.

The current syntax for these verifications can be quite long, becoming even
longer the more nested verifications need to be done, making it not only
harder to write and read, but also more prone to human error.

## Rationale

An "optional chaining" operator would let us access values located deep
within a chain of connected references and fail gracefully without having
to check that each of them is valid. This should result in shorter, simpler
and more correct expressions whenever the value being accessed may be
missing, or when exploring objects and data structures without guarantees
of which keys/methods/indexes/accessors are provided.

Equivalent operators are already well established in many other popular
programming languages, some of which have provided this feature since at
least 2015. It is sometimes also referred to as a "safe call", "nullsafe",
"safe navigation" or "optional path" operator.

JavaScript, Kotlin, C#, Swift, TypeScript and Groovy all provide `?.`, which
behaves, to the best of my knowledge, exactly as proposed here. Ruby also
provides it, but spells it `&.`. PHP uses `?->`, just like what we're
proposing.

Raku implements it as `.?` instead of `?.` to denote that, in its case,
it checks whether the invocant has the right-hand side method, not whether
the invocant itself is defined.

Rust provides the `?` operator, which unwraps a `Result` into its `Ok`
value or returns early from the enclosing function with its `Err` value
(e.g. `File::open("hello.txt")?.read_to_string(&mut s)?`).
This behaves somewhat similarly to what is being proposed here.

Python, Java and C++, notably, do not provide it yet, though PEP 505 has
proposed it for Python.

Another noteworthy mention is Objective-C, in which sending a message to
`nil` simply returns `nil`, so every chained call automatically
short-circuits without requiring a special operator.

## Specification

The `?->` operator would behave exactly like the current dereference arrow
`->`, except that, instead of causing a runtime error if the left-hand side
is undefined or not a reference, it would short-circuit the whole expression
to `undef` in scalar or void context, and to an empty list `()` in list
context.

As with `->`, spaces would be allowed before and after the `?`, so
`$obj?->method` can be written as `$obj ?-> method` or even `$obj ? -> method`.

The `?->` operator would interact with exactly the same things that the `->`
operator does, and should be completely interchangeable with it, differing
only in the short-circuit behavior.
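The short-circuit rule above can be approximated in today's Perl with an ordinary function. The sketch below is only an illustration: `opt_chain` is a hypothetical helper, not part of this proposal or of any module. Because a plain function cannot introduce new syntax, each step of the chain is passed as a code reference.

```perl
use strict;
use warnings;

# Hypothetical helper approximating the proposed `?->` semantics.
# $value is the start of the chain; each element of @steps is a code
# reference that performs one dereference (one `->`) on its argument.
sub opt_chain {
    my ($value, @steps) = @_;
    for my $step (@steps) {
        # Like `?->`, only check that the left-hand side is a reference.
        # A bare `return` yields undef in scalar context and () in list
        # context, matching the proposed short-circuit values.
        return unless ref $value;
        $value = $step->($value);
    }
    return $value;
}

my $href = { foo => { bar => 42 } };

# Roughly $href?->{foo}?->{bar}
my $x = opt_chain($href, sub { $_[0]{foo} }, sub { $_[0]{bar} });

# Roughly $href?->{nope}?->{bar}: short-circuits before the second step.
my $y = opt_chain($href, sub { $_[0]{nope} }, sub { $_[0]{bar} });
my @z = opt_chain($href, sub { $_[0]{nope} }, sub { $_[0]{bar} });

# prints: x=42, y is undef, z has 0 elements
print "x=$x, y is ", (defined $y ? 'defined' : 'undef'),
      ", z has ", scalar(@z), " elements\n";
```

Because the only check is `ref`, a method step applied to an unblessed reference would still die. That matches the Specification as written. Stricter checks are the subject of the Future Scope section.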

## Backwards Compatibility

This would not conflict, to the best of my knowledge, with any other syntax
in Perl 5. All code with `?->` currently yields a compile-time syntax error,
except inside interpolated strings (where `"$foo?->{bar}"` resolves to the
reference address followed by a literal `?->{bar}`) and regular expressions
(where `?` is a quantifier, so `/$foo?/` gives no warnings and simply makes
the last element of the interpolated pattern optional). These would need to
be updated to handle optional chains, though it should be pointed out that
trying to use `undef` inside a string or a regexp already yields a
`Use of uninitialized value` runtime warning, and would continue to do so
with the optional chain operator.
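To illustrate the interpolation case mentioned above, the snippet below shows what current perl does with `"$foo?->{bar}"` (a sketch, assuming a perl without this feature). The `?` is not a subscript character, so only `$foo` is interpolated and the rest of the string is kept literally.

```perl
use strict;
use warnings;

my $foo = { bar => 1 };

# Only `$foo` is interpolated; `?->{bar}` stays as literal text.
my $str = "$foo?->{bar}";
print "$str\n";    # something like HASH(0x...)?->{bar}
```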

Outside the interpreter, since this would be a new operator, I am not sure
how hard it would be for static tools and modules to incorporate it.
Considering it would, of course, go through the whole `use experimental` and
`use feature` process, I expect it to be about as hard for those tools as
the postfix dereference implementation was, and maybe that can be used as a
ballpark estimate.

Finally, because it involves a special token (`?`), I don't think it is
possible to emulate it for earlier Perl versions via a module, unless it's a
source filter.

## Security Implications

None foreseen.

## Examples

Here are a few use case examples, with their current Perl 5 equivalent
in the comments:

    no warnings 'experimental::optional_deref';  # or something.
    use feature 'optional_deref';

    $x = $obj?->method;          # $x = ref $obj ? $obj->method : undef;
                                 # or $x = $obj->method if ref $obj;

    $x = $obj?->method?->other;  # my $tmp = ref $obj ? $obj->method : undef;
                                 # $x = ref $tmp ? $tmp->other : undef;
                                 # Note that in this case we need
                                 # temporary storage to avoid calling
                                 # ->method twice.

    $x = $href?->{somekey};      # $x = ref $href ? $href->{somekey} : undef;

    $x = $href?->{foo}?->{bar};  # $x = ref $href && ref $href->{foo}
                                 #    ? $href->{foo}{bar} : undef;

    # NOTE: I'd rather write the statement below as
    # `$href?->{foo}?{bar}?{baz} == 42`, but it may be harder
    # for the compiler (see "Open Issues" below):
    if ($href?->{foo}?->{bar}?->{baz} == 42)   # if (   ref $href
                                               #     && ref $href->{foo}
                                               #     && ref $href->{foo}{bar}
                                               #     && $href->{foo}{bar}{baz}
                                               #        == 42)

    $x = $subref?->(42);   # $x = ref $subref ? $subref->(42) : undef;

    $x = $aref?->[3]?->[0];   # $x = ref $aref && ref $aref->[3]
                              #      ? $aref->[3][0] : undef;

    # A notable exception that does not use 'ref':
    $x = SomeNamespace?->new;   # $x = %SomeNamespace:: ? SomeNamespace->new
                                #    : undef;

    # assignment would, of course, also be allowed:
    $href?->{foo}?->[3] = 'OHAI!';   # $href->{foo}[3] = 'OHAI!'
                                     #     if ref $href && ref $href->{foo};

    # postfix dereferencing:
    %x = $href?->%*;   # %x = ref $href ? $href->%* : ();
    $x = $aref?->$#*;  # $x = ref $aref ? $aref->$#* : undef;

    my $aref; say foreach $aref?->@*;  # no loop, no warnings;
                                       # behaves the same as foreach ().

    @x = $aref?->@*;   # @x is now (), not (undef). Equivalent to:
                       # @x = ref $aref ? $aref->@* : ()

    $x =~ s/\d+/$href?->{foo}/e;     # $x =~ s/\d+/ref $href ? $href->{foo}
                                     #    : undef/e;

## Prototype Implementation

I think it may be possible to implement this with a source filter, but
I have not attempted to do so.

## Future Scope

Future versions may be cleverer and do extra checks on the statement
considering both sides of the operator, thus making it even more useful.

For example, `$href?->{a}?->{b}?->[3]?->{d}` is, according to this proposal,
equivalent to:

       ref $href
    && ref $href->{a}
    && ref $href->{a}{b}
    && ref $href->{a}{b}[3]
    ? $href->{a}{b}[3]{d} : undef;

But it could be made equivalent to something like:

       ref $href eq 'HASH'
    && exists $href->{a}
    && ref $href->{a} eq 'HASH'
    && exists $href->{a}{b}
    && ref $href->{a}{b} eq 'ARRAY'
    && @{$href->{a}{b}} >= 4
    && ref $href->{a}{b}[3] eq 'HASH'
    && exists $href->{a}{b}[3]{d}
    ? $href->{a}{b}[3]{d} : undef;
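As a rough illustration of those stricter checks, the sketch below walks a path of keys and indexes with ordinary Perl. `strict_dive` is a hypothetical name used only here. The sketch assumes the path is given as a flat list, and that a step applied to an array must be a non-negative integer index within bounds.

```perl
use strict;
use warnings;

# Hypothetical helper: descend into $data along @path, checking the
# container type, key existence and array bounds at every level.
sub strict_dive {
    my ($data, @path) = @_;
    my $node = $data;
    for my $step (@path) {
        if (ref $node eq 'HASH') {
            return unless exists $node->{$step};
            $node = $node->{$step};
        }
        elsif (ref $node eq 'ARRAY') {
            # Only non-negative integer indexes inside the array count.
            return unless $step =~ /\A\d+\z/ && $step < @$node;
            $node = $node->[$step];
        }
        else {
            return;    # not a container we know how to descend into
        }
    }
    return $node;
}

my $data = { a => { b => [ 0, 1, 2, { d => 'deep' } ] } };

my $hit  = strict_dive($data, qw(a b 3 d));   # 'deep'
my $miss = strict_dive($data, qw(a b 9 d));   # undef: index 9 is out of bounds
```

As in the proposal, a bare `return` gives `undef` in scalar context and `()` in list context when the walk fails.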

Similarly, `$x = $obj?->a?->b` could be made equivalent to something like:

    use Scalar::Util qw(blessed);
    my $tmp = ref $obj && blessed($obj) && $obj->can('a') ? $obj->a : undef;
    $x = ref $tmp && blessed($tmp) && $tmp->can('b') ? $tmp->b : undef;
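The method-call variant above can likewise be wrapped in a small helper for experimentation. `safe_call` and the `Demo` class are hypothetical. The helper relies only on `blessed` from the core module Scalar::Util and the universal `can` method, both of which exist today.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical helper: call $method only when $obj is a blessed
# reference whose class (or a parent class) provides that method.
sub safe_call {
    my ($obj, $method, @args) = @_;
    # Bare return: undef in scalar context, () in list context.
    return unless blessed($obj) && $obj->can($method);
    return $obj->$method(@args);
}

# A tiny class made up for this example.
package Demo {
    sub new { my ($class) = @_; return bless {}, $class }
    sub a   { return Demo->new }
    sub b   { return 'b-result' }
}

# Mirrors `$x = $obj?->a?->b` under the stricter rules.
my $obj = Demo->new;
my $tmp = safe_call($obj, 'a');
my $x   = safe_call($tmp, 'b');    # 'b-result'
```

The calls can also be nested without the temporary variable, as long as the inner call is forced into scalar context: `safe_call(scalar safe_call($obj, 'a'), 'b')`.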

None of these potential future updates would interfere with the proposed
syntax, just with how far Perl would be willing and able to safeguard it.

Another potentially interesting area of scope would be to allow for defining
a custom default value other than `undef`. This could be achieved by
offering similar syntax to the ternary operator, but it goes beyond the
scope of this RFC.

## Rejected Ideas

I am unaware whether this feature has already been proposed for Perl 5
in the past.

We could, potentially, achieve the same result by making `undef` respond to
any calls as `undef`, much like what Objective-C does, so `$x->{a}[3]{b}`
becomes `undef->{a}[3]{b}` then `undef->[3]{b}` then `undef->{b}` then
`undef`. The problem with this approach is that it completely eliminates the
runtime exception of trying to use undef as a reference, which may not be
what the developer wants. It would, in fact, potentially create problems for
existing programs that count on that runtime error to happen. Instead, I
believe it's better to make it explicit.

I have considered following Raku's path and proposing the operator as `->?`
instead of `?->`, but having the `?` closer to the invocant feels more like
what developers may expect, not only for its similarity with the equivalent
operators in other languages but also because it reads like a ternary `?:`,
which may make it easier to read and glance over. In other words, I believe
`$x?->y` reads more like the intended behavior, whereas `$x->?y`, to me,
reads as if Perl dereferenced the invocant before the check. Finally,
postderefs would be even harder to read, e.g.: `$aref->?$#*`.

I have also considered using other token(s) for this, but having the `->`
signal the dereference is something all Perl 5 developers have come to
expect, and the `?` is the obvious choice not just for its similarity with
other implementations, but because of its behavioral similarities with the
ternary operator.

## Open Issues

* Would we be able to unambiguously omit the arrow in chained references?

I think, at least for an initial implementation, the arrow needs to be
mandatory for optional chaining even in cases where the lone arrow can be
omitted (notably, chained hash/array references), since allowing `?`
followed by a lot of other tokens could, potentially, become ambiguous with
ternary ops. That said, full compatibility would be great, and I have not
investigated enough to prove said ambiguity. In other words, it would be
nice to be able to write `$x?->{foo}?{bar}?{baz}` instead of
`$x?->{foo}?->{bar}?->{baz}`. Maybe in a future version?
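For a concrete sense of the possible overlap, the snippet below is valid Perl today: a ternary whose branches are anonymous array and hash constructors. An arrow-less `$x?[0]` or `$x?{bar}` would begin exactly like these lines. This does not prove the grammar ambiguous. It only suggests that, without the arrow, the parser would have to look further ahead (for example, for a `:`) before deciding what the `?` means.

```perl
use strict;
use warnings;

my $flag = 1;

# Ternaries whose true/false branches are anonymous array/hash refs.
my $aref = $flag?[0]:[1];
my $href = $flag?{ bar => 1 }:{};

print "$aref->[0] $href->{bar}\n";   # prints "0 1"
```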

* Other identity values?

I believe short-circuiting to `undef` in scalar context and `()` in list
context is a good start. Still, there may be other interesting "identity"
values to be returned depending on the variable being dereferenced or maybe
even its context and surroundings, like for example short-circuiting to the
empty string when being interpolated. Those may not be so obvious (what is
the identity value for a subref, or a globref?), and therefore they would
need to be handled case by case, very thoroughly considering each one in
order to keep the syntax consistent and predictable, not magical.

## References


## Copyright

Copyright (C) 2021, Breno G. de Oliveira

This document and code and documentation within it may be used,
redistributed and/or modified under the same terms as Perl itself.

