Dumbmatch - a less "smart" smartmatch that DWIW

From: Paul "LeoNerd" Evans
Date: June 1, 2020 09:26
Subject: Dumbmatch - a less "smart" smartmatch that DWIW
Message ID: 20200601102552.6b70a54f@shy.leonerd.org.uk

TL;DR: Syntax designs for a Do-What-I-Wrote version of a "match one
  case in several" control flow syntax.


I am getting closer to a position where I can start working on my
dumbmatch idea; I have some approximate syntax thoughts here. Before I
go over my remaining questions, first consider this synopsis chunk:

  use feature 'match';

  my $var = "some value";

  match($var : eq) {
    # All comparisons are performed using 'eq'

    case("hello") {
      # cases are BLOCKs, not labels.
      say "The value was hello";

      # no need to break/last/...
    }

    case("goodbye") {
      # a different value
    }

    case("goodbye") {
      # cases are tested in listed order, so this block will never
      # match as it is shadowed by the preceding one
    }

    case("foo", "bar", "splot") {
      # A list of possible cases, any may match
    }

    case($SOMEVAR) {
      # cases need not be compiletime constant
    }

    case(@MORE_VALUES) {
      # case(EXPR) is in list context, matches any of the values in
      # the list
    }

    case(1.0) {
      # still equivalent to  $var eq 1.0
      ...
    }

    default {
      # if nothing else matched by now
    }
  }


  # Other match operators are available
  match($num : ==) {
    ... # cases are numbers compared numerically

    case("hello") { }   # warns of non-numeric value in ==
  }

  match($str : =~) {
    # cases are regexp m// matches, a la $str =~ m/.../

    case(m/^A pattern here/) { ... }
    # note: not qr//
  }

  match($obj : isa) {
    ... # cases are Package::Names
  }


  # We can match on any expression, which is evaluated exactly once
  match(somefunc() : eq) {
    case(1) { ... }
    case(2) { ... }
    case(100) { ... }
  }


  # match does NOT touch $_.

  # Though nothing stops you doing
  match($_ = whatever() : eq) {
    case(123) { say "The value $_ is 123" }
  }

  match(my $topic = EXPR : eq) {
    case(45) { say "The value $topic is 45" }
  }

  # Plus of course you can match $_ itself
  foreach (@items) {
    match($_ : eq) { ... }
  }


Some notes arise from this:

1. I'm not *100%* glued to the specific pieces of the syntax. In
   particular, I've struggled a lot with how to specify the dangling
   infix operator and eventually settled on `match($var : eq)` after
   trying and rejecting

      match($var) on eq
      match($var) using eq  # or any other noiseword
      match($var eq *)
      match($var eq _)      # or any other placeholder symbol

   The syntax as written is the neatest idea I can come up with, which
   still fits in everything I want to be able to support.

   1a. I'm especially not fixed on the idea of reusing the `default`
       keyword for the final block. Equally I could take `otherwise` or
       even `else`, though the latter might feel a bit weird...

2. The syntax as it stands is trivially convertible into "the obvious"
   if/elsif... chain; and indeed in my first implementation I don't
   plan to make any new ops for it. The syntax will start off much like
   the first version of sub signatures - just a neater way to emit the
   existing ops. Over time I imagine various optimised versions of the
   ops can be made which will perform faster than the plain if/elsif
   rewrite.

   2a. Taking note of the final synopsis point about not using $_, a
       rewrite into an if/elsif-like optree would need a temporary
       variable to store the topic value in. The core ops don't
       actually care if the underlying pad slot has a name or not, but I
       know modules like B::Deparse or Future::AsyncAwait will get
       upset in that case. Investigations continue on how best to
       handle this...

   2b. Even if more efficient internal ops were eventually created to
       implement this, care must be taken to ensure that they behave as
       if the regular comparison operator was being invoked as would
       happen in the equivalent if/elsif... code. In particular,
       overloaded operators must be respected. Nothing should prevent a
       nice efficient shortcut of, e.g. a hash table lookup if the
       original syntax contained a thousand literal string cases and
       the topic value really is a plain unblessed scalar string, but
       it should also behave sensibly if objects with overloaded
       operators are found instead.
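
To make the rewrite described in note 2 concrete, here is a rough
hand-expansion of the `match(somefunc() : eq)` synopsis example into
today's Perl. It is only a sketch of the idea, not what an implementation
would actually emit: the named lexical `$topic` stands in for the
temporary discussed in 2a, and `somefunc`, `classify` and the contents of
`@MORE_VALUES` are invented for illustration.

```perl
use strict;
use warnings;

my $calls = 0;
sub somefunc { $calls++; return 2 }    # hypothetical topic expression

my @MORE_VALUES = (7, 8, 9);           # hypothetical runtime list of cases

sub classify {
    # The topic expression is evaluated exactly once, into a temporary
    my $topic = somefunc();

    if    ($topic eq 1) { return "one" }
    elsif ($topic eq 2) { return "two" }
    # case(LIST): the block runs if any element of the list matches
    elsif (grep { $topic eq $_ } @MORE_VALUES) { return "more" }
    else                { return "default" }
}
```

Because the expansion uses the ordinary `eq` operator, any overloading on
the topic or the case values applies exactly as it would in hand-written
if/elsif code.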

3. Even though these are trivial rewrites of if/elsif syntax, I still
   want to restrict the set of allowed operators to just the four
   listed here. While the syntax could easily support operators like
   `<` or `+=` or any other infix binary operator, the semantics for
   invoking it would be much less clear. Also, restricting to only
   these four operators (at least, in a first version) makes it easier
   to provide more efficient implementations later.
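
As an illustration of the kind of shortcut notes 2b and 3 have in mind,
the following sketch uses a hash lookup when the topic is a plain
non-reference scalar and falls back to calling `eq` case by case
otherwise, so that overloading is still respected. The `%CASES` table,
the `lookup` sub and the `CaseInsensitive` class are all hypothetical
names made up for this example; a real implementation would live in the
op tree rather than in Perl code.

```perl
use strict;
use warnings;

{
    # Hypothetical class whose overloaded 'eq' ignores case
    package CaseInsensitive;
    use overload
        'eq' => sub { my ($self, $other) = @_; lc($self->{v}) eq lc($other) },
        '""' => sub { $_[0]{v} };
    sub new { my ($class, $v) = @_; return bless { v => $v }, $class }
}

# Literal string cases and the result of each case block
my %CASES = (hello => "greeting", goodbye => "farewell");

sub lookup {
    my ($topic) = @_;

    if (defined $topic and !ref $topic) {
        # Fast path: a plain scalar needs only a single hash probe
        return exists $CASES{$topic} ? $CASES{$topic} : "default";
    }

    # Slow path: invoke 'eq' for each case so overloading is honoured
    for my $case (sort keys %CASES) {
        return $CASES{$case} if $topic eq $case;
    }
    return "default";
}
```

With this sketch, `lookup("hello")` takes the fast path, while
`lookup(CaseInsensitive->new("HELLO"))` takes the slow path and still
matches the "hello" case through the overloaded `eq`.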

4. I have considered whether it would be useful to allow some "undef or
   ..." kind of semantics. I don't have a good answer to that.

   On the one hand it feels like it should be easy to implement:

     match($str : eq) {
       case(undef) { ... }
       case("foo") { ... }
       ...
     }

   and have it "do what I mean", but on the other hand this feels like a
   long slippery slope that ends up back at smartmatch and given/when,
   which is exactly the sort of thing I am trying to avoid here.

   It may be the case that to properly solve this Perl would need to
   obtain some new operators, which would behave like "undef-aware"
   versions of eq, == and =~. Such a solution would be Perl-wide,
   rather than specific to `match`. Further thoughts for another day,
   perhaps...

   4a. Alternatively, we -could- describe the semantics of these as
       saying that if the topic value is undef it is handled specially,
       matching only a `case(undef)` if present, and nothing else:

         case(undef)    # equivalent to  if !defined $topic
         case($value)   # equivalent to  if defined $topic and
                                               $topic eq $value

       These would at least be self-consistent, bounded in definition
       and unlikely to "slippery slope" into smartmatch, but I feel it
       would be harder to explain the difference in behaviour between:

         if(undef == 0) { ... }

         match(undef : ==) { case(0) { ... } }

       Furthermore, this might surprise people who simply convert
       existing if/elsif... code to this new syntax, and find that
       undef now behaves differently from what their code was expecting.
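
The difference described in 4a can be seen in ordinary Perl. The sketch
below compares the plain `==` test with a helper that follows the 4a
rules. Both sub names are invented for illustration, and treating a case
value that is undef at runtime the same as a literal `case(undef)` is an
assumption of this sketch, not something the proposal specifies.

```perl
use strict;
use warnings;

# Plain comparison, as in  if(undef == 0) : undef numifies to 0
sub plain_num_eq {
    my ($topic, $value) = @_;
    no warnings 'uninitialized';    # suppress the usual undef warning
    return $topic == $value ? 1 : 0;
}

# Hypothetical 4a semantics: an undef topic matches only case(undef)
sub undef_aware_num_eq {
    my ($topic, $value) = @_;
    return defined $topic ? 0 : 1 if !defined $value;    # case(undef)
    return (defined $topic and $topic == $value) ? 1 : 0;
}
```

Here `plain_num_eq(undef, 0)` is true while `undef_aware_num_eq(undef, 0)`
is false, which is exactly the behavioural gap that would need
explaining.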

5. As a small extension of the `case(LIST, OF, VALUES)` I could see
   allowing a range of values to be specified using list-valued `..`.

     match($num : ==) {
       case(1 .. 9) { say "One digit" }
       case(10 .. 99) { say "Two digits" }
     }

   By limiting the set of allowed operators (see point 3), we can be
   sure we know how to implement this. Because we know we're operating
   on ==, we know we can rewrite this as some tests involving `<=`

      if(1 <= $topic and $topic <= 9) { say "One digit" }

   This would be on the same sort of level as the optimisation of
   `foreach (1 .. 100000)` where we store current/end index rather than
   the full list of values.

   This extension naturally fits the `eq` test as well, though there
   isn't an equivalent for the `=~` or `isa` tests. Perhaps this is OK?
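
As a small check on the range rewrite in note 5, this sketch compares a
literal scan of the list `1 .. 9` using `==` against the bounds test
using `<=`. The sub names are made up for illustration. The two agree
for integer topics; for a non-integer topic such as 1.5 the bounds test
succeeds while the list scan does not, so a rewrite that wanted to keep
list semantics exactly would need to account for that.

```perl
use strict;
use warnings;

# case(1 .. 9) taken literally: compare against every element with ==
sub in_list {
    my ($topic) = @_;
    return (grep { $topic == $_ } 1 .. 9) ? 1 : 0;
}

# The proposed rewrite: two bounds checks, no list is built
sub in_bounds {
    my ($topic) = @_;
    return (1 <= $topic and $topic <= 9) ? 1 : 0;
}
```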

6. Because the comparison test operator is specified as part of the
   `match` expression that means there is only one used for the entire
   list of cases. At first this feels sensible enough - it's rare you'd
   ever want to do both numerical and stringy comparisons in the same
   block - but it starts to feel a bit restrictive if you want to
   combine both regexp and string equality tests in one place.

      if   ($x eq "foo")   { ... }
      elsif($x =~ m/^bar/) { ... }

   Would have to become:

      match($x : =~) {
        case(m/\Afoo\z/) { ... }
        case(m/^bar/   ) { ... }
      }

   which loses some clarity.

   A further situation of mixed comparisons, which is an odd quirk of
   how things have turned out in Perl, is handling various tests around
   $@, the caught exception, in a try/catch block. Since this situation
   is related to try/catch, and forms part (most?) of my motivation for
   wanting to have this dumbmatch syntax in core Perl in the first
   place, I shall discuss it in its own subthread.
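
On the string-equality-as-regexp workaround in note 6, one detail is
worth spelling out: `\Z` also matches just before a trailing newline,
whereas `\z` matches only at the very end of the string, so `\z` is the
closer analogue of `eq`. Any regexp metacharacters in the string also
need quoting. A minimal sketch, where `exact_match_pattern` is a
hypothetical helper name:

```perl
use strict;
use warnings;

# Build a pattern that matches exactly the given string and nothing else
sub exact_match_pattern {
    my ($string) = @_;
    my $quoted = quotemeta $string;    # escape metacharacters such as '.'
    return qr/\A$quoted\z/;
}
```

For example, `"foo\n" =~ m/\Afoo\Z/` succeeds, but
`"foo\n" =~ exact_match_pattern("foo")` does not, matching the behaviour
of `"foo\n" eq "foo"`.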

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk      |  https://metacpan.org/author/PEVANS
http://www.leonerd.org.uk/  |  https://www.tindie.com/stores/leonerd/
