
Re: RFC: Multiple-alias syntax for for

From: Nicholas Clark
Date: June 10, 2021 07:28
Subject: Re: RFC: Multiple-alias syntax for for
Message ID: 20210610072816.GW16703@etla.org

On Thu, Jun 10, 2021 at 01:43:43AM +0200, Nicolas Mendoza wrote:
> 
> Den 08.06.2021 13:20, skrev Nicholas Clark:
> > So, the plan for discussing the proposed RFC process was to feed an idea
> > through it, and see how we get from idea to RFC to implementation.
> > (Assuming that we don't reject the idea.)
> > 
> > About two months ago Rik had mentioned to me the idea of implementing this
> > (currently illegal) syntax to iterate over hashes:
> > 
> >      for my ($key, $value) (%hash) { ... }
> 
> I'm sorry if this is the wrong place to comment, but I see several comments
> about the proposition itself, so here goes:

No, this is totally the right place to comment.

It's sensible comments like this (particularly comments from people whose
names I don't recognise) that we'd like to see.

> I really like this, it makes life simpler.

That was what I thought when I first saw the idea. But I'm not sure if I'm
biased. :-)

On Wed, Jun 09, 2021 at 09:37:42PM -0400, Dan Book wrote:
> On Wed, Jun 9, 2021 at 7:44 PM Nicolas Mendoza <mendoza@pvv.ntnu.no> wrote:
> 
> >
> > 1) since it looks like such an obvious improvement but hasn't been done
> > before, is there some reason that one hasn't gone down this route
> > before? Are there some hidden gotchas about it? Or is it just genius
> > overlooked simplification?
> >
> 
> I suspect no proposal has gotten far enough in implementation yet to be
> usefully discussed.

I really don't know.

I'm going to guess that it's a few things:

1) historically to get something implemented required some luck - it needed
   an idea to arrive with someone who stood a chance of implementing it
   (this is still true, I guess)
   And most of those people already have a bunch more ideas that they'd like
   to try than time to do it. So
   i)  either *they* have to have this great idea and start work on it
   ii) someone else has to have an idea *so* great that it's better than any
       of their ideas, and they drop their other free-time plans

2) I re-read the previous discussions - as ever, some people said
   "I don't think that this is useful enough" whereas others said
   "that seems cool", but there wasn't an "executive" decision made to
   conclude the discussion. So it hangs as "maybe".
   In particular there isn't an affirmative "if you did put the effort into
   trying to implement this, it will be accepted". A policy decision isn't
   made - any work you do is at risk of being ignored.

3) When Rik mentioned the basic idea to me about two months ago, he knew that
   the syntax was "low risk" - it fits in nicely and is currently an error.
   I was far more enthusiastic than he expected because I had some idea that
   the runtime was *also* "low risk" - get the internal data structures right
   and code changes would only be needed in one OP. So it looked "easy", and
   seemed like a good surprise to launch a future (likely) 5.36.0 with,
   given that we expected 5.34.0 to be a bit low key.

which is how we ended up with me hacking up an implementation over a weekend
and a bit, about 6 weeks ago. And then we had to sit on it...


During which time I figured out that (1) I really wanted an RFC process
(2) Heck, there were still some unanswered questions in what I did, and
given that we'd have to answer them anyway, why not *use* this idea to
test the RFC process, and see if it can answer the questions.

You've actually sort of hit on one of the questions (well, one of the
implementation assumptions I made).

> > 3) will it work transparently with foreach (I believe I saw some
> > comments about that)
> >
> 
> Not sure what you mean; this is a feature specifically for foreach.

Other folks know (or at least understand) the grammar better than me.
I *think* that `for` and `foreach` are aliases for the same thing. In that,
they are two spellings of the same keyword and once they have been parsed
they are identical.
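
A small sketch of that equivalence, using only syntax that exists today: the
same loop written with each spelling. The variable names are made up for the
illustration.

```perl
use strict;
use warnings;

my @seen;

# Same loop, two spellings of the keyword.
for     my $x (1 .. 3) { push @seen, "for:$x" }
foreach my $x (1 .. 3) { push @seen, "foreach:$x" }

print "@seen\n";
```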

I admit some guilt and inconsistency here - *I* preferred using `for` in
the code examples because it's 4 characters shorter.

*but* I realised that that makes the title `Multiple-alias syntax for for`
which is daft. So I changed *that* to `foreach`. And now it's inconsistent.
Bad programmer, no cookie.

I don't know what is best.

On Thu, Jun 10, 2021 at 04:00:45AM +0200, Nicolas Mendoza wrote:
> 
> Den 10.06.2021 03:37, skrev Dan Book:
> > 
> >     2) will this work flawlessly with for instance: for my ($key $value,
> >     %rest) = (%hash) { … } (iterating only once)
> > 
> > 
> > I don't see why this should be supported and complicate
> > the implementation, since you can just do that assignment without any
> > loop.
> 
> I see I wrote the wrong syntax, I meant that since it is supposed to work
> n-at-a-time, what would it do when having a hash or an array on the left
> side. Would it behave similar to constructs not in a for-loop like classical
> argument assignment (my ($self, $in, %opt) = @_; or just die?
> 
> for my ($key, $value, %rest) (%hash) { … } # one iteration or syntax error?

Syntax error.

> for my ($first, $second, @rest) (@array) { … } # one iteration or syntax
> error?

Syntax error.

> for my ($a, $b, $c) (@array) { … } # int($#array / 3) + ($#array % 3)
> iterations?

This is a question that has to be decided...

What happens here if the list count isn't an integer multiple of 3?

To me, the most obvious answer was substitute undef if it's not
(ie don't die, and don't ignore what would be incomplete 'tuples' at the end)

So if the list has 10 elements, it iterates as

    One     Two     Three
    Four    Five    Six
    Seven   Eight   Nine
    Ten     undef   undef

and with 11, that last iteration is

    Ten     Eleven  undef
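
To make the padding rule concrete, here is a rough emulation in today's Perl.
It is only a sketch: `tuples_padded` is a hypothetical helper invented for
this illustration, not part of perl and not how the proposed loop would be
implemented. Under this rule the number of iterations is the list count
divided by n, rounded up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of perl): split a list into tuples of
# $n items, padding the final tuple with undef as proposed above.
sub tuples_padded {
    my ($n, @list) = @_;
    my @tuples;
    for (my $i = 0; $i < @list; $i += $n) {
        # Indexes past the end of the list become explicit undefs.
        push @tuples,
            [ map { $_ < @list ? $list[$_] : undef } $i .. $i + $n - 1 ];
    }
    return @tuples;
}

my @words = qw(One Two Three Four Five Six Seven Eight Nine Ten);
for my $tuple (tuples_padded(3, @words)) {
    # Show undef padding as the literal word "undef".
    print join("\t", map { defined $_ ? $_ : 'undef' } @$tuple), "\n";
}
```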

> for my (@a) (@array) { … } # one iteration or syntax error

Syntax error. You can only have scalars

> for my ($a, $b, undef) ((1,2,3)) { … } # one iteration or syntax error--

Right. This is the interesting question...

I'm suggesting syntax error.

I think the slightly better question would be to express it as

    for my ($a, undef, $c) (1 .. 9) { ... }
    

It's a reasonable generalisation of list assignment, where you can assign to
undef to mean "ignore this value". I can see that this *might* be useful.
It's also safe syntax to implement (although I didn't try, and I'm not sure
how hard it would be).
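
For comparison, this is the existing list-assignment behaviour being
generalised. It runs on current perls; the variable names are just for
illustration.

```perl
use strict;
use warnings;

# Existing Perl: undef on the left of a list assignment discards
# the value in that position.
my ($first, undef, $third) = (1, 2, 3);

print "$first $third\n";    # prints "1 3"
```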

I'd like to stick to forbidding this, because it makes the implementation
harder.


So, the sort of dirty secret/insight...


The thing I knew before I started was that the code to "get the next
iteration item or terminate the loop" is fairly simple, and really the
only thing that needs changing to go from "1 at a time" to "n at a time".

Also, one might think that Perl internally is a stack machine, but it's
actually sort of hybrid, as many ops can access the Pads directly.
Pads are arrays of (pointers to) SVs - ops store the index of an entry
in the Pad, so arguably this is kind of a register machine.

Rik knew that the syntax only makes sense for a list of lexicals, because
the `my` at that point is a syntax error.

The background is that the perl parser has to know what it is parsing as it
parses it. This syntax:

    for ($key, $value) (%hash) { ... }

                       ^
isn't going to work, because at this point the parser thinks that it has
parsed the list *that is being iterated over* (and done stuff that has
committed itself to this interpretation), meaning that when it sees '('
and not '{' it can't backtrack and change its mind - this is beyond it.

So, that `my (` is all key here - it's currently a syntax error, but making
the `(` legal after `my` lets the grammar know that what is next is a list
of iterator targets.

[The above part is stuff other people have explained to me.]

The insight I had was that *if* the only thing you allow for "n at a time"
iteration is a list of *newly declared lexicals*, then they all sit in
adjacent pad slots.

Right now, this syntax:

    for my $key (keys %hash) { ... }

allocates a pad slot for $key, and stores the numeric index of that pad slot
in (actually) the enteriter op. (This was a surprise - the more logical
place would seem to be the iter op):

$ ./perl -Ilib -MO=Concise,-exec -e 'for my $key (keys %hash) { ... }'
1  <0> enter v
2  <;> nextstate(main 1 -e:1) v:{
3  <0> pushmark sM
4  <$> gv(*hash) s
5  <1> rv2hv[t2] lKRM
6  <1> keys[t3] lKM/1
7  <{> enteriter(next->c last->f redo->8)[$key:2,5] vK/LVINTRO
d  <0> iter s
e  <|> and(other->8) vK/1
8      <;> nextstate(main 4 -e:1) v
9      <0> pushmark s
a      <$> const(PV "Unimplemented") s
b      <@> die vK/1
c      <0> unstack v
           goto d
f  <2> leaveloop vK/2
g  <@> leave[1 ref] vKP/REFC
-e syntax OK


OK, so, if you permit 3-at-a-time iteration, at first glance that seems to
mean that you now need to store 3 pad slots, not 1, and where is there space
to "hide" 2 more integers?

But as long as you are declaring $n lexicals and iterating over $n lexicals,
then you have them in adjacent slots (ie offsets 1 .. $n - 1), meaning that you
now only have to find space to store one other integer - it's the "count of
how many we are iterating" - the addresses of the variables to use as
iterators can all be calculated from the known location of the first
variable.
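
A toy model of that bookkeeping, written in Perl rather than the C of the real
ops. Everything here is an assumption made for illustration: `@pad`,
`$first_slot`, `$count` and `next_tuple` are invented names, and a plain array
stands in for the pad. The point is only that one stored index plus one count
is enough to locate every iterator variable.

```perl
use strict;
use warnings;

# Toy model only: a "pad" as a plain array of slots. These names are
# hypothetical and are not perl internals.
my @pad        = (undef) x 8;
my $first_slot = 2;    # the index already stored for the first lexical
my $count      = 3;    # the one extra integer: how many we iterate at once

# Fill the $count adjacent slots from the front of the list.
# Returns false once the list is exhausted.
sub next_tuple {
    my ($list) = @_;
    return 0 unless @$list;
    for my $offset (0 .. $count - 1) {
        # Each slot address is computed from the first slot. shift on an
        # empty array returns undef, which pads a short final tuple.
        $pad[$first_slot + $offset] = shift @$list;
    }
    return 1;
}

my @items = (1 .. 7);
while (next_tuple(\@items)) {
    my @current = @pad[$first_slot .. $first_slot + $count - 1];
    print join(' ', map { defined $_ ? $_ : 'undef' } @current), "\n";
}
```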

So, it's a lot easier to implement (and a bit faster too, and less likely to
be buggy) if we don't permit undef in the iteration list. And I'd prefer to
keep it that way, at least to start with.


So thanks for asking the right question...

Nicholas Clark
