develooper Front page | perl.perl6.compiler | Postings from May 2018

[perl #130910] [REGEX] Backtracking into a parameterized subrule like `<meh(42)>` tries to call it without arguments.

Brian S. Julin via RT
May 8, 2018 18:40

This is also an issue in nqp.

$ nqp -e 'grammar f { regex TOP { ^ <foo(42)> $ }; regex foo($i) { .. } }; nqp::say(f.parse("aaa"));'
Too few positionals passed; expected 2 arguments but got 1

Fixing it in nqp first is probably the best first step.  To
that end I investigated a bit, and it looks like the fix will require
some fairly tricky modifications.

Currently, a Cursor fills in its $!regexsub attribute by getting the
callercode of the rule that called a .cursor_start_* method.  This code
has the parameter-checking instructions at the top.  Then, when the
cursor matches, .cursor_pass copies this code reference into $!restart.
The regex node code (generated by .regex_mast, which .as_mast calls and
simply inlines with the rest of the code it generates) then calls
.cursor_next when backtracking.

If it finds code in $!restart, .cursor_next invokes it with no arguments
beyond the invocant.  When invoked with a code reference in $!restart,
the .as_mast code skips the .regex_mast code, so it only unwinds the
cursor stack (based on the backtrack stack).  However, the code that
checks the parameter count sits before the .as_mast code in the frame,
so it is hit first and fails.  You can see this behavior by making
the positional optional:

$ nqp -e 'grammar f { regex TOP { ^ <foo(42)> $ }; regex foo($i?) { {nqp::say($i)} .. } }; f.parse("aaa");'

...noting that the 42 is only said once on the first call where the
match occurs, not on the second call during the backtrack.
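
The sequence described above can be mimicked in a few lines of Python.
This is only a toy model, not NQP: the method names echo the ones in
this post, while foo, foo_optional and backtrack_into are hypothetical
helpers, and Python's own arity check stands in for the param checking
instructions at the top of the frame.

```python
class Cursor:
    """Toy stand-in for an NQP Cursor; not the real implementation."""

    def __init__(self, regexsub):
        self.regexsub = regexsub  # code of the rule that started this cursor
        self.restart = None

    def cursor_pass(self):
        # On a successful match, remember the rule so it can be re-entered.
        self.restart = self.regexsub
        return self

    def cursor_next(self):
        # Backtracking re-enters the rule with the cursor only, no rule args.
        return self.restart(self)


def foo(cursor, i):
    # Required positional: the arity check fires before this body runs.
    if cursor.restart is not None:
        return "resumed: only unwinding"
    return f"first entry, i={i}"


def foo_optional(cursor, i=None):
    # Optional positional: the re-entry gets past the arity check.
    if cursor.restart is not None:
        return "resumed: only unwinding"
    return f"first entry, i={i}"


def backtrack_into(rule):
    """Match once, then backtrack into the same rule (hypothetical helper)."""
    cursor = Cursor(rule).cursor_pass()
    try:
        return cursor.cursor_next()
    except TypeError as err:
        return f"arity error before the body ran: {err}"


print(backtrack_into(foo))
print(backtrack_into(foo_optional))
```

In this model the required-positional rule fails on re-entry before its
body can notice that it is being resumed, while the optional-positional
rule gets through and only unwinds.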

NQP also has a cursor_more, which appears to be unused there and which
calls $!regexsub with nothing but a new cursor as a parameter.

In rakudo, cursor_next and cursor_more are replicated under different
names, along with an additional one used for exhaustive/overlapping, and then
renamed pointers to those functions are thrown into a grist mill of
code where it is hard to enumerate the number of places in which they
are called.

Long story short, passing args along down the call chain does not
look practical.  What would be required is either some way to move
the param checks for everything but the invocant down into the
regex_mast instructions, or wrapping the params in a currying closure
and storing that in $!regexsub instead.
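
The second option, a closure that captures the params, might look
roughly like the following.  Again this is a toy Python model and not
NQP code: Cursor is a simplified stand-in, and curry_rule and foo are
hypothetical names introduced only for illustration.

```python
class Cursor:
    """Toy stand-in for an NQP Cursor; not the real implementation."""

    def __init__(self, regexsub):
        self.regexsub = regexsub
        self.restart = None

    def cursor_pass(self):
        # Remember the (curried) rule for later re-entry.
        self.restart = self.regexsub
        return self

    def cursor_next(self):
        # Still invoked with the cursor only, as in the current design.
        return self.restart(self)


def foo(cursor, i):
    # A rule with a required positional parameter.
    if cursor.restart is not None:
        return f"resumed with i={i}"
    return f"first entry with i={i}"


def curry_rule(rule, *args):
    """Capture the rule's args so later callers need to pass only the cursor."""
    def regexsub(cursor):
        return rule(cursor, *args)
    return regexsub


cursor = Cursor(curry_rule(foo, 42))
first = cursor.regexsub(cursor)  # initial call
cursor.cursor_pass()
again = cursor.cursor_next()     # backtracking re-entry, args still present
print(first)
print(again)
```

The point of the design is that nothing on the backtracking path has to
change: cursor_next keeps calling with the cursor alone, and the captured
args travel with the code reference.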

As a side note, it has been suggested before that having a way to
fire a phaser (or code otherwise attached) when a block in a regex
is backtracked over would be useful in building some interesting
constructs.  It is speculated in S04/"Definition of Success"
that a block that gets backtracked over should fire UNDO
(which implies that KEEP would not be fired until the whole match succeeds.)
I would guess we would only want to keep half-finished frames around
to do that when there actually were user-defined phasers to fire,
for performance reasons.  Also any block where the return value is
used for interpolation or assertion would obviously not be compatible
with this premise.