
Re: (?(DEFINE)) and captures.

From: demerphq
Date: August 17, 2017 15:39
Subject: Re: (?(DEFINE)) and captures.
Message ID: CANgJU+U1o8Gwaxdbt3pJOWB6TaHE-XhGET_nC0ZJrqHrJN2Ncw@mail.gmail.com

On 14 Aug 2017 18:04, "Abigail" <abigail@abigail.be> wrote:


Here's a pattern which uses a (?(DEFINE)) construct to define a
couple of named rules. Each such rule must consist of a named
capture.


    my $pat_1 = qr {
        (?(DEFINE)
           (?<foo>  foo)
           (?<bar>  (?&foo)))

        (?&bar)
    }x;


Now, if we perform a match with them, do we get captures? If you
don't know the answer, don't worry, Perl itself doesn't quite
know what to do with it either:


    "foo" =~ /$pat_1/ or die "No match";

    printf "Got %d different names for captures\n" => scalar keys %-;
    printf "Got %d named captures\n"               => scalar keys %+;
    printf "Got %d captures\n"                     => scalar @{^CAPTURE};

    __END__
    Got 2 different names for captures
    Got 0 named captures
    Got 0 captures


Now, let's see what happens if we add one more set of capturing
parentheses to the pattern:

    my $pat_2 = qr {
        (?(DEFINE)
           (?<foo>  foo)
           (?<bar>  (?&foo)))

        (
            (?&bar)
        )
    }x;


    "foo" =~ /$pat_2/ or die "No match";

    printf "Got %d different names for captures\n" => scalar keys %-;
    printf "Got %d named captures\n"               => scalar keys %+;
    printf "Got %d captures\n"                     => scalar @{^CAPTURE};

    __END__
    Got 2 different names for captures
    Got 0 named captures
    Got 3 captures


We went from 0 captures to 3! That is, $1 and $2 are undefined, while
$3 is set (to "foo"). No named captures are available though.


I'm sure there's a perfectly good explanation if you know the details
of the implementation, but how do we explain this to users?


Depends on what part you want to explain.

Regex subroutines behave the same as regexp eval, that is, as completely
independent subexpressions. This is why what they capture is not available
after the match.
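
A minimal sketch of that point, assuming perl 5.10 or later (for named
captures and (?&name)). The variable names are made up for illustration.
The same named group is matched once directly and once only through a
subroutine call:

```perl
use strict;
use warnings;

# The named group is part of the top-level pattern, so it captures.
my $direct = qr/(?<foo>foo)/;

# The named group is only ever reached through the (?&foo) call.
my $via_call = qr/(?(DEFINE)(?<foo>foo))(?&foo)/;

"foo" =~ $direct or die "No match";
my $direct_capture = $+{foo};   # populated by the top-level match

"foo" =~ $via_call or die "No match";
my $call_capture = $+{foo};     # populated only inside the call, so not visible here

printf "direct:   %s\n", defined $direct_capture ? "'$direct_capture'" : 'undef';
printf "via call: %s\n", defined $call_capture   ? "'$call_capture'"   : 'undef';
```

Both patterns match "foo", but only the first leaves a value in %+.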

The reason adding the third capture buffer changes things is that it is
evaluated without recursion and therefore gets populated, and since it's the
third capture buffer you then "see" the other two. When it is omitted,
however, no top-level capture buffer is populated, and it looks like the
first two buffers are missing.

You can see similar effects with patterns that use neither named captures nor
recursion: @{^CAPTURE} will contain only enough elements to hold the last
capture buffer that was actually used in the match. A pattern could have
dozens of unused captures, most of which will be invisible to @{^CAPTURE} if
they were not actually populated.
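
A small sketch of that effect with a plain alternation, using the long-standing
@- and @+ arrays rather than @{^CAPTURE} so that it runs on older perls too.
The variable names are illustrative only. $#+ reports how many capture buffers
the pattern has, while $#- reports the last one that was populated:

```perl
use strict;
use warnings;

# Three capture buffers, but only the second one takes part in the match.
"b" =~ /(a)|(b)|(c)/ or die "No match";

my $buffers_in_pattern = $#+;   # number of capture buffers in the pattern
my $last_populated     = $#-;   # index of the last buffer that was populated

printf "Pattern has %d capture buffers\n", $buffers_in_pattern;
printf "Last populated buffer is \$%d\n",  $last_populated;
printf "\$1 is %s, \$2 is %s\n",
    defined $1 ? "'$1'" : 'undef',
    defined $2 ? "'$2'" : 'undef';
```

The third buffer exists in the pattern but never shows up as populated, which
is the same reason the unused buffers stay out of sight in the examples above.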

I'm on my phone so it's a bit hard to give a really good explanation with
examples. Hopefully this is good enough to move forward with.

Also I would be open to arguments that some or all of this should work
differently. Suggestions welcome!

Yves
