develooper Front page | perl.perl5.porters | Postings from November 2010

Regular expression substitution matching all possible matches

From: Ed Avis
Date: November 3, 2010 06:38
Subject: Regular expression substitution matching all possible matches
Message ID: loom.20101103T142012-957@post.gmane.org

This is a question about perl regexp usage.  I hope it's okay to post it here;
it seemed too advanced for the beginners list.  It may lead on to a possible
feature request for the perl core.

I would like to perform match-and-substitute finding all possible matches of
a regular expression within a string.  By 'all possible matches' I mean all
productions from that regular expression, not just the production that perl's
regexp engine happened to find first.  So given the regexp /\Ax*/ and the
string 'xxx', I would expect to see matches '', 'x', 'xx', and 'xxx'.

At first sight it appears that the /g flag will provide this, but in fact it
only tells perl to resume matching where the previous match ended, whereas I
want it to keep backtracking and finding more possible matches even after a
success.
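A minimal sketch of what /g does return for this string, assuming perl 5.10 or later (for `say`); the variable names are illustrative:

```perl
use strict;
use warnings;
use feature 'say';

my $string = 'xxx';

# In list context /g returns every match, but each attempt starts where
# the previous match ended; it never backtracks into an earlier success.
my @unanchored = $string =~ /x*/g;    # ('xxx', ''): the empty match is at offset 3
my @anchored   = $string =~ /\Ax*/g;  # ('xxx'): \A cannot match again at offset 3

say join ', ', map "'$_'", @unanchored;
say join ', ', map "'$_'", @anchored;
```

Neither list contains 'xx', 'x' or the empty match at offset 0.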

The perlre manual page suggests that using (*FAIL) together with (?{ }) can do it:

    'xxx' =~ /\Ax*(?{say $&})(*FAIL)/;

This gives the expected result.  Great!  But now how to do a s/// substitution?
I would like to find all possible matches and get all resulting substitutions.
So far I have hand-rolled code for that in perl:

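    use strict;
    use warnings;
    use feature 'say';

    my $string = 'xxx';
    my @m;    # one entry per possible match

    # For every way the pattern can match, record the pre-match and
    # post-match text plus the start/end offsets of each capture group;
    # (*FAIL) then forces the engine to backtrack and try the next way.
    $string =~
        /\A(x*)(?{push @m, [ ${^PREMATCH}, ${^POSTMATCH}, [ @- ], [ @+ ] ]})(*FAIL)/p;

    my $replacement = '.$1.';
    foreach (@m) {
        my ($pre, $post, $starts, $ends) = @$_;

        # Recover the text of each capture group from its offsets.
        my @captured;
        foreach my $n (1 .. $#$starts) {
            my $start = $starts->[$n] // die;
            my $end = $ends->[$n] // die;
            push @captured, substr($string, $start, $end - $start);
        }

        # Expand $1, $2, ... in the replacement template by hand.
        (my $r = $replacement)
            =~ s{\$(\d+)}{$captured[$1 - 1] // die}ge;
        say $pre . $r . $post;
    }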

This appears to work but it is unsatisfactory.  Can I coax perl's regexp
engine into finding all possible matches and gathering them into an array?
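For comparison, a more compact sketch that gathers the substitution results straight into an array. It assumes the replacement can be written as a Perl expression (here ".$1.") instead of being parsed from a template string, and it relies on $_ being the target string and pos() the current position inside a (?{ }) block, as perlre describes, with $& and $1 reflecting the match so far. `@results` is an illustrative name:

```perl
use strict;
use warnings;
use feature 'say';

# Every result of applying s/\A(x*)/.$1./ to $string, one per way the
# pattern can match. The replacement is hard-coded as a Perl expression.
my $string = 'xxx';
my @results;
$string =~ m{
    \A (x*)
    (?{
        # Start of this match = current position minus length matched so far.
        my $start = pos() - length($&);
        push @results,
            substr($_, 0, $start) . ".$1." . substr($_, pos());
    })
    (*FAIL)    # reject each match so the engine backtracks to the next
}x;

say for @results;
```

This drops the @-/@+ bookkeeping, but the replacement is still fixed inside the code block rather than supplied like the right-hand side of s///.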

Would there be an appetite for adding a new regexp flag to find all possible
matches?  It would be a useful teaching tool, and would also make it possible
to find the longest or shortest match.
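With the matches collected, choosing the longest or shortest is ordinary list processing. A minimal sketch, assuming the same (*FAIL) technique and List::Util's `reduce`; the string 'xaxxx' and the pattern /x+/ are illustrative only. Each entry records the match's start offset, computed as pos() - length($&), and its text; on ties the earliest match wins:

```perl
use strict;
use warnings;
use feature 'say';
use List::Util qw(reduce);

my $string = 'xaxxx';
my @all;    # each entry: [ start offset, matched text ]

# Record every possible match of /x+/ at every start position;
# (*FAIL) rejects each one so the engine keeps backtracking.
$string =~ /x+(?{ push @all, [ pos() - length($&), $& ] })(*FAIL)/;

# Strict comparisons keep the earlier entry on ties.
my $longest  = reduce { length($b->[1]) > length($a->[1]) ? $b : $a } @all;
my $shortest = reduce { length($b->[1]) < length($a->[1]) ? $b : $a } @all;

say "longest:  '$longest->[1]' at offset $longest->[0]";    # 'xxx' at offset 2
say "shortest: '$shortest->[1]' at offset $shortest->[0]";  # 'x' at offset 0
```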

-- 
Ed Avis <eda@waniasset.com>






nntp.perl.org: Perl Programming lists via nntp and http.