develooper Front page | perl.perl5.porters | Postings from July 2018

[perl #133352] [PATCH] Save and restore captures for BRANCH/TRIEfailures

Thread Next
Deven T. Corzine via RT
July 13, 2018 03:30
[perl #133352] [PATCH] Save and restore captures for BRANCH/TRIEfailures
Message ID:
This is a first pass at a patch to fix RT #133352, to save and restore
all captures during alternations (BRANCH and TRIE) to avoid incorrectly
returning a capture from a failed branch, as happens with the following
test case, which has been broken ever since Perl 5.000 in 1994:

     perl -e 'print "$2\n" if "afoobar" =~ /^((.)foo|bar)*$/;'

Perl 2.0 through Perl 5.0 alpha 9 all print "a" for this command, while
Perl 5.000 through today's blead all incorrectly print "b" instead.

This patch appears to work (making the test case above print "a" again),
and I am submitting it as a preliminary patch for discussion purposes;
I don't consider it ready to apply yet, for a number of reasons:

* I haven't finished analyzing the code to check for edge cases that
  I may have overlooked with this patch.  Since TRIE_next_fail included
  a test for ST.jump which would call UNWIND_PAREN(), I want to better
  understand how that was working and determine whether ST.jump being
  set might have any implications for this patch.

* I haven't developed any new regression tests to exercise this bug and
  the patch to fix it, and I know that both BRANCH and TRIE variations
  need to be tested.

* This patch is currently saving ALL captures, including captures that
  can't possibly change inside the alternation.  This is an easier fix,
  but I believe the performance impact should be mitigated, since this
  would impact many everyday regular expressions which aren't affected
  by this bug in the first place.

* I believe that setting a paren floor (which CURLYX does) would likely
  mitigate most (but not all) of the performance impact of this fix,
  but I haven't yet found a clean way to implement this.

* Another possibility might be to create new regnode types to use only
  for alternations containing capture groups, maybe only for ones which
  also have a quantifier applied, since I believe both are required to
  trigger this bug.  (This might have to be treated as an optimization
  in regcomp.c to use the old logic for alternations that can't trigger
  the bug, but it might be hard to tell if a quantifier is applied for
  qr// cases.)

* Regardless, I would like feedback from an expert on this part of the
  regular expression engine (maybe Yves?) about this patch and whether
  the approach I've taken is on the right track.
 regexec.c | 28 ++++++++++++++++------------
 regexp.h  |  3 +++
 2 files changed, 19 insertions(+), 12 deletions(-)

via perlbug:  queue: perl5 status: open

Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About