perl.perl5.porters | Postings from July 2018

[perl #133352] Ancient Regex Regression

From: James E Keenan via RT
Date: July 10, 2018 17:09
Subject: [perl #133352] Ancient Regex Regression
Message ID: rt-4.0.24-32749-1531242558-1580.133352-15-0@perl.org
On Mon, 09 Jul 2018 04:04:32 GMT, dcorzine wrote:
> This is a bug report for perl from deven@ties.org,
> generated with the help of perlbug 1.41 running under perl 5.29.1.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> I discovered a bug in Perl's regular expression engine a few
> months ago.  I showed it to many people at The Perl Conference
> in Salt Lake City a couple weeks ago, and everyone agreed that
> this was a bug in the regex engine in Perl itself, including
> Abigail, Tom Christiansen, Karl Williamson and Larry Wall.
> 
> I even ended up doing a lightning talk about the bug:
> 
> https://www.youtube.com/watch?v=U-JhPIECkPY
> 
> This was my test case, which works with or without anchors:
> 
> "afoobar" =~ /((.)foo|bar)*/
> "afoobar" =~ /^((.)foo|bar)*$/
> 
> Or, as a standalone command:
> 
> perl -e 'print "$2\n" if "afoobar" =~ /^((.)foo|bar)*$/;'
> 
> This prints "b", even though "bfoo" never appears in "afoobar"!
> 
> I understand why this is happening -- the inner group does match
> against "b" in "bar" on the second iteration, but this branch of
> the alternation fails.  The capture is still being used, despite
> the fact that it came from a failed branch of the alternation.
> 
> The correct answer seems to be "a", since that's the last match
> of the inner group and the overall match is successful.  Perl 1.0
> can't handle this regex (Larry said it was the regex engine from
> Gosling Emacs), but Perl 2.0 through Perl 5.0 alpha 9 all print
> "a" for the command above.  Other regex implementations, such as
> PCRE, RE2, GNU and others, also return "a" for the inner group.
> 
> Perl 5.000 (from 1994) is the first commit in the git repository
> (commit a0d0e21ea6ea90a22318550944fe6cb09ae10cda) which exhibits
> the bug, printing "b" instead of "a".  I just built blead again
> today and confirmed that the bug is still there, despite passing
> the full test suite.  (Tom Christiansen pointed out that this bug
> is technically a regression, since it used to work correctly.)
> 
> Even though I have never worked on the Perl core, and I've been
> warned that the regex engine is particularly difficult, I would
> still like to attempt to develop a patch for this bug myself.
> 
> I've already managed to create a working patch that fixes this
> bug without breaking any of the regular expression tests in the
> test suite, so I believe I'm on the right track. There may still
> be a few edge cases to consider, though, so I'm not ready to
> submit the patch just yet.
> 

Please submit the patch as an attachment.  Even if it's not complete, it gives us something to run through the test suite and a starting point for discussion.

> Yves, SawyerX thought you might be willing to mentor me on this?
> If so, that would be great!
> 
> My solution involves saving the captures with regcppush() on
> BRANCH and TRIE nodes and restoring them with regcp_restore() at
> BRANCH_next_fail and TRIE_next_fail.  Does that sound like the
> right approach, give or take?
> 
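The save-and-restore idea described above can be illustrated outside the Perl core with a toy backtracking matcher. This is a minimal sketch in Python under stated assumptions, not a model of how regexec.c is actually structured: it hard-codes the single pattern /^((.)foo|bar)*$/, and the names `match`, `groups` and `restore_on_branch_fail` are hypothetical. The comments mention regcppush() and regcp_restore() only as analogies for where the proposed save and restore would happen.

```python
from typing import Callable, List, Optional, Tuple

Span = Optional[Tuple[int, int]]
Cont = Callable[[int], bool]


def match(subject: str, restore_on_branch_fail: bool) -> Optional[List[Span]]:
    """Toy matcher for /^((.)foo|bar)*$/ only (hypothetical); returns spans or None."""
    # caps[1] is the outer group ((.)foo|bar), caps[2] is the inner (.)
    caps: List[Span] = [None, None, None]

    def group(pos: int, k: Cont) -> bool:
        # One iteration of ((.)foo|bar): a two-way alternation (a BRANCH).
        saved = list(caps)  # snapshot on entry, by analogy with regcppush()

        # Alternative 1: (.)foo  (the toy lets "." match any character)
        if pos < len(subject):
            caps[2] = (pos, pos + 1)
            if subject.startswith("foo", pos + 1):
                caps[1] = (pos, pos + 4)
                if k(pos + 4):
                    return True

        # Alternative 1 failed (by analogy with BRANCH_next_fail).
        if restore_on_branch_fail:
            caps[:] = saved  # like regcp_restore(): discard the failed branch's captures

        # Alternative 2: bar
        if subject.startswith("bar", pos):
            caps[1] = (pos, pos + 3)
            if k(pos + 3):
                return True

        # The whole iteration failed: undo it in both modes.
        caps[:] = saved
        return False

    def star(pos: int, k: Cont) -> bool:
        # Greedy *: try one more iteration first, then fall back to k.
        # Every alternative consumes at least 3 characters, so this terminates.
        if group(pos, lambda p: star(p, k)):
            return True
        return k(pos)

    # ^ holds because matching starts at 0; $ means the whole subject was consumed.
    if star(0, lambda p: p == len(subject)):
        return caps
    return None


def groups(subject: str, restore_on_branch_fail: bool) -> Optional[Tuple[Optional[str], ...]]:
    """Return ($1, $2) as strings, or None if the toy pattern does not match."""
    caps = match(subject, restore_on_branch_fail)
    if caps is None:
        return None
    return tuple(None if span is None else subject[span[0]:span[1]] for span in caps[1:])


if __name__ == "__main__":
    print(groups("afoobar", restore_on_branch_fail=False))  # ('bar', 'b')
    print(groups("afoobar", restore_on_branch_fail=True))   # ('bar', 'a')
```

With restore_on_branch_fail=False, the failed (.)foo attempt on "bar" leaves the inner capture at "b", reproducing the reported behavior; with it set to True, the inner capture reverts to "a", the answer the report expects. The whole-iteration restore at the end of group() is kept in both modes, so the only difference between them is the per-alternative restore.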

Thank you very much.

-- 
James E Keenan (jkeenan@cpan.org)

---
via perlbug:  queue: perl5 status: new
https://rt.perl.org/Ticket/Display.html?id=133352



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org