Front page | perl.perl5.porters |
Postings from October 2016
Re: [perl #129897] Unexpected behavior with a regular expression
From: demerphq
Date: October 17, 2016 16:37
Subject: Re: [perl #129897] Unexpected behavior with a regular expression
Message ID: CANgJU+X9+7u4RrnZed4HHE4RfGDqDdX7CTDfJnt9ZtnymWBGNw@mail.gmail.com
On 17 October 2016 at 04:50, Zefram <zefram@fysh.org> wrote:
> Jorma Laaksonen wrote:
>>Any hint if I'm doing something wrong or not doing something I should
>>do?
>
> No, that's all supported usage. You're quite right about the behaviour
> being erroneous.
Agreed.
It seems to be a bug in unwinding .*? although it also interacts
with the TRIE code in ways I don't entirely understand. (Making the code
not produce a TRIE fixes the bug, but on the other hand, so does
removing the .*?)
Nevertheless I can fix the bug (while possibly introducing new bugs)
with the code in yves/fix_129897
c09f087940c61f3b6e57e7cf5e5b7a4faa683420
I would prefer that Dave have a look into this, as I don't entirely
understand why my patch fixes things for this case, or why it is not
needed in most other cases.
The key point is that when we fail a .*? match we should unwind and
reset any buffers we matched after our current point. But STAR and
PLUS do not initialize the proper member fields so that we can do this
unwinding properly.
I have to admit that this bug is quite surprising. I would have
thought that if we had a bug like this we would fail our regex tests
completely, but apparently not.
Of course, it may have to do with the fact that the form of this bug
is incredibly horrible. Having an unanchored .* at the beginning of a
pattern is a good way to make your regex quadratic on failure. (We may
trigger an optimisation that automagically adds the anchor, and we may
not....)
So it may simply be that most of the time we don't trigger this bug, but I
admit it's not obvious to me why not.
Yves
commit c09f087940c61f3b6e57e7cf5e5b7a4faa683420
Author: Yves Orton <demerphq@gmail.com>
Date:   Mon Oct 17 18:29:43 2016 +0200

    provisional patch to fix [perl #129897]

diff --git a/regexec.c b/regexec.c
index e9e23f2..0cde487 100644
--- a/regexec.c
+++ b/regexec.c
@@ -7868,6 +7868,8 @@ NULL
     case STAR: /* /A*B/ where A is width 1 char */
         ST.paren = 0;
+        ST.lastparen = rex->lastparen;
+        ST.lastcloseparen = rex->lastcloseparen;
         ST.min = 0;
         ST.max = REG_INFTY;
         scan = NEXTOPER(scan);
@@ -7875,6 +7877,8 @@ NULL
     case PLUS: /* /A+B/ where A is width 1 char */
         ST.paren = 0;
+        ST.lastparen = rex->lastparen;
+        ST.lastcloseparen = rex->lastcloseparen;
         ST.min = 1;
         ST.max = REG_INFTY;
         scan = NEXTOPER(scan);
@@ -7900,6 +7904,8 @@ NULL
         ST.paren = 0;
         ST.min = ARG1(scan); /* min to match */
         ST.max = ARG2(scan); /* max to match */
+        ST.lastparen = rex->lastparen;
+        ST.lastcloseparen = rex->lastcloseparen;
         scan = NEXTOPER(scan) + NODE_STEP_REGNODE;
       repeat:
         /*
@@ -8013,7 +8019,7 @@ NULL
         /* failed to find B in a non-greedy match where c1,c2 valid */
         REGCP_UNWIND(ST.cp);
-        if (ST.paren) {
+        if ( 1 || ST.paren ) {
             UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
         }
         /* Couldn't or didn't -- move forward. */
@@ -8086,7 +8092,7 @@ NULL
         /* failed to find B in a non-greedy match where c1,c2 invalid */
         REGCP_UNWIND(ST.cp);
-        if (ST.paren) {
+        if ( 1 || ST.paren ) {
             UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
         }
         /* failed -- move forward one */
@@ -8147,7 +8153,7 @@ NULL
         /* failed to find B in a greedy match */
         REGCP_UNWIND(ST.cp);
-        if (ST.paren) {
+        if ( 1 || ST.paren ) {
             UNWIND_PAREN(ST.lastparen, ST.lastcloseparen);
         }
         /* back up. */
--
perl -Mre=debug -e "/just|another|perl|hacker/"