develooper Front page | perl.perl5.porters | Postings from September 2011

Re: [perl #92898] (*THEN) broken inside condition subpattern

From: Philip Hazel
Date: September 20, 2011 03:23
Subject: Re: [perl #92898] (*THEN) broken inside condition subpattern
Message ID: alpine.LNX.2.00.1109191722410.5131@quercite.quercite.com
On Sun, 18 Sep 2011, Father Chrysostomos via RT wrote:

> So you are saying that (?(condition)foo(*THEN)bar|baz) should jump out
> of the conditional group (since the |baz part is not a backtracking
> point, but is only reached when the condition is false), but that
> (?(condition)foo(*THEN)bar) should fail the whole pattern (there being
> no |bar)?

No, I'm not. 

> I think it ends up being too confusing.  The | in a conditional has
> nothing to do with regular alternation.  That view is reinforced by the
> fact that only one pipe is permitted:

That is certainly true, but to me, as a simple-minded person, it *looks* 
like a regular alternation. The only difference is that the matching 
engine just tries one of the alternatives rather than both. Consider:

/^.*?(?(?=a)a(*THEN)b|c)/      Pattern
ac                             Subject
    
It starts off with .*? matching zero characters. The condition is true, so 
it matches "a", fails on "b", and then backtracks to (*THEN). In a "normal" 
group it would try the next alternative, but a conditional group behaves 
as if there is only one alternative, so it should just backtrack as if 
the group had failed. That makes .*? try again with one "a"; the condition 
is then false, "c" matches, and so the match eventually succeeds.

I think the same should happen for this example:

/^.*?(?(?=a)a(*THEN)b)c/
ac
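To watch the backtracking described above on a particular perl, both conditional examples can be traced. This is a minimal sketch. The `(.*?)` capture and the `(?{ ... })` code block are added purely for tracing and are not part of the patterns under discussion. The match results depend on the perl version, since that behaviour is exactly what is in question here.

```perl
use strict;
use warnings;

# Lengths consumed by .*? each time the engine reaches the conditional group.
# Side effects of (?{ }) are not undone on backtracking, so every attempt is
# recorded. Declared with "our" so the regex code blocks can see it directly.
our @attempts;

# The two conditional examples, each with a tracing (.*?) capture and a
# (?{ ... }) block added in front of the conditional group.
my @cases = (
    [ 'with |c',    qr/^(.*?)(?{ push @attempts, length $1 })(?(?=a)a(*THEN)b|c)/ ],
    [ 'without |c', qr/^(.*?)(?{ push @attempts, length $1 })(?(?=a)a(*THEN)b)c/ ],
);

for my $case (@cases) {
    my ($label, $re) = @$case;
    @attempts = ();
    my $ok = 'ac' =~ $re;
    printf "%-10s  %-8s  .*? lengths tried: %s\n",
        $label, ($ok ? 'match' : 'no match'), join(',', @attempts);
}
```

If the conditional behaves as a single alternative, the trace should show a second attempt with .*? consuming one character. If it does not, the list stops after the first attempt.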

Further investigation shows up another issue. If (*THEN) appears in a 
regular (non-conditional) group that has no alternatives, its effect
again extends beyond the group.

/^.*?(a(*THEN)b)c/
aabc

Perl gives "no match"; PCRE currently matches. However, if we give it 
a dummy alternative:

/^.*?(a(*THEN)b|z)c/

then Perl (5.012003) does match. That seems very counter-intuitive to 
me. Perhaps, however, this ties in with the way Perl handles 
conditional groups, since they seem to behave the same way.

The text in perlre for *THEN says "when backtracked into on failure, it
causes the regex engine to try the next alternation in the innermost
enclosing group". It doesn't say what happens if there are no 
alternations or indeed if *THEN occurs in the final alternation. A check 
with

/^.*?(z|a(*THEN)b)c/

shows that Perl does match in this case too.
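For the variant with (*THEN) in the final alternative, printing where group 1 starts shows how far .*? had to advance before the overall match succeeded. This is again a sketch. `group_start` is a hypothetical helper name, and what it reports depends on the installed perl.

```perl
use strict;
use warnings;

# Hypothetical helper: offset at which capture group 1 starts, or undef if
# the pattern does not match. @- holds the start offsets of the last match.
sub group_start {
    my ($re, $subject) = @_;
    return $subject =~ $re ? $-[1] : undef;
}

my $start = group_start(qr/^.*?(z|a(*THEN)b)c/, 'aabc');
print defined $start
    ? "match; group 1 starts at offset $start\n"
    : "no match\n";
```

An offset of 1 means that backtracking left the group after (*THEN) and let .*? take the first "a", rather than failing the whole match.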

> > While thinking about this and experimenting, I've just discovered
> > another oddity of (*THEN).
> > 
> > Pattern: /a+?(*THEN)c/
> > Subject: aaac
> > Result:  Perl 5.012003 matches "aaac" 
> 
> That’s strange. In 5.14 it doesn’t match. I don’t know which is worse.
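This oddity is easy to reproduce next to a verb-free control pattern. Without the verb, /a+?c/ on "aaac" finds the leftmost match "aaac" at offset 0. The sketch below, using a hypothetical `describe_at` helper, simply prints what the installed perl does once (*THEN) is added, because the quoted results already show 5.12 and 5.14 disagreeing.

```perl
use strict;
use warnings;

# Hypothetical helper: report the matched text and its start offset ($-[0]).
sub describe_at {
    my ($re, $subject) = @_;
    return $subject =~ $re
        ? sprintf("match '%s' at offset %d", $&, $-[0])
        : 'no match';
}

my $subject = 'aaac';
print 'control  /a+?c/:        ', describe_at(qr/a+?c/, $subject), "\n";
print 'verb     /a+?(*THEN)c/: ', describe_at(qr/a+?(*THEN)c/, $subject), "\n";
```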

I sometimes wonder whether these new backtracking verbs are going to 
prove more trouble than they are worth.

Regards,
Philip

-- 
Philip Hazel


nntp.perl.org: Perl Programming lists via nntp and http.