develooper Front page | perl.perl5.porters | Postings from January 2013

Re: [perl #116537] Regex: (*THEN) doesn't work as described

From: Brad Gilbert
Date: January 25, 2013 22:56
Subject: Re: [perl #116537] Regex: (*THEN) doesn't work as described
Message ID: CAD2L-T2gJp7Q5r7BXgUZ_9TD0umhRqRNTmNGQ_AkGWgmxsk3ig@mail.gmail.com

On Fri, Jan 25, 2013 at 4:17 PM, Ronald J Kimball <rjk@tamias.net> wrote:
> On Fri, Jan 25, 2013 at 09:23:54AM -0800, Philip Hazel wrote:
>
>> My understanding of how (*THEN) works is that the test below should
>> match. The perlre page says "...this verb always matches, and when
>> backtracked into on failure, it causes the regex engine to try the next
>> alternation in the innermost enclosing group (capturing or otherwise)
>> that has alternations." Unless I am going mad, the examples below (one a
>> normal group, the other an assertion) fulfil the condition.
>>
>> $ perl -e 'print (("ac" =~ /^(?=ab|ac)/)? "yes\n":"no\n")'
>> yes
>> $ perl -e 'print (("ac" =~ /^(?=a(*THEN)b|ac)/)? "yes\n":"no\n")'
>> no
>>
>> $ perl -e 'print (("ac" =~ /^(ab|ac)/)? "yes\n":"no\n")'
>> yes
>> $ perl -e 'print (("ac" =~ /^(a(*THEN)b|ac)/)? "yes\n":"no\n")'
>> no
>
> These work in 5.10.1, but not in 5.14.1.
>
> These are the only tests involving (*THEN) that expect a successful match,
> from t/re/pat_advanced.t:
>
>     {
>         #Mindnumbingly simple test of (*THEN)
>         for ("ABC","BAX") {
>             ok /A (*THEN) X | B (*THEN) C/x, "Simple (*THEN) test";
>         }
>     }
>
> The key difference seems to be that in your tests, the two alternations
> begin with the same character.

As far as I can tell, this is caused by the TRIE optimization: the two
branches share the prefix "a", so they get merged into a single TRIE node.

    $ perl -Mre=debug -e'print (("ac" =~ /^(a(*THEN)b|ac)/)? "yes\n":"no\n")'

    Compiling REx "^(a(*THEN)b|ac)"
    Final program:
       1: BOL (2)
       2: OPEN1 (4)
       4:   TRIE-EXACT[a] (14)
            <a> (7)
       7:     CUTGROUP (9)
       9:     EXACT <b> (14)
            <ac> (14)
      14: CLOSE1 (16)
      16: END (0)
    anchored(BOL) minlen 2
    Matching REx "^(a(*THEN)b|ac)" against "ac"
       0 <> <ac>                 |  1:BOL(2)
       0 <> <ac>                 |  2:OPEN1(4)
       0 <> <ac>                 |  4:TRIE-EXACT[a](14)
       0 <> <ac>                 |    State:    1 Accepted: N Charid: 1 CP:  61 After State:    2
       1 <a> <c>                 |    State:    2 Accepted: Y Charid: 2 CP:  63 After State:    3
       2 <ac> <>                 |    State:    3 Accepted: Y Charid: 0 CP:   0 After State:    0
                                      got 2 possible matches
                                      TRIE matched word #1, continuing
       1 <a> <c>                 |  7:  CUTGROUP(9)
       1 <a> <c>                 |  9:    EXACT <b>(14)
                                          failed...
                                        failed...
    Match failed
    no
    Freeing REx: "^(a(*THEN)b|ac)"
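
One way to probe the TRIE guess is sketched below; it is not part of the run above. perlvar documents that setting ${^RE_TRIE_MAXBUF} to a negative value prevents the trie optimisation, so the same pattern can be compiled both ways and the results compared. The helper names match_trie_default and match_trie_disabled are hypothetical, and what each call returns will depend on the perl version in use.

```perl
use strict;
use warnings;

# Hypothetical helper: match with the trie optimisation left at its
# default setting. Returns 1 on a match, 0 otherwise.
sub match_trie_default {
    my ($pattern, $subject) = @_;
    return $subject =~ /$pattern/ ? 1 : 0;
}

# Hypothetical helper: same match, but with the trie optimisation
# suppressed while the pattern is compiled. Returns 1 or 0.
sub match_trie_disabled {
    my ($pattern, $subject) = @_;

    # perlvar: a negative value prevents the trie optimisation.
    # local restores the previous value when the sub returns.
    local ${^RE_TRIE_MAXBUF} = -1;

    # The interpolated pattern is compiled at run time, after the
    # variable has been set. This sub has its own match op, so a regex
    # cached by match_trie_default is never reused here.
    return $subject =~ /$pattern/ ? 1 : 0;
}

for my $pattern ('^(ab|ac)', '^(a(*THEN)b|ac)') {
    printf "%-18s trie default: %s  trie disabled: %s\n",
        $pattern,
        match_trie_default($pattern, 'ac')  ? 'yes' : 'no',
        match_trie_disabled($pattern, 'ac') ? 'yes' : 'no';
}
```

If the two columns differ for the (*THEN) pattern, that points at the trie code; if they agree, the cause is more likely elsewhere.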

This fails in the exact same manner:

    $ perl -Mre=debug -e'print (("ac" =~ /^((?:a(*THEN)b)|ac)/)? "yes\n":"no\n")'

This succeeds:

    $ perl -Mre=debug -e'print (("ac" =~ /^((a(*THEN)b)|ac)/)? "yes\n":"no\n")'

    Compiling REx "^((a(*THEN)b)|ac)"
    Final program:
       1: BOL (2)
       2: OPEN1 (4)
       4:   BRANCH (15)
       5:     OPEN2 (7)
       7:       EXACT <a> (9)
       9:       CUTGROUP (11)
      11:       EXACT <b> (13)
      13:     CLOSE2 (18)
      15:   BRANCH (FAIL)
      16:     EXACT <ac> (18)
      18: CLOSE1 (20)
      20: END (0)
    anchored(BOL) minlen 2
    Matching REx "^((a(*THEN)b)|ac)" against "ac"
       0 <> <ac>                 |  1:BOL(2)
       0 <> <ac>                 |  2:OPEN1(4)
       0 <> <ac>                 |  4:BRANCH(15)
       0 <> <ac>                 |  5:  OPEN2(7)
       0 <> <ac>                 |  7:  EXACT <a>(9)
       1 <a> <c>                 |  9:  CUTGROUP(11)
       1 <a> <c>                 | 11:    EXACT <b>(13)
                                          failed...
                                        failed...
       0 <> <ac>                 | 15:BRANCH(18)
       0 <> <ac>                 | 16:  EXACT <ac>(18)
       2 <ac> <>                 | 18:  CLOSE1(20)
       2 <ac> <>                 | 20:  END(0)
    Match successful!
    yes
    Freeing REx: "^((a(*THEN)b)|ac)"
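
For comparing perl versions, the three groupings tried in this message can be run from one small script. This is only a sketch: the helper name then_results is hypothetical, and no particular output is claimed, since the thread itself shows the results differing between 5.10.1 and 5.14.1.

```perl
use strict;
use warnings;

# The three groupings tried in this message, kept as strings so they
# can be compiled in a loop.
my @patterns = (
    '^(a(*THEN)b|ac)',       # plain capturing group
    '^((?:a(*THEN)b)|ac)',   # first branch in a non-capturing group
    '^((a(*THEN)b)|ac)',     # first branch in a capturing group
);

# Hypothetical helper: match $subject against each pattern and return
# a hash ref mapping the pattern text to "yes" or "no".
sub then_results {
    my ($subject, @pats) = @_;
    my %result;
    for my $pat (@pats) {
        $result{$pat} = $subject =~ /$pat/ ? 'yes' : 'no';
    }
    return \%result;
}

my $results = then_results('ac', @patterns);

# Print the running perl version first, since the outcome depends on it.
printf "perl %vd\n", $^V;
printf "%-22s %s\n", $_, $results->{$_} for @patterns;
```

Running it under each installed perl gives a quick table of which forms match "ac".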
