develooper Front page | perl.perl5.porters | Postings from July 2018

Re: [perl #133366] *COMMIT bug?

Thread Previous | Thread Next
From:
Abigail
Date:
July 12, 2018 18:01
Subject:
Re: [perl #133366] *COMMIT bug?
Message ID:
20180712180812.GA31649@almanda.fritz.box
On Thu, Jul 12, 2018 at 10:02:55AM -0700, Philip Hazel (via RT) wrote:
> # New Ticket Created by  Philip Hazel 
> # Please include the string:  [perl #133366]
> # in the subject line of all future correspondence about this issue. 
> # <URL: https://rt.perl.org/Ticket/Display.html?id=133366 >
> 
> 
> Subject: Odd *COMMIT behaviour
> To: perlbug@perl.org
> Message-Id: <5.26.2_28082_1531414535@quercite>
> Reply-To: ph10@cam.ac.uk
> Cc: builduser
> From: ph10@cam.ac.uk
> 
> 
> This is a bug report for perl from ph10@cam.ac.uk,
> generated with the help of perlbug 1.40 running under perl 5.26.2.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> Consider this example:
> 
> perl -e 'if ("ABCABD" =~ /(A(*COMMIT)|B)(A|B)D/) { print "yes $&\n"; } else { print "no\n"; }
> 
> On my Linux box the output is "yes ABD". But surely it should not match? After 
> matching the initial "A", shouldn't the (*COMMIT) stop any "bumaplong" 
> happening? If the (*COMMIT) is moved to before the second "A" in the pattern, 
> it does indeed print "no".
> 


I think you are right.

What seems to happen is that the optimizer is too eager. It finds the 'D'
in both the pattern and the string, and determines that if the pattern
matches, it should start matching 2 characters before the 'D'. This is
a valid strategy if the (*COMMIT) is not present, but not in the presence
of it.


    use re 'debug';
    "ABCABD" =~ /(A(*COMMIT)|B)(A|B)D/;
    __END__

    Compiling REx "(A(*COMMIT)|B)(A|B)D"
    Final program:
       1: OPEN1 (3)
       3:   TRIE-EXACT[AB] (11)
            <A> (6)
       6:     COMMIT (11)
            <B> (11)
      11: CLOSE1 (13)
      13: OPEN2 (15)
      15:   TRIE-EXACT[AB] (21)
            <A> 
            <B> 
      21: CLOSE2 (23)
      23: EXACT <D> (25)
      25: END (0)
    anchored "D" at 2 (checking anchored) stclass AHOCORASICK-EXACT[AB] minlen 3 
    Matching REx "(A(*COMMIT)|B)(A|B)D" against "ABCABD"
    Intuit: trying to determine minimum start position...
      doing 'check' fbm scan, [2..6] gave 5
      Found anchored substr "D" at offset 5 (rx_origin now 3)...
      (multiline anchor test skipped)
      try at offset...
    Intuit: Successfully guessed: match at offset 3
       3 <ABC> <ABD>             |   0| 1:OPEN1(3)
       3 <ABC> <ABD>             |   0| 3:TRIE-EXACT[AB](11)
       3 <ABC> <ABD>             |   0| State:    1 Accepted: N Charid:  1 CP:  41 After State:    2
       4 <ABCA> <BD>             |   0| State:    2 Accepted: Y Charid:  0 CP:   0 After State:    0
                                 |   0| got 1 possible matches
                                 |   0| TRIE matched word #1, continuing
       4 <ABCA> <BD>             |   1|  6:COMMIT(11)
       4 <ABCA> <BD>             |   2|   11:CLOSE1(13)
       4 <ABCA> <BD>             |   2|   13:OPEN2(15)
       4 <ABCA> <BD>             |   2|   15:TRIE-EXACT[AB](21)
       4 <ABCA> <BD>             |   2|   State:    1 Accepted: N Charid:  2 CP:  42 After State:    3
       5 <ABCAB> <D>             |   2|   State:    3 Accepted: Y Charid:  0 CP:   0 After State:    0
                                 |   2|   got 1 possible matches
                                 |   2|   TRIE matched word #2, continuing
                                 |   2|   only one match left, short-circuiting: #2 <B>
       5 <ABCAB> <D>             |   2|   21:CLOSE2(23)
       5 <ABCAB> <D>             |   2|   23:EXACT <D>(25)
       6 <ABCABD> <>             |   2|   25:END(0)
    Match successful!
    Freeing REx: "(A(*COMMIT)|B)(A|B)D"
    


Abigail

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About