
Re: Is this a /^*/ bug?

From: demerphq
Date: March 25, 2013 16:33
Subject: Re: Is this a /^*/ bug?
Message ID: CANgJU+XBg-2E6jRyPtWqEQuAcxvV8-P8hhpCOr3ZNr5=TOKQdw@mail.gmail.com

On 25 March 2013 17:22, Tom Christiansen <tchrist@perl.com> wrote:
> demerphq <demerphq@gmail.com> wrote on Mon, 25 Mar 2013 17:01:59 BST:
>
> On 25 March 2013 16:55, Tom Christiansen <tchrist@perl.com> wrote:
>
>>> I realize this is nonsense, but I wonder if it is not a bug.  Shouldn't the
>>> overall pattern still fail, not succeed?
>>>
>>>     % perl -WE 'say "foo.bar" =~ /^.*.bar$/ || "FAIL"'
>>>     1
>>>
>>>     % perl -WE 'say "foo.bar" =~ /^*.bar$/ || "FAIL"'
>>>     ^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE .bar$/ at -e line 1.
>>>     1
>>>
>>>     % perl -WE 'say "foo.bar" =~ /(^*).bar$/ || "FAIL"'
>>>     ^* matches null string many times in regex; marked by <-- HERE in m/(^* <-- HERE ).bar$/ at -e line 1.
>>>     1
>>>
>>>     % perl -WE 'say "foo.bar" =~ /^.bar$/ || "FAIL"'
>>>     FAIL
>>>
>>> Tested with v5.8.8, v5.14.0, v5.16.0, and v5.17.0-352-g3630f57.
>
>> There might be a case to change this from a warning to a fatal error.
>
>> But it's a warning, and as such, no, I don't think the pattern should fail:
>
>> $ perl -WE 'say "foo.bar" =~ /^*.bar$/ || "FAIL"' 2>&1 | splain
>> ^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE
>>       .bar$/ at -e line 1 (#1)
>>    (W regexp) The pattern you've specified would be an infinite loop if the
>>    regular expression engine didn't specifically check for that.  The <-- HERE
>>    shows in the regular expression about where the problem was discovered.
>>    See perlre.
>
>> 1
>
>> Consider that ^* means "match at the start of the string 0 or more
>> times". So the pattern succeeds because it matches 0 times.
>
>> It warns because it could match at the same spot an infinite number of times.
>
>> All of this is as I expect.
>
> I read it as saying
>
>      1  Match the beginning of the string 0 or more times.
>      2  Match any one non-newline character.
>      3  Match the constant string "bar"
>      4  Match the end of the string, with optional newline slop

No arguments here.

> I think somehow the pointer tracking the position in the string to be matched
> got bumped up inappropriately because of the constant-string-at-the-end
> optimization.  It fails to account for the "foo" bar, which is why I
> believe it is in error.

Well, I can see why there might be confusion here. But consider: do you
think that

perl -Mre=debug -WE 'say "foo.bar" =~ /.bar$/'

should fail?

Because to me the two patterns are functionally identical. ^* or ^?
are to me no-ops, so you can just remove them from the pattern when
you analyze what it does.

> But you are saying that it is ok *not* to match the beginning of the
> string, since 0 or more includes 0, and here we did not match the
> beginning of the string at all.
>
> That's pretty darned ugly, is all I can say.

Maybe this helps (# comments added by me):

$ perl -Mre=debug -WE 'say "foo.bar" =~ /^*.bar$/ || "FAIL"'
Compiling REx "^*.bar$"
^* matches null string many times in regex; marked by <-- HERE in m/^* <-- HERE .bar$/ at -e line 1.
Final program:
   1: CURLYX[0] {0,32767} (5)
   3:   BOL (4)
   4: WHILEM[1/1] (0)
   5: NOTHING (6)
   6: REG_ANY (7)
   7: EXACT <bar> (9)
   9: EOL (10)
  10: END (0)
floating "bar"$ at 1..1 (checking floating) minlen 4

# so this says that it must find a "bar" at the end of the string


Guessing start of match in sv for REx "^*.bar$" against "foo.bar"
Found floating substr "bar"$ at offset 4...
Starting position does not contradict /^/m...
Guessed: match at offset 3

# So it found "bar" at offset 4, and then decremented the start pos by
# 1 (because it needs 4 chars to match)

Matching REx "^*.bar$" against ".bar"
   3 <foo> <.bar>            |  1:CURLYX[0] {0,32767}(5)
   3 <foo> <.bar>            |  4:  WHILEM[1/1](0)
                                    whilem: matched 0 out of 0..32767
   3 <foo> <.bar>            |  3:    BOL(4)
                                      failed...
                                    whilem: failed, trying continuation...
   3 <foo> <.bar>            |  5:    NOTHING(6)
   3 <foo> <.bar>            |  6:    REG_ANY(7)
   4 <foo.> <bar>            |  7:    EXACT <bar>(9)
   7 <foo.bar> <>            |  9:    EOL(10)
   7 <foo.bar> <>            | 10:    END(0)
Match successful!
1
Freeing REx: "^*.bar$"

# And matched successfully at that position.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
