
[perl #114068] Bug for multiline regex qr/( . $ .+)/xms

From: Father Chrysostomos via RT
Date: July 10, 2012 12:54
Subject: [perl #114068] Bug for multiline regex qr/( . $ .+)/xms
Message ID: rt-3.6.HEAD-11172-1341950069-1976.114068-15-0@perl.org

On Tue Jul 10 11:17:44 2012, pecho@belwue.de wrote:
> Hi,
> 
> this code
> 
> $ perl -E'say "aa\nbb\ncc" =~ qr/( . $ .+ )/xms ? ">$1<" : 0'
> 0
> 
> $ perl -E'say "aa\nbb\ncc\n" =~ qr/( . $ .+ )/xms ? ">$1<" : 0'
> >c
> <
> 
> shows the bug for the regex.
> 
> "aa\nbb\ncc"   =~ qr/( . $ .+ )/xms  should match a\nbb\ncc
> "aa\nbb\ncc\n" =~ qr/( . $ .+ )/xms  should match a\nbb\ncc\n
> 
> Buggy perl versions I could check : 5.12.2, 5.14.2, 5.16.0
> 
> Problem first posted at stackoverflow:
> http://stackoverflow.com/questions/11412439/cannot-understand-result-for-multiline-regex-qr-xms

It appears to be the dollar sign that is not working:

$ perl5.6.2 -le 'print "aa\nbb\ncc\n" =~ qr/( [^z] $ [^z]+ )/xm ? ">$1<" : 0'
>c
<
$ perl5.17.2 -le 'print "aa\nbb\ncc\n" =~ qr/( [^z] $ [^z]+ )/xm ? ">$1<" : 0'
>c
<
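
For comparison, the match the reporter expects can be worked out without
the regex engine at all. The following is only a sketch: reference_match
is a hypothetical helper (not anything in perl's source) that hard-codes
the meaning of ( [^z] $ [^z]+ ) under /xm and tries every start offset
from the left.

```perl
use strict;
use warnings;

# Hypothetical reference matcher for ( [^z] $ [^z]+ ) under /xm.
# It returns the leftmost match, or undef if there is none.
sub reference_match {
    my ($s) = @_;
    my $len = length $s;
    for my $i (0 .. $len - 1) {
        # [^z]: one character that is not "z"
        next if substr($s, $i, 1) eq 'z';
        # $ under /m: end of string, or just before a "\n"
        my $p = $i + 1;
        next unless $p == $len || substr($s, $p, 1) eq "\n";
        # [^z]+: greedy run of non-"z" characters, at least one
        my $end = $p;
        $end++ while $end < $len && substr($s, $end, 1) ne 'z';
        next if $end == $p;
        return substr($s, $i, $end - $i);
    }
    return;
}

# Make newlines visible when printing a result.
sub show {
    my ($v) = @_;
    return '(no match)' unless defined $v;
    $v =~ s/\n/\\n/g;
    return $v;
}

for my $subject ("aa\nbb\ncc", "aa\nbb\ncc\n") {
    my $want  = reference_match($subject);
    my ($got) = $subject =~ /( [^z] $ [^z]+ )/xm;
    printf "reference: %s\nengine:    %s\n", show($want), show($got);
}
```

The "engine" line depends on the perl being run; on the versions listed
in the report it differs from the "reference" line.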

Adding a code block works around the problem, because it disables an
optimisation:

$ perl5.17.2 -le 'print "aa\nbb\ncc\n" =~ qr/(??{})( [^z] $ [^z]+ )/xm ? ">$1<" : 0'
>a
bb
cc
<
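
The two one-liners can also be run side by side from a script. This is a
rough sketch only: the capture helper and the %patterns hash are names
made up for illustration, and what the plain pattern returns depends on
the perl version in use.

```perl
use strict;
use warnings;

my $subject = "aa\nbb\ncc\n";

# The pattern from the report, with and without the (??{}) prefix
# used above as a workaround.
my %patterns = (
    plain      => qr/( [^z] $ [^z]+ )/xm,
    workaround => qr/(??{})( [^z] $ [^z]+ )/xm,
);

# Hypothetical helper: return $1 on a match, undef otherwise.
sub capture {
    my ($re, $s) = @_;
    return $s =~ $re ? $1 : undef;
}

for my $name (sort keys %patterns) {
    my $got = capture($patterns{$name}, $subject);
    $got = '(no match)' unless defined $got;
    $got =~ s/\n/\\n/g;    # make newlines visible
    printf "%-10s >%s<\n", $name, $got;
}
```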

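The stclass optimisation seems to be at fault: when it looks for later
positions at which to start the match, it jumps too far forward in the
string. In the trace below, attempts start at offsets 0, 3, 6 and 7;
offset 1, where the match should begin, is never tried: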

$ perl5.17.2 -Mre=debug -le 'print "aa\nbb\ncc\n" =~ qr/( [^z] $ [^z]+ )/xm ? ">$1<" : 0'
Compiling REx "( [^z] $ [^z]+ )"
Final program:
   1: OPEN1 (3)
   3:   ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY] (14)
  14:   MEOL (15)
  15:   PLUS (27)
  16:     ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY] (0)
  27: CLOSE1 (29)
  29: END (0)
anchored ""$ at 2 stclass ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY] minlen 2
Matching REx "( [^z] $ [^z]+ )" against "aa%nbb%ncc%n"
   0 <> <aa%nbb%ncc>         |  1:OPEN1(3)
   0 <> <aa%nbb%ncc>         |  3:ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY](14)
   1 <a> <a%nbb%ncc>         | 14:MEOL(15)
                                  failed...
   3 <aa%n> <bb%ncc%n>       |  1:OPEN1(3)
   3 <aa%n> <bb%ncc%n>       |  3:ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY](14)
   4 <aa%nb> <b%ncc%n>       | 14:MEOL(15)
                                  failed...
   6 <aa%nbb%n> <cc%n>       |  1:OPEN1(3)
   6 <aa%nbb%n> <cc%n>       |  3:ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY](14)
   7 <aa%nbb%nc> <c%n>       | 14:MEOL(15)
                                  failed...
   7 <aa%nbb%nc> <c%n>       |  1:OPEN1(3)
   7 <aa%nbb%nc> <c%n>       |  3:ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY](14)
   8 <aa%nbb%ncc> <%n>       | 14:MEOL(15)
   8 <aa%nbb%ncc> <%n>       | 15:PLUS(27)
                                  ANYOF[\x00-y{-\xff][{unicode}0100-INFINITY] can match 1 times out of 2147483647...
   9 <aa%nbb%ncc%n> <>       | 27:  CLOSE1(29)
   9 <aa%nbb%ncc%n> <>       | 29:  END(0)
Match successful!
>c
<
Freeing REx: "( [^z] $ [^z]+ )"
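
The start offsets can be pulled out of the trace mechanically. A small
sketch, assuming that a line showing node 1:OPEN1 marks the beginning of
a fresh attempt; the four lines in @trace are copied from the debug
output above, and attempted_offsets is a hypothetical helper name.

```perl
use strict;
use warnings;

# Lines from the -Mre=debug output above where a new attempt begins.
my @trace = (
    '   0 <> <aa%nbb%ncc>         |  1:OPEN1(3)',
    '   3 <aa%n> <bb%ncc%n>       |  1:OPEN1(3)',
    '   6 <aa%nbb%n> <cc%n>       |  1:OPEN1(3)',
    '   7 <aa%nbb%nc> <c%n>       |  1:OPEN1(3)',
);

# Hypothetical helper: collect the leading offset of every 1:OPEN1 line.
sub attempted_offsets {
    my @lines = @_;
    my @offsets;
    for my $line (@lines) {
        push @offsets, $1 if $line =~ /^\s*(\d+)\s.*\|\s+1:OPEN1\(/;
    }
    return @offsets;
}

my @tried   = attempted_offsets(@trace);
my %tried   = map { $_ => 1 } @tried;
my $length  = length "aa\nbb\ncc\n";
my @skipped = grep { !$tried{$_} } 0 .. $length - 1;

print "tried:   @tried\n";      # tried:   0 3 6 7
print "skipped: @skipped\n";    # skipped: 1 2 4 5 8
```

Offset 1 is in the skipped list, which is where the expected match
a\nbb\ncc\n would have to start.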

-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: new
https://rt.perl.org:443/rt3/Ticket/Display.html?id=114068
