
[perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments

From: Nicholas Clark
Date: February 5, 2013 09:26
Subject: [perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments
Message ID: rt-3.6.HEAD-27190-1360056389-311.116639-75-0@perl.org
# New Ticket Created by  Nicholas Clark 
# Please include the string:  [perl #116639]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org:443/rt3/Ticket/Display.html?id=116639 >



This is a bug report for perl from nick@ccl4.org,
generated with the help of perlbug 1.39 running under perl 5.16.2.


-----------------------------------------------------------------
[Please describe your issue here]

It looks like the regex optimiser is wrongly concluding "can never match"
for certain patterns involving embedded comments.

The trigger is a pathological pattern that puts an embedded comment between
an atom and its quantifier :-)

Not a regression - has been present since embedded comments were added in
5.001:

$ ~/Sandpit/5001/bin/perl -le 'print "abbc" =~ /.b(?#Comment){2}c/ ? "Y" : "n"'
Y
$ ~/Sandpit/5001/bin/perl -le 'print "abbc" =~ /ab(?#Comment){2}c/ ? "Y" : "n"'
n


Still present in blead. The smoking gun is the debugging output
"String shorter than min possible regex match"



Compare the pattern /.b(?#Comment){2}c/ which enters the engine and matches:

$ ./perl -Dr -le 'print "abbc" =~ /.b(?#Comment){2}c/ ? "Y" : "n"'
Compiling REx ".b(?#Comment){2}c"
rarest char b at 0
Final program:
   1: REG_ANY (2)
   2: CURLY {2,2} (6)
   4:   EXACT <b> (0)
   6: EXACT <c> (8)
   8: END (0)
anchored "bbc" at 1 (checking anchored) minlen 4 
Enabling $` $& $' support (0x7).

EXECUTING...

Guessing start of match in sv for REx ".b(?#Comment){2}c" against "abbc"
Found anchored substr "bbc" at offset 1...
Guessed: match at offset 0
Matching REx ".b(?#Comment){2}c" against "abbc"
   0 <> <abbc>               |  1:REG_ANY(2)
   1 <a> <bbc>               |  2:CURLY {2,2}(6)
                                  EXACT <b> can match 2 times out of 2...
   3 <abb> <c>               |  6:  EXACT <c>(8)
   4 <abbc> <>               |  8:  END(0)
Match successful!
Y
Freeing REx: ".b(?#Comment){2}c"


with /ab(?#Comment){2}c/ which is (wrongly) rejected by the optimiser, and
hence never enters the engine:

$ ./perl -Dr -le 'print "abbc" =~ /ab(?#Comment){2}c/ ? "Y" : "n"'
Compiling REx "ab(?#Comment){2}c"
rarest char b at 1
Final program:
   1: CURLYM[0] {2,2} (7)
   3:   EXACT <ab> (5)
   5:   SUCCEED (0)
   6: NOTHING (7)
   7: EXACT <c> (9)
   9: END (0)
anchored "ababc" at 0 (checking anchored isall) minlen 5 
Enabling $` $& $' support (0x7).

EXECUTING...

String shorter than min possible regex match
n
Freeing REx: "ab(?#Comment){2}c"


Note that the equivalent /x pattern, with a # comment and a newline between
the atom and its quantifier, is not affected by this bug:

$ ./perl -le 'print "abbc" =~ /ab#Comment' -e '{2}c/x ? "Y" : "n"'
Y

Nicholas Clark

[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.16.2:

Configured by nick at Fri Dec 21 07:57:09 GMT 2012.

Summary of my perl5 (revision 5 version 16 subversion 2) configuration:
  Commit id: db7f29e888f2e6d75a806a11ebc6caa6acd84577
  Platform:
    osname=freebsd, osvers=7.0-stable, archname=i386-freebsd
    uname='freebsd plum.flirble.org 7.0-stable freebsd 7.0-stable #7: sat jul 26 20:39:26 bst 2008 root@plum.flirble.org:usrobjusrsrcsysplum i386 '
    config_args='-des -Dprefix=/home/nick/Sandpit/5162'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
  Compiler:
    cc='cc', ccflags ='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    optimize='-O',
    cppflags='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 20070719  [FreeBSD]', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lgdbm -lm -lcrypt -lutil -lc
    perllibs=-lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
    gnulibc_version=''
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC', lddlflags='-shared  -L/usr/local/lib -fstack-protector'

Locally applied patches:
    

---
@INC for perl 5.16.2:
    /home/nick/Sandpit/5162/lib/perl5/site_perl/5.16.2/i386-freebsd
    /home/nick/Sandpit/5162/lib/perl5/site_perl/5.16.2
    /home/nick/Sandpit/5162/lib/perl5/5.16.2/i386-freebsd
    /home/nick/Sandpit/5162/lib/perl5/5.16.2
    .

---
Environment for perl 5.16.2:
    HOME=/home/nick
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/nick/bin:/opt/local/bin:/opt/local/sbin:/usr/flirble/admin/bin:/bin:/usr/bin:/usr/local/bin:/usr/X11R6/bin:/home/nick/bin:/usr/local/sbin:/sbin:/usr/sbin
    PERL_BADLANG (unset)
    SHELL=/usr/local/bin/bash

