develooper Front page | perl.perl5.porters | Postings from February 2013

[perl #116639] regex optimiser wrongly rejects certain matches involving embedded comments

Nicholas Clark
February 5, 2013 09:26
# New Ticket Created by  Nicholas Clark 
# Please include the string:  [perl #116639]
# in the subject line of all future correspondence about this issue. 
# <URL: >

This is a bug report for perl from,
generated with the help of perlbug 1.39 running under perl 5.16.2.

[Please describe your issue here]

It looks like the regex optimiser is wrongly concluding "can never match"
for certain patterns involving embedded comments.

The affected patterns are pathological: they put an embedded comment
between an atom and its quantifier :-)

Not a regression - has been present since embedded comments were added in

$ ~/Sandpit/5001/bin/perl -le 'print "abbc" =~ /.b(?#Comment){2}c/ ? "Y" : "n"'
$ ~/Sandpit/5001/bin/perl -le 'print "abbc" =~ /ab(?#Comment){2}c/ ? "Y" : "n"'
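
The same check can be written as a small script that compares each commented pattern with its comment-free form. This is only a sketch. It assumes the comment should be ignored, so that {2} quantifies the preceding "b". The names @cases and @results are illustrative and not part of the report. What the first column prints depends on the perl it is run under.

```perl
use strict;
use warnings;

my $subject = "abbc";

# Pairs of [pattern with embedded comment, comment-free pattern].
# Assumption: the comment should be ignored, so {2} quantifies "b".
my @cases = (
    [ '.b(?#Comment){2}c', '.b{2}c' ],
    [ 'ab(?#Comment){2}c', 'ab{2}c' ],
);

my @results;
for my $case (@cases) {
    my ($commented, $plain) = @$case;

    # Result with the embedded comment (may differ on affected perls).
    my $got = $subject =~ /$commented/ ? "Y" : "n";

    # Result for the comment-free form, used as the reference.
    my $expected = $subject =~ /$plain/ ? "Y" : "n";

    push @results, [ $commented, $got, $expected ];
    printf "%-22s got=%s expected=%s%s\n", $commented, $got, $expected,
        $got eq $expected ? "" : "  <-- differs";
}
```

A line marked "differs" means that perl treats the commented pattern differently from its comment-free form.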

Still present in blead. The smoking gun is the debugging output
"String shorter than min possible regex match"

Compare the pattern /.b(?#Comment){2}c/ which enters the engine and matches:

$ ./perl -Dr -le 'print "abbc" =~ /.b(?#Comment){2}c/ ? "Y" : "n"'
Compiling REx ".b(?#Comment){2}c"
rarest char b at 0
Final program:
   1: REG_ANY (2)
   2: CURLY {2,2} (6)
   4:   EXACT <b> (0)
   6: EXACT <c> (8)
   8: END (0)
anchored "bbc" at 1 (checking anchored) minlen 4 
Enabling $` $& $' support (0x7).


Guessing start of match in sv for REx ".b(?#Comment){2}c" against "abbc"
Found anchored substr "bbc" at offset 1...
Guessed: match at offset 0
Matching REx ".b(?#Comment){2}c" against "abbc"
   0 <> <abbc>               |  1:REG_ANY(2)
   1 <a> <bbc>               |  2:CURLY {2,2}(6)
                                  EXACT <b> can match 2 times out of 2...
   3 <abb> <c>               |  6:  EXACT <c>(8)
   4 <abbc> <>               |  8:  END(0)
Match successful!
Freeing REx: ".b(?#Comment){2}c"

with /ab(?#Comment){2}c/ which is (wrongly) rejected by the optimiser, and
hence never enters the engine:

$ ./perl -Dr -le 'print "abbc" =~ /ab(?#Comment){2}c/ ? "Y" : "n"'
Compiling REx "ab(?#Comment){2}c"
rarest char b at 1
Final program:
   1: CURLYM[0] {2,2} (7)
   3:   EXACT <ab> (5)
   5:   SUCCEED (0)
   6: NOTHING (7)
   7: EXACT <c> (9)
   9: END (0)
anchored "ababc" at 0 (checking anchored isall) minlen 5 
Enabling $` $& $' support (0x7).


String shorter than min possible regex match
Freeing REx: "ab(?#Comment){2}c"
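
The numbers in that dump (anchored "ababc", minlen 5) are what you would get if {2} were applied to the whole literal "ab" rather than to "b" alone. The sketch below only reproduces that arithmetic. It does not call into the regex engine, and the variable names are illustrative.

```perl
use strict;
use warnings;

my $subject = "abbc";

# Shortest string matched if {2} quantifies only the preceding "b".
my $intended = "a" . ("b" x 2) . "c";

# Shortest string matched if {2} quantifies the whole literal "ab",
# as the CURLYM node over EXACT <ab> in the dump suggests.
my $compiled = ("ab" x 2) . "c";

printf "intended: %s (minlen %d)\n", $intended, length($intended);
printf "compiled: %s (minlen %d)\n", $compiled, length($compiled);

# A minimum-length check rejects any subject shorter than minlen.
my $rejected = length($subject) < length($compiled) ? 1 : 0;
print "subject '$subject' rejected by length check: ",
    ($rejected ? "yes" : "no"), "\n";
```

Under the intended reading the subject is exactly the minimum length, so it would not be rejected on length alone.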

Note that "#" comments in multi-line patterns using /x are not affected by this bug:

$ ./perl -le 'print "abbc" =~ /ab#Comment' -e '{2}c/x ? "Y" : "n"'
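
For reference, the two -e arguments above are joined with a newline, so the pattern is equivalent to the following script form. This is a sketch, and the variable names are illustrative.

```perl
use strict;
use warnings;

# Same pattern as the one-liner, written across two lines under /x.
# The "#Comment" runs to the end of the line, so {2} follows "b".
my $re = qr/ab#Comment
{2}c/x;

my $matched = "abbc" =~ $re ? "Y" : "n";
print "$matched\n";
```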

Nicholas Clark

[Please do not change anything below this line]
Site configuration information for perl 5.16.2:

Configured by nick at Fri Dec 21 07:57:09 GMT 2012.

Summary of my perl5 (revision 5 version 16 subversion 2) configuration:
  Commit id: db7f29e888f2e6d75a806a11ebc6caa6acd84577
    osname=freebsd, osvers=7.0-stable, archname=i386-freebsd
    uname='freebsd 7.0-stable freebsd 7.0-stable #7: sat jul 26 20:39:26 bst 2008 i386 '
    config_args='-des -Dprefix=/home/nick/Sandpit/5162'
    hint=recommended, useposix=true, d_sigaction=define
    useithreads=undef, usemultiplicity=undef
    useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
    use64bitint=undef, use64bitall=undef, uselongdouble=undef
    usemymalloc=n, bincompat5005=undef
    cc='cc', ccflags ='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include',
    cppflags='-DHAS_FPSETMASK -DHAS_FLOATINGPOINT_H -fno-strict-aliasing -pipe -fstack-protector -I/usr/local/include'
    ccversion='', gccversion='4.2.1 20070719  [FreeBSD]', gccosandvers=''
    intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=1234
    d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=12
    ivtype='long', ivsize=4, nvtype='double', nvsize=8, Off_t='off_t', lseeksize=8
    alignbytes=4, prototype=define
  Linker and Libraries:
    ld='cc', ldflags ='-Wl,-E  -fstack-protector -L/usr/local/lib'
    libpth=/usr/lib /usr/local/lib
    libs=-lgdbm -lm -lcrypt -lutil -lc
    perllibs=-lm -lcrypt -lutil -lc
    libc=, so=so, useshrplib=false, libperl=libperl.a
  Dynamic Linking:
    dlsrc=dl_dlopen.xs, dlext=so, d_dlsymun=undef, ccdlflags=' '
    cccdlflags='-DPIC -fPIC', lddlflags='-shared  -L/usr/local/lib -fstack-protector'

Locally applied patches:

@INC for perl 5.16.2:

Environment for perl 5.16.2:
    LANG (unset)
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PERL_BADLANG (unset)
