[perl #133352] Ancient Regex Regression

From: "Deven T. Corzine"
Date: July 13, 2018 03:30
Subject: [perl #133352] Ancient Regex Regression
Message ID: rt-4.0.24-12372-1531109072-11.133352-75-0@perl.org

# New Ticket Created by  "Deven T. Corzine" 
# Please include the string:  [perl #133352]
# in the subject line of all future correspondence about this issue. 
# <URL: https://rt.perl.org/Ticket/Display.html?id=133352 >


This is a bug report for perl from deven@ties.org,
generated with the help of perlbug 1.41 running under perl 5.29.1.


-----------------------------------------------------------------
[Please describe your issue here]

I discovered a bug in Perl's regular expression engine a few
months ago.  I showed it to many people at The Perl Conference
in Salt Lake City a couple of weeks ago, and everyone agreed that
this was a bug in the regex engine in Perl itself, including
Abigail, Tom Christiansen, Karl Williamson and Larry Wall.

I even ended up doing a lightning talk about the bug:

     https://www.youtube.com/watch?v=U-JhPIECkPY

This was my test case, which works with or without anchors:

     "afoobar" =~ /((.)foo|bar)*/
     "afoobar" =~ /^((.)foo|bar)*$/

Or, as a standalone command:

     perl -e 'print "$2\n" if "afoobar" =~ /^((.)foo|bar)*$/;'

This prints "b", even though "bfoo" never appears in "afoobar"!

I understand why this is happening -- on the second iteration, the
inner group does match the "b" in "bar", but the rest of that
branch ("foo") then fails to match "ar", so the branch as a whole
fails and the engine falls through to the "bar" alternative.  The
capture is still being used, despite the fact that it came from a
failed branch of the alternation.
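
To make it easier to see which iteration the capture came from, here
is a small sketch that reports the start offset of the inner group
along with its text.  The helper name inner_group() is just something
made up for this illustration; @- is the standard array of match
start offsets.

```perl
use strict;
use warnings;

# Return the text and start offset of the inner group for the test case,
# or an empty list if the match fails or the group did not participate.
# (inner_group is a hypothetical helper, not part of any module.)
sub inner_group {
    my ($str) = @_;
    return unless $str =~ /^((.)foo|bar)*$/;
    return unless defined $2;
    return ($2, $-[2]);    # $-[2] is the start offset of group 2
}

my ($text, $offset) = inner_group("afoobar");
print "\$2 = '$text', captured at offset $offset\n";
# Offset 0 is the "a" of "afoo"; offset 4 is the "b" of "bar".
```

On a perl that has this bug, I would expect the offset reported to be
4, which points into "bar" rather than into "afoo".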

The correct answer seems to be "a", since that's the last match
of the inner group and the overall match is successful.  Perl 1.0
can't handle this regex (Larry said it was the regex engine from
Gosling Emacs), but Perl 2.0 through Perl 5.0 alpha 9 all print
"a" for the command above.  Other regex implementations, such as
PCRE, RE2, GNU and others, also return "a" for the inner group.

Perl 5.000 (from 1994) is the first commit in the git repository
(commit a0d0e21ea6ea90a22318550944fe6cb09ae10cda) which exhibits
the bug, printing "b" instead of "a".  I just built blead again
today and confirmed that the bug is still there, despite passing
the full test suite.  (Tom Christiansen pointed out that this bug
is technically a regression, since it used to work correctly.)

Even though I have never worked on the Perl core, and I've been
warned that the regex engine is particularly difficult, I would
still like to attempt to develop a patch for this bug myself.

I've already managed to create a working patch that fixes this
bug without breaking any of the regular expression tests in the
test suite, so I think I'm on the right track, but I think there
may be a few edge cases to consider, so I'm not ready to submit
the patch just yet.

Yves, SawyerX suggested that you might be willing to mentor me on
this.  If so, that would be great!

My solution involves saving the captures with regcppush() on
BRANCH and TRIE nodes and restoring them with regcp_restore() at
BRANCH_next_fail and TRIE_next_fail.  Does that sound like the
right approach, give or take?
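
As a rough illustration of the idea only, here is a toy model in Perl.
It is not the regexec.c code and it is not the patch; it hard-codes
the two branches of the test case, has no backtracking across
iterations, and the names toy_match, @branches and $restore_on_fail
are invented for this sketch.  The snapshot taken before each branch
stands in for regcppush(), and the copy back on failure stands in for
regcp_restore().

```perl
use strict;
use warnings;

# Toy model of /^((.)foo|bar)*$/ -- NOT the real regex engine.
# Each branch takes (string, position, captures hashref) and returns
# the new position on success, or undef on failure.
my @branches = (
    sub {    # (.)foo
        my ($s, $pos, $cap) = @_;
        return undef if $pos >= length($s);
        $cap->{2} = substr($s, $pos, 1);    # set before "foo" is checked
        return substr($s, $pos + 1, 3) eq 'foo' ? $pos + 4 : undef;
    },
    sub {    # bar
        my ($s, $pos) = @_;
        return substr($s, $pos, 3) eq 'bar' ? $pos + 3 : undef;
    },
);

# Returns a hashref of captures if the whole string matches, else undef.
sub toy_match {
    my ($s, $restore_on_fail) = @_;
    my %cap;
    my $pos = 0;
    ITER: while ($pos < length($s)) {
        for my $branch (@branches) {
            my %saved = %cap;    # snapshot on entering the branch
            my $next = $branch->($s, $pos, \%cap);
            if (defined $next) {
                $cap{1} = substr($s, $pos, $next - $pos);
                $pos = $next;
                next ITER;
            }
            # Branch failed: optionally discard captures it set.
            %cap = %saved if $restore_on_fail;
        }
        return;    # no branch matched at this position
    }
    return \%cap;
}

for my $restore (0, 1) {
    my $cap = toy_match("afoobar", $restore);
    print "restore=$restore: group 2 = $cap->{2}\n";
}
```

In this model, running without the restore step leaves group 2 as "b",
and running with it leaves group 2 as "a".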


[Please do not change anything below this line]
-----------------------------------------------------------------
---
Flags:
    category=core
    severity=low
---
Site configuration information for perl 5.29.1:

Configured by deven at Sat Jul  7 22:07:54 EDT 2018.

Summary of my perl5 (revision 5 version 29 subversion 1) configuration:
  Commit id: 71525f77826ad33944c007b06b68a1f14a085e7a
  Platform:
    osname=linux
    osvers=3.19.8-100.fc20.x86_64
    archname=x86_64-linux-thread-multi
    uname='linux twist.ties.org 3.19.8-100.fc20.x86_64 #1 smp tue may 12 17:08:50 utc 2015 x86_64 x86_64 x86_64 gnulinux '
    config_args=''
    hint=previous
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='gcc'
    ccflags ='-DDEBUGGING -D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-g'
    cppflags='-DDEBUGGING -D_REENTRANT -D_GNU_SOURCE -fwrapv -fno-strict-aliasing -pipe -fstack-protector-strong -I/usr/local/include'
    ccversion=''
    gccversion='4.8.3 20140911 (Red Hat 4.8.3-7)'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64 /lib /lib64 /usr/lib64 /usr/local/lib64 /usr/local/lib /usr/lib
    libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc -lgdbm_compat
    perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
    libc=libc-2.18.so
    so=so
    useshrplib=false
    libperl=libperl.a
    gnulibc_version='2.18'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -g -L/usr/local/lib -fstack-protector-strong'


---
@INC for perl 5.29.1:
    lib
    /usr/local/lib/perl5/site_perl/5.29.1/x86_64-linux-thread-multi
    /usr/local/lib/perl5/site_perl/5.29.1
    /usr/local/lib/perl5/5.29.1/x86_64-linux-thread-multi
    /usr/local/lib/perl5/5.29.1

---
Environment for perl 5.29.1:
    HOME=/home/deven
    LANG=en_US.UTF-8
    LANGUAGE (unset)
    LD_LIBRARY_PATH (unset)
    LOGDIR (unset)
    PATH=/home/deven/bin:/home/deven/scripts:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/etc:/sbin:/home/deven/bin
    PERL_BADLANG (unset)
    SHELL=/bin/bash



