develooper Front page | perl.perl5.porters | Postings from June 2015

Re: [perl #125491] Perl regular expression bug in v5.18

Thread Previous
From:
Chas. Owens
Date:
June 28, 2015 01:50
Subject:
Re: [perl #125491] Perl regular expression bug in v5.18
Message ID:
CAH2YjjHZwKPLBvAc0z-KZjy=KQLoA0=y8-0AG_83GZqAz8k9ew@mail.gmail.com
It looks like it can be further reduced to:

perl -E 'say map "<$_>", "a/" =~ m{(b*a)?/}i'

The following all fail

perl -E 'say map "<$_>", "a%" =~ m{(\s*a)?%}i'
perl -E 'say map "<$_>", "a1" =~ m{(b*a)?1}i'
perl -E 'say map "<$_>", "a\x{2603}" =~ m{(b*a)?\x{2603}}i'

The following (as far as I can see as a user) semantically equivalent
regexes work

perl -E 'say map "<$_>", "ac" =~ m{(b*a)?c}i'
perl -E 'say map "<$_>", "a/" =~ m{(.*a)?/}i'

It is interesting to note that the regex succeeds, but matches nothing.
Also interesting, if the b* matches at least one character, then it works:

perl -E 'say map "<$_>", "ba/" =~ m{(b*a)?/}i'




On Sat, Jun 27, 2015 at 2:20 AM Michael Schout <perlbug-followup@perl.org>
wrote:

> # New Ticket Created by  Michael Schout
> # Please include the string:  [perl #125491]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=125491 >
>
>
> This is a bug report for perl from mschout@gkg.net,
> generated with the help of perlbug 1.39 running under perl 5.18.4.
>
>
> -----------------------------------------------------------------
> I have found a rather innocuous regular expression that does not work in
> perl
> v5.18, but works in v5.16 and works again in v5.20.
>
> Here is the regular expression substitution:
>
>     s|^(?:\s*h)?/||i
>
> When given the string 'h//w/b', this should strip off the "h/" at the
> beginning, resulting in '/w/b'.
>
> NOTE: This is a reduced down version from the original that triggered this
> bug
> for me:
>
>     s|^(?:\s*https?:)?//[^/]+||i
>
> Under perl v5.18, this does not work:
>
> perl 5.18.4:
>
> $ perl -E '$x = "h//w/b"; $x =~ s|^(?:\s*h)?/||i; say $x'
>
>     h//w/b
>
>
> However, this works perfectly on v5.16.3, and also on v5.20.0:
>
> perl 5.16.3:
> $ perl -E '$x = "h//w/b"; $x =~ s|^(?:\s*h)?/||i; say $x'
>
>     /w/b
>
> perl 5.20.0:
>
> $ perl -E '$x = "h//w/b"; $x =~ s|^(?:\s*h)?/||i; say $x'
>
>     /w/b
>
> Curiously, the following changes to the regex yield the correct results:
>
> - remove the "\s*": s|^(?:h)?/||i   -> works correctly in 5.18
> - remove the "/i":  s|^(?:\s*h)?/|| -> works correctly in 5.18
>
> A git bisect shows that this broke in the following commit on v5.17:
>
> b25a103638bc5111446c1e65a21db00a6e13acc1 is the first bad commit
> commit b25a103638bc5111446c1e65a21db00a6e13acc1
> Author: Karl Williamson <public@khwilliamson.com>
> Date:   Thu Dec 27 12:34:41 2012 -0700
>
>     regcomp.c: Clean up ANYOF_CLASS handling.
>
>     The ANYOF_CLASS flag is used in ANYOF nodes (for [bracketed] and the
>     synthetic start class) only when matching something like \w, [:punct:]
>     etc., under /l (locale).  It should not be set unless /l is specified.
>     However, it was always getting set for the synthetic start class.  This
>     commit fixes that.  The previous code was masking errors in which it
> was
>     being tested for unnecessarily, and for much of the 5.17 series, the
>     synthetic start class was always set to test for locale, which was a
>     waste of cpu when no locale was specified.
>
> :100644 100644 eee952f376f066c6757208aae350b70dfb9fc9e3
> bf8d7e0bab1079f84d490f4cc93e6737516ef47d M      regcomp.c
> :100644 100644 8eb849cc1ec789d4c488525c587dd102298236b2
> be3970d0ed9e3fca1ab95be89354c7a8ca1aeb39 M      regcomp.h
> :040000 040000 b7cb07f7f1266276b13171049a9fe59bef27f891
> 224e7347bcc874f5265c0ac313a66e9f7489d9d3 M
>
> And reverse bisecting for when this bug was fixed in 5.19, I found that
> this bug went away with the following commit:
>
> commit cdd87c1d4df41f9a54cccff996fa64d291adcee8
> Author: Karl Williamson <public@khwilliamson.com>
> Date:   Sun Sep 22 21:36:29 2013 -0600
>
>     Teach regex optimizer to handle above-Latin1
>
>     Until this commit, the regular expression optimizer has essentially
>     punted on above-Latin1 code points.  Under some circumstances, they
>     would be taken into account, more or less, but often, the generated
>     synthetic start class would end up matching all above-Latin1 code
>     points.  With the advent of inversion lists, it becomes feasible to
>     actually fully handle such code points, as inversion lists are a
>     convenient way to express arbitrary lists of code points and take their
>     union, intersection, etc.  This commit changes the optimizer to use
>     inversion lists for operating on the code points the synthetic start
>     class can match.
>
>     I don't much understand the overall operation of the optimizer.  I'm
>     told that previous porters found that perturbing it caused unexpected
>     behaviors.  I had promised to get this change in 5.18, but didn't.  I'm
>     trying to get it in early enough into the 5.20 preliminary series that
>     any problems will surface before 5.20 ships.
>
>     This commit doesn't change the macro level logic, but does
> significantly
>     change various micro level things.  Thus the 'and' and 'or' subroutines
>     have been rewritten to use inversion lists.  I'm pretty confident that
>     they do what their names suggest.  I re-derived the equations for what
>     these operations should do, getting the same results in some cases, but
>     extending others where the previous code mostly punted.  The
> derivations
>     are given in comments in the respective routines.
>
>     Some of the code is greatly simplified, as it no longer has to treat
>     above-Latin1 specially.
>
>     It is now feasible for /i matching of above-Latin1 code points to know
>     explicitly the folds that should be in the synthetic start class.  But
>     more prepatory work needs to be done before putting that into place.
>     ...
>
> :100644 100644 ec203f9c1f3ea42c65324e632c746042c32954f1
> 3dd62f946eedd99f87489fecfeb1acd86e2d250b M      embed.fnc
> :100644 100644 fca8736feb457d00987232649f8b563aef4f24fa
> 5fc9171d233fd06b4d6cc08570cdb41f976d0b0b M      embed.h
> :100644 100644 568cdf733c7784cc3f5a059583d9ae9a262ae72a
> 33d00ef42cf01836e9365e4a9086cf64b678b74b M      proto.h
> :100644 100644 ec24583f1f7bec42e186a4241353b9688bf6f2de
> efefd0a17ba14144286dc55c149d6ee422b18249 M      regcomp.c
> :100644 100644 f0153fc12c842c26d928d3303feb519430b73a07
> eccb46690a4d252fb107044116973e3baa0cbd2c M      regcomp.h
>
> I realize fully that 5.18 is officially "End of life", but this is a rather
> severe regression and 5.18 still widespread deployment at this time, so it
> would be really nice to get this fixed in maint-5.18 somehow. Given that
> this
> is an extremely simple regular expression I suspect there may be many other
> cases where regular expressions are broken in 5.18. Perhaps the patch that
> fixed
> this in 5.19 can be backported to maint-5.18? I am not familiar at all with
> what is going on in this commit to attempt that.
>
> [Please do not change anything below this line]
> -----------------------------------------------------------------
> ---
> Flags:
>     category=core
>     severity=high
> ---
> Site configuration information for perl 5.18.4:
>
> Configured by mschout at Fri Jun 26 15:13:44 CDT 2015.
>
> Summary of my perl5 (revision 5 version 18 subversion 4) configuration:
>
>   Platform:
>     osname=darwin, osvers=14.3.0, archname=darwin-2level
>     uname='darwin lothlorien.snowcrash.lan 14.3.0 darwin kernel version
> 14.3.0: mon mar 23 11:59:05 pdt 2015; root:xnu-2782.20.48~5release_x86_64
> x86_64 '
>     config_args='-de
> -Dprefix=/Users/mschout/perl5/perlbrew/perls/perl-5.18.4 -Accflags=-fPIC
> -Aeval:scriptdir=/Users/mschout/perl5/perlbrew/perls/perl-5.18.4/bin'
>     hint=recommended, useposix=true, d_sigaction=define
>     useithreads=undef, usemultiplicity=undef
>     useperlio=define, d_sfio=undef, uselargefiles=define, usesocks=undef
>     use64bitint=define, use64bitall=define, uselongdouble=undef
>     usemymalloc=n, bincompat5005=undef
>   Compiler:
>     cc='cc', ccflags ='-fno-common -DPERL_DARWIN -fPIC
> -fno-strict-aliasing -pipe -fstack-protector -I/opt/local/include',
>     optimize='-O3',
>     cppflags='-fno-common -DPERL_DARWIN -fPIC -fno-strict-aliasing -pipe
> -fstack-protector -I/opt/local/include'
>     ccversion='', gccversion='4.2.1 Compatible Apple LLVM 6.1.0
> (clang-602.0.53)', gccosandvers=''
>     intsize=4, longsize=8, ptrsize=8, doublesize=8, byteorder=12345678
>     d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
>     ivtype='long', ivsize=8, nvtype='double', nvsize=8, Off_t='off_t',
> lseeksize=8
>     alignbytes=8, prototype=define
>   Linker and Libraries:
>     ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags ='
> -fstack-protector -L/opt/local/lib'
>     libpth=/opt/local/lib /usr/lib
>     libs=-lgdbm -ldbm -ldl -lm -lutil -lc
>     perllibs=-ldl -lm -lutil -lc
>     libc=, so=dylib, useshrplib=false, libperl=libperl.a
>     gnulibc_version=''
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
>     cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup
> -L/opt/local/lib -fstack-protector'
>
> Locally applied patches:
>
>
> ---
> @INC for perl 5.18.4:
>
> /Users/mschout/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4/darwin-2level
>     /Users/mschout/perl5/perlbrew/perls/perl-5.18.4/lib/site_perl/5.18.4
>
> /Users/mschout/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4/darwin-2level
>     /Users/mschout/perl5/perlbrew/perls/perl-5.18.4/lib/5.18.4
>     .
>
> ---
> Environment for perl 5.18.4:
>     DYLD_LIBRARY_PATH (unset)
>     HOME=/Users/mschout
>     LANG=en_US.UTF-8
>     LANGUAGE (unset)
>     LD_LIBRARY_PATH (unset)
>     LOGDIR (unset)
>
> PATH=/Users/mschout/perl5/perlbrew/bin:/Users/mschout/perl5/perlbrew/perls/perl-5.18.4/bin:/opt/local/bin:/opt/local/sbin:/opt/local/bin:/opt/local/sbin:/Users/mschout/.rvm/gems/ruby-2.1.1/bin:/Users/mschout/.rvm/gems/ruby-2.1.1@global
> /bin:/Users/mschout/.rvm/rubies/ruby-2.1.1/bin:/opt/local/bin:/opt/local/sbin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/opt/X11/bin:/Users/mschout/bin:/Users/mschout/Dropbox/bin:/opt/local/libexec/gnubin:/Users/mschout/.rvm/bin:/Users/mschout/.rvm/bin
>     PERLBREW_BASHRC_VERSION=0.69
>     PERLBREW_HOME=/Users/mschout/.perlbrew
>     PERLBREW_MANPATH=/Users/mschout/perl5/perlbrew/perls/perl-5.18.4/man
>
> PERLBREW_PATH=/Users/mschout/perl5/perlbrew/bin:/Users/mschout/perl5/perlbrew/perls/perl-5.18.4/bin
>     PERLBREW_PERL=perl-5.18.4
>     PERLBREW_ROOT=/Users/mschout/perl5/perlbrew
>     PERLBREW_VERSION=0.69
>     PERL_BADLANG (unset)
>     PERL_LOCAL_LIB_ROOT=
>     SHELL=/bin/bash
>
>

Thread Previous


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About