Re: [perl #129903] regexec.c stack overflow

From: Dan Collins
Date: October 18, 2016 02:06
Subject: Re: [perl #129903] regexec.c stack overflow
Message ID: CA+tt54J2nMVA5vvvpXGx=VC5A_t4Oz9mwsKehw6m2B3ZNRyWLw@mail.gmail.com

$ perl5.14.0 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.
$ perl5.18.0 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl5.16.0 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.
$ perl5.17.0 -e '/(?{m}(0)},s\/\/\/})//0'
Sequence (?{...}) not terminated or not {}-balanced in regex; marked by <-- HERE in m/(?{ <-- HERE m}(0)},s///})/ at -e line 1.
$ perl5.17.5 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl5.17.3 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl5.17.1 -e '/(?{m}(0)},s\/\/\/})//0'
Segmentation fault
$ perl Porting/bisect.pl --start=v5.17.0 --end=v5.17.1 --crash -- ./perl -Ilib -e '/(?{m}(0)},s\/\/\/})//0'
68e2671bec1b01022978d5d5eb6eee8742396e13 is the first bad commit
commit 68e2671bec1b01022978d5d5eb6eee8742396e13
Author: David Mitchell <davem@iabyn.com>
Date:   Thu Aug 25 11:41:49 2011 +0100

    Mostly complete fix for literal /(?{..})/ blocks

    Change the way that code blocks in patterns are parsed and executed,
    especially as regards lexical and scoping behaviour.

    (Note that this fix only applies to literal code blocks appearing within
    patterns: run-time patterns, and literals within qr//, are still done the
    old broken way for now).

    This change means that for literal /(?{..})/ and /(??{..})/:

    * the code block is now fully parsed in the same pass as the surrounding
      code, which means that the compiler no longer just does a simplistic
      count of balancing {} to find the limits of the code block;
      i.e. stuff like /(?{  $x = "{" })/ now works (in the same way
      that subscripts in double quoted strings always have: "$a{'{'}" )

    * Error and warning messages will now appear to emanate from the main body
      rather than an re_eval; e.g. the output from

        #!/usr/bin/perl
        /(?{ warn "boo" })/

    has changed from

        boo at (re_eval 1) line 1.

    to

        boo at /tmp/p line 2.

    * scope and closures now behave as you might expect; for example

            for my $x (qw(a b c)) { "" =~ /(?{ print $x })/ }

      now prints "abc" rather than ""

    * with recursion, it now finds the lexical within the appropriate depth
      of pad: this code now prints "012" rather than "000":

        sub recurse {
            my ($n) = @_;
            return if $n > 2;
            "" =~ /^(?{print $n})/;
            recurse($n+1);
        }
        recurse(0);

    * an earlier fix that stopped 'my' declarations within code blocks causing
      crashes, required the accumulating of two SAVECOMPPADs on the stack for
      each iteration of the code block; this is no longer needed;

    * UNITCHECK blocks within literal code blocks are now run as part of the
      main body of code (run-time code blocks still trigger an immediate
      call to the UNITCHECK block though)

    This is all achieved by building upon the efforts of the commits which led
    up to this; those altered the parser to parse literal code blocks
    directly, but up until now those code blocks were discarded by
    Perl_pmruntime and the block re-compiled using the original re_eval
    mechanism. As of this commit, for the non-qr and non-runtime variants,
    those code blocks are no longer thrown away. Instead:

    * the LISTOP generated by the parser, which contains all the code
      blocks plus OP_CONSTs that collectively make up the literal pattern,
      is now stored in a new field in PMOPs, called op_code_list. For example
      in /A(?{BLOCK})C/, the listop stored in op_code_list looks like

        LIST
            PUSHMARK
            CONST['A']
            NULL/special (aka a DO block)
                BLOCK
            CONST['(?{BLOCK})']
            CONST['C']

    * each of the code blocks has its last op set to null and is individually
      run through the peephole optimiser, so each one becomes a little
      self-contained block of code, rather than a list of blocks that run into
      each other;

    * then in re_op_compile(), we concatenate the list of CONSTs to produce a
      string to be compiled, but at the same time we note any DO blocks and
      note the start and end positions of the corresponding CONST['(?{BLOCK})'];

    * (if the current regex engine isn't the built-in perl one, then we just
      throw away the code blocks and pass the concatenated string to the engine)

    * then during regex compilation, whenever we encounter a '(?{', we see if
      it matches the index of one of the pre-compiled blocks, and if so, we
      store a pointer to that block in an 'l' data slot, and use the end index
      to skip over the text of the code body. Conversely, if the index doesn't
      match, then we know that it's a run-time pattern and (for now), compile
      it in the old way.

    * During execution, when an EVAL op is encountered, if data->what is 'l',
      then we just use the pad that was in effect when the pattern was called;
      i.e. we use the current pad slot of the currently executing CV that the
      pattern is embedded within.
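
The index bookkeeping described in the commit message (concatenate the CONSTs, note where each code block's text starts and ends, then look the offset up when '(?{' is met) can be sketched in C. This is an illustration only, under loud assumptions: struct piece, struct code_span, concat_pieces() and find_precompiled() are hypothetical names invented here, not perl's data structures or functions, and the real code works on ops and SVs rather than plain C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one CONST in the op list (not a perl type). */
struct piece {
    const char *text;   /* literal text of this CONST */
    int is_code;        /* nonzero if the text is the source of a code block */
};

/* Hypothetical record of where one code block sits in the joined pattern. */
struct code_span {
    size_t start;       /* offset of "(?{" in the concatenated string */
    size_t end;         /* offset one past the block's closing ")" */
};

/* Join the pieces into buf and note the span of every code block.
 * Returns the number of spans recorded, or -1 if buf or spans is too small. */
static int concat_pieces(const struct piece *p, size_t n,
                         char *buf, size_t bufsz,
                         struct code_span *spans, size_t max_spans)
{
    size_t len = 0, nspans = 0;

    assert(p != NULL && buf != NULL && spans != NULL);
    if (bufsz == 0)
        return -1;
    for (size_t i = 0; i < n; i++) {
        size_t plen = strlen(p[i].text);
        if (len + plen + 1 > bufsz)
            return -1;
        if (p[i].is_code) {
            if (nspans == max_spans)
                return -1;
            spans[nspans].start = len;
            spans[nspans].end = len + plen;
            nspans++;
        }
        memcpy(buf + len, p[i].text, plen);
        len += plen;
    }
    buf[len] = '\0';
    return (int)nspans;
}

/* Called when "(?{" is seen at offset pos: returns the index of the
 * pre-compiled block that starts there, or -1 for a run-time block. */
static int find_precompiled(const struct code_span *spans, size_t nspans,
                            size_t pos)
{
    for (size_t i = 0; i < nspans; i++)
        if (spans[i].start == pos)
            return (int)i;
    return -1;
}
```

For the pieces "A", "(?{BLOCK})" and "C", the joined string is "A(?{BLOCK})C" and the single recorded span is start 1, end 11, so a compiler that meets '(?{' at offset 1 can jump straight to offset 11 without scanning the body of the block.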

:100644 100644 75ee7e7bae9366e089980b8a7d346198645cdfe9 8f2fa766c6cc393d413d59b55ad6f896ec5c7010 M      dump.c
:100644 100644 5cc98874ae03eae36d4207071ed587d51e80ebed 4d82c7cc5797c4a97f2251f4f1487aa090efda99 M      op.c
:100644 100644 6aa16f5725db2ae0a6a01b0d5b34b6446e8b5ebf f267da2c9c752b765151eca4c7009230c97b82ed M      op.h
:040000 040000 3454e32e87872db5fef604d816c7ca37dc487166 f7b4543ded1cbaf299521b31150d1c38e5b11509 M      pod
:100644 100644 394502b314a65eea36bca2f52e31f50048f0060d b45122c1a79df44e6094c1001fc7b3456543b740 M      regcomp.c
:100644 100644 0fdb0058b8b5fce8763f7d0da05b187d460838b6 a9da0c97e12a8038ef0bc49870737fe62a7c01e6 M      regcomp.h
:100644 100644 bb845a79216c447b2fa7ec08cee46602b05d3461 f384c4d32c11fe1917e7e8fbcd998b60dc1355c3 M      regexec.c
:040000 040000 a2c61b299bd88c2308af219684388ae3deea371d 4ae03f25b1ebcee1518e0dc6ccc7ae23b1433f61 M      t
bisect run success
That took 772 seconds.
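
One possible reading of the error printed by the older perls: the commit message says the compiler used to find the end of a code block with "a simplistic count of balancing {}". A minimal C sketch of such a counter follows. It is an assumption about the general technique, not code taken from regcomp.c; naive_block_end() and naive_block_ok() are hypothetical names, and details such as backslash handling are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: given the offset of the '{' that opens a "(?{"
 * block, return the offset of the '}' that a plain depth counter pairs
 * with it, or -1 if the depth never returns to zero. */
static long naive_block_end(const char *pat, size_t open)
{
    int depth = 0;

    assert(pat != NULL && pat[open] == '{');
    for (size_t i = open; pat[i] != '\0'; i++) {
        if (pat[i] == '{')
            depth++;
        else if (pat[i] == '}' && --depth == 0)
            return (long)i;
    }
    return -1;
}

/* The counted block is accepted only if ')' directly follows the '}'. */
static int naive_block_ok(const char *pat, size_t open)
{
    long end = naive_block_end(pat, open);
    return end >= 0 && pat[end + 1] == ')';
}
```

On the test pattern `(?{m}(0)},s///})` with the opening brace at offset 2, this counter stops at the first `}` (offset 4) and then sees `(` rather than `)`, so the block is rejected, which is consistent with the "not terminated or not {}-balanced" message. On the commit's own example `(?{  $x = "{" })`, the brace inside the string literal is counted too, the depth never returns to zero, and the counter returns -1.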

--
Dan Collins
