develooper Front page | perl.perl5.porters | Postings from April 2019

[perl #134017] Segfault in S_study_chunk (regcomp.c:4870)

From:
Hugo van der Sanden via RT
Date:
April 25, 2019 11:18
Subject:
[perl #134017] Segfault in S_study_chunk (regcomp.c:4870)
Message ID:
rt-4.0.24-3113-1556191111-619.134017-15-0@perl.org
On Thu, 11 Apr 2019 10:41:35 -0700, hv wrote:
> On Thu, 11 Apr 2019 10:05:43 -0700, public@khwilliamson.com wrote:
> > On 4/11/19 10:15 AM, Hugo van der Sanden via RT wrote:
> > > On Wed, 10 Apr 2019 16:31:13 -0700, public@khwilliamson.com wrote:
> > >> On 4/10/19 5:22 PM, Karl Williamson via RT wrote:
> > >>> Here is what the pattern looks up before the start of
> > >>> optimization
> > >>>
> > >>> 1: BRANCH (4)
> > >>> 2:   EXACT <0> (19)
> > >>> 4: BRANCH (FAIL)
> > >>> 5:   NOTHING (6)
> > >>> 6:   NOTHING (7)
> > >>> 7:   BRANCH (12)
> > >>> 8:     NOTHING (9)
> > >>> 9:     GOSUB0[+0:9] (16)
> > >>> 12:   BRANCH (FAIL)
> > >>> 13:     GOSUB0[+1:14] (16)
> > >>> 16:   TAIL (17)
> > >>> 17:   EXACT <0> (19)
> > >>> 19: END (0)
> > >>>
> > >>
> > >> This is the result of the first example in the ticket.
> > >> The segfault arises because the NEXTOPER() is getting corrupted
> > >> apparently, and is returning 0x0, and we can't access that memory
> > >> address
> > >
> > > As far as I can see, the loop:
> > >    for (cur = startbranch; cur != scan; cur = regnext(cur))
> > > is falling off the end in one of its incarnations; I don't yet
> > > understand why that's happening.
> > >
> > > I wonder whether it would be valid to change the loop condition to
> > > 'cur < scan', or maybe 'cur != scan && OP(cur) != END'.
> > >
> > > FWIW I find it marginally easier to track what's going on with the
> > > hv/study_chunk branch, even though it's a bit out of date. That
> > > gives
> > > this stack trace:
> > > #0  0x00000000004eba16 in S_rck_make_trie
> > > (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd270) at regcomp.c:5882
> > > #1  0x00000000004e7123 in S_rck_branch (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd270) at regcomp.c:4431
> > > #2  0x00000000004ee4a2 in S_study_chunk_one_node
> > > (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd270) at regcomp.c:6420
> > > #3  0x00000000004e6893 in S_study_chunk
> > > (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd270) at regcomp.c:4280
> > > #4  0x00000000004eaf43 in S_rck_make_trie
> > > (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd920) at regcomp.c:5661
> > > #5  0x00000000004e7123 in S_rck_branch (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd920) at regcomp.c:4431
> > > #6  0x00000000004ee4a2 in S_study_chunk_one_node
> > > (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd920) at regcomp.c:6420
> > > #7  0x00000000004e6893 in S_study_chunk
> > > (pRExC_state=0x7fffffffdab0,
> > >      params=0x7fffffffd920) at regcomp.c:4280
> > > #8  0x00000000004f7c6f in Perl_re_op_compile (patternp=0x0,
> > > pat_count=1,
> > >      expr=0xbbe8a8, eng=0x7a00a0 <PL_core_reg_engine>, old_re=0x0,
> > >      is_bare_re=0x0, orig_rx_flags=0, pm_flags=0) at regcomp.c:8158
> > > #9  0x000000000041c753 in Perl_pmruntime (o=0xbbe8e8,
> > > expr=0xbbe8a8,
> > > repl=0x0,
> > >      flags=1, floor=0) at op.c:7030
> > > #10 0x00000000004c9a65 in Perl_yyparse (gramtype=258) at
> > > perly.y:1228
> > > #11 0x000000000044bc52 in S_parse_body (env=0x0, xsinit=0x74f799
> > > <xs_init>)
> > >      at perl.c:2503
> > > #12 0x000000000044a70a in perl_parse (my_perl=0xb95010,
> > >      xsinit=0x74f799 <xs_init>, argc=3, argv=0x7fffffffe508,
> > > env=0x0)
> > >      at perl.c:1797
> > > #13 0x000000000074f700 in main (argc=3, argv=0x7fffffffe508,
> > >      env=0x7fffffffe528) at miniperlmain.c:127
> > >
> > > .. but deepest stack has had at least two further levels of
> > > study_chunk -> rck_make_trie frames before coming back to hit this
> > > SEGV.
> > >
> > > Unsurprisingly this does not fail if perl is built without
> > > PERL_ENABLE_TRIE_OPTIMISATION.
> > >
> > > Changing the condition to 'cur < scan' is enough to make both the
> > > two
> > > testcases here survive, and passes tests; I don't know if there's
> > > some reason it might be turning off a bunch of useful trie
> > > optimizations though.
> > >
> >
> > I would think that using cur != scan && OP(cur) != END'
> > as you also suggested would not turn off any real optimizations.
> 
> Hmm, the second case still fails with that variant:
> % ./miniperl -e '0 =~ /0(?n)|()(()((?0)|(?0)))0*\N0/'
> Floating point exception (core dumped)
>  %
> 
> I'll try digging into that when I have some more time - Yves, if you
> have any chance to look at this I'm sure that would help a lot.

With -Dr that second case hits an assertion with all three variants of the loop:
% ./miniperl -Dr -e '0 =~ /0(?n)|()(()((?0)|(?0)))0*\N0/'
Compiling REx "0(?n)|()(()((?0)|(?0)))0*\N0"
~ tying lastbr BRANCH (11) to ender TAIL (15) offset 4
~ tying lastbr BRANCH (4) to ender END (22) offset 18
Freeing REx: "0(?n)|()(()((?0)|(?0)))0*\N0"
~ tying lastbr BRANCH (11) to ender TAIL (15) offset 4
~ tying lastbr BRANCH (4) to ender END (22) offset 18
miniperl: regcomp.c:8324: Perl_re_op_compile: Assertion `scan && ((scan)->type) == 85' failed.
Aborted (core dumped)
% 

Disturbingly, if I fix the assertion as advised with the patch below, that case survives with -Dr but still gives a floating point exception without -Dr. It turns out that _with_ -Dr we're hitting the startbranch..scan loop twice; _without_ -Dr we're hitting it a third time, and by the looks of it that third time the scan we're aiming at is in an optimized-out section:
% ./miniperl -e '0 =~ /0(?n)|()(()((?0)|(?0)))0*\N0/'
(first loop, scan=db7d84)
cur db7d64 (BRANCH)
cur db7d74 (BRANCH)
(second loop, scan=db7da0)
cur db7d4c (BRANCH)
cur db7d58 (BRANCH)
(third loop, scan=db7d84)
cur db7d64 (BRANCH)
cur db7d74 (BRANCH)
cur db7d88 (STAR) # oops
cur db7d94 (REG_ANY)
cur db7d98 (EXACT)
Floating point exception (core dumped)

The difference appears to be because in at least some cases we only mark optimized-out nodes as OPTIMIZED if -Dr is enabled: the point of divergence is the test
                 if (PERL_ENABLE_TRIE_OPTIMISATION &&
                        OP( startbranch ) == BRANCH )
.. which with -Dr is false (the op is OPTIMIZED) on the third time of asking.

Hugo

diff --git a/regcomp.c b/regcomp.c
index fbd5c18..ba2b2b5 100644
--- a/regcomp.c
+++ b/regcomp.c
@@ -8316,12 +8316,11 @@ Perl_re_op_compile(pTHX_ SV ** const patternp, int pat_count,
          * the OPEN/CURLYM (CURLYM's are special and can act like OPEN's)
          * it refers to.
          *
-         * If for some reason someone writes code that optimises
-         * away a GOSUB opcode then the assert should be changed to
-         * an if(scan) to guard the ARG2L_SET() - Yves
-         *
          */
-        assert(scan && OP(scan) == GOSUB);
+        assert(scan);
+        if (OP(scan) == OPTIMIZED)
+            continue;
+        assert(OP(scan) == GOSUB);
         ARG2L_SET( scan, RExC_open_parens[ARG(scan)] - REGNODE_OFFSET(scan));
     }
 


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=134017



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About