
Re: [perl #66110] Perl debugger runs out of memory, hangs or segfaults on XML::Parser::Lite

From: Craig A. Berry
Date: May 31, 2009 11:42
Subject: Re: [perl #66110] Perl debugger runs out of memory, hangs or segfaults on XML::Parser::Lite
Message ID: c9ab31fc0905311142l13578f98o83aa7e850ea302d5@mail.gmail.com

On Sat, May 30, 2009 at 9:39 AM, Craig A. Berry <craig.a.berry@gmail.com> wrote:
> On Thu, May 28, 2009 at 10:14 AM, Nicholas Clark
> <perlbug-followup@perl.org> wrote:
>> # New Ticket Created by  Nicholas Clark
>> # Please include the string:  [perl #66110]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=66110 >
>>
>>
>> Avar mailed p5p in 51dd1af80807190107h30b8626ct6d4d0a825abe4b3b@mail.gmail.com
>> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-07/msg00382.html
>>
>> perl 5.10 and blead will do various combinations of running out of
>> memory, hanging or segfaulting when running on a program using
>> XML::Parser::Lite, attached is a stripped down version of X::P::L
>> which demonstrates the problem:
>>
>>
>> Dave notes:
>>
>> possibly a 5.10.0 regression
>>
>>
>
> In Perl_regexec_flags (called from Perl_pp_match), after the got_it:
> label, we end up calling savepvn repeatedly with the same large value
> for the length.  By the time we hit the code below, i holds rather
> large values, like 3,938,416 or 3,943,248 (it seems to keep growing).
>
>        if (flags & REXEC_COPY_STR) {
>            const I32 i = PL_regeol - startpos + (stringarg - strbeg);
>
> < ifdef snipped>
>            {
>                RX_MATCH_COPIED_on(rx);
>                s = savepvn(strbeg, i);
>                prog->subbeg = s;
>            }
>            prog->sublen = i;
>        }
>
> Some values of interest, including those that make up i, are as follows:
>
> REGEXEC\Perl_regexec_flags\my_perl->Ireg_state.re_state_regeol: 15380736
> REGEXEC\Perl_regexec_flags\startpos:    11437488
> REGEXEC\Perl_regexec_flags\stringarg:   11437488
> REGEXEC\Perl_regexec_flags\strbeg:      11437488
> REGEXEC\Perl_regexec_flags\strend:      11437502
>
> The string we are currently matching is:
>
> *PP_HOT\Perl_pp_match\s:        "<foo>bar</foo>"
>
> To me that looks just a tad less than 3.9 million bytes :-), but it is
> in fact the string that ends at the current value of strend.  The
> string that ends at the current value of PL_regeol (aka
> my_perl->Ireg_state.re_state_regeol) is "CODE(0xb18940)".  If I set a
> watchpoint for PL_regeol, it keeps toggling back and forth between the
> ends of these two different strings.  So it really looks as though two
> different regex operations are going on at once in interleaved fashion
> and keep hijacking the value of PL_regeol from each other.
>
> I think that's about as far as I'm going to get with this but thought
> I'd pass along my observations.

Curiosity kept me going a bit farther, so here's some more analysis
with blead.  Still nowhere near a solution.  Here is what the call
stack looks like when the value of PL_regeol gets hijacked:

DBG> sh calls
 module name    routine name     line           rel PC           abs PC
*REGEXEC        Perl_re_intuit_start
                                85298       0000000000002001 000000000049F821
*PP_HOT         Perl_pp_match
                                84290       0000000000010360 0000000000346890
*DUMP           Perl_runops_debug
                                93949       0000000000014190 00000000002B9A50
*REGEXEC        S_regmatch      88644       000000000002CA40 00000000004CA260
*REGEXEC        S_regtry        87270       000000000001CE00 00000000004BA620
*REGEXEC        Perl_regexec_flags
                                87069       0000000000019660 00000000004B6E80
*PP_HOT         Perl_pp_match
                                84303       0000000000010A60 0000000000346F90
*DUMP           Perl_runops_debug
                                93949       0000000000014190 00000000002B9A50
*PERL           S_run_body      85382       000000000000B3B0 00000000000730C0
*PERL           perl_run        85313       000000000000A930 0000000000072640

The noteworthy bit is that Perl_pp_match appears twice, so we are
doing a regex op inside of a regex op.  In S_regmatch, we are in a
section that starts with:

       case EVAL:  /*   /(?{A})B/   /(??{A})B/  and /(?(?{A})X|Y)B/   */

and does:

               CALLRUNOPS(aTHX);                       /* Scalar context. */

which in turn initiates another match operation.  So we have an inner
and an outer regex match going on at once, but they share global
variables like PL_regeol and so stomp on each other.

This may not be a general problem with evals in regexen so much as a
consequence of the extreme to which the test case takes them.  If the
one-line sub declaration in the test case is moved out of the string
eval, as below, everything looks peachy.

--- xpl-testcase.pl;-0  2009-05-28 06:59:18 -0500
+++ xpl-testcase.pl     2009-05-31 10:29:23 -0500
@@ -61,7 +61,7 @@ sub regexp {
 sub compile { local $^W;
   # try regexp as it should be, apply patch if doesn't work
   foreach (regexp(), regexp('??')) {
-    eval qq{sub parse_re { use re "eval"; 1 while \$_[0] =~ m{$_}go }; 1} or die;
+    sub parse_re { use re "eval"; 1 while \$_[0] =~ m{$_}go }
     last if eval { parse_re('<foo>bar</foo>'); 1 }
   };

[end]

I don't know why that sub declaration was inside an eval in the first place.
