Re: [perl #66110] Perl debugger runs out of memory, hangs or segfaults on XML::Parser::Lite
From:
Craig A. Berry
Date:
May 31, 2009 11:42
Subject:
Re: [perl #66110] Perl debugger runs out of memory, hangs or segfaults on XML::Parser::Lite
Message ID:
c9ab31fc0905311142l13578f98o83aa7e850ea302d5@mail.gmail.com
On Sat, May 30, 2009 at 9:39 AM, Craig A. Berry <craig.a.berry@gmail.com> wrote:
> On Thu, May 28, 2009 at 10:14 AM, Nicholas Clark
> <perlbug-followup@perl.org> wrote:
>> # New Ticket Created by Nicholas Clark
>> # Please include the string: [perl #66110]
>> # in the subject line of all future correspondence about this issue.
>> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=66110 >
>>
>>
>> Avar mailed p5p in 51dd1af80807190107h30b8626ct6d4d0a825abe4b3b@mail.gmail.com
>> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2008-07/msg00382.html
>>
>> perl 5.10 and blead will do various combinations of running out of
>> memory, hanging or segfaulting when running on a program using
>> XML::Parser::Lite, attached is a stripped down version of X::P::L
>> which demonstrates the problem:
>>
>>
>> Dave notes:
>>
>> possibly a 5.10.0 regression
>>
>>
>
> In Perl_regexec_flags (called from Perl_pp_match), after the got_it:
> label, we end up calling savepvn repeatedly with the same large value
> for the length. The value of i when we hit the code below gets rather
> large values, like 3,938,416 or 3,943,248 (it seems to keep growing).
>
> if (flags & REXEC_COPY_STR) {
> const I32 i = PL_regeol - startpos + (stringarg - strbeg);
>
> < ifdef snipped>
> {
> RX_MATCH_COPIED_on(rx);
> s = savepvn(strbeg, i);
> prog->subbeg = s;
> }
> prog->sublen = i;
> }
>
> Some values of interest, including those that make up i, are as follows:
>
> REGEXEC\Perl_regexec_flags\my_perl->Ireg_state.re_state_regeol: 15380736
> REGEXEC\Perl_regexec_flags\startpos: 11437488
> REGEXEC\Perl_regexec_flags\stringarg: 11437488
> REGEXEC\Perl_regexec_flags\strbeg: 11437488
> REGEXEC\Perl_regexec_flags\strend: 11437502
>
> The string we are currently matching is:
>
> *PP_HOT\Perl_pp_match\s: "<foo>bar</foo>"
>
> To me that looks just a tad less than 3.9 million bytes :-), but it is
> in fact the string that ends at the current value of strend. The
> string that ends at the current value of PL_regeol (aka
> my_perl->Ireg_state.re_state_regeol) is "CODE(0xb18940)". If I set a
> watchpoint for PL_regeol, it keeps toggling back and forth between the
> ends of these two different strings. So it really looks as though two
> different regex operations are going on at once in interleaved fashion
> and keep hijacking the value of PL_regeol from each other.
>
> I think that's about as far as I'm going to get with this but thought
> I'd pass along my observations.
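As a sanity check on the figures quoted above, the arithmetic can be sketched in C. This is only an illustration: copy_len is a hypothetical helper that mirrors the expression from the quoted snippet, and the debugger addresses are treated as plain integers rather than real pointers.

```c
#include <assert.h>

/* Hypothetical helper, not part of perl. It mirrors the expression
 *     i = PL_regeol - startpos + (stringarg - strbeg)
 * with the addresses passed in as plain integers. */
static long copy_len(long regeol, long startpos, long stringarg, long strbeg)
{
    assert(stringarg >= strbeg);   /* stringarg lies at or after strbeg */
    return regeol - startpos + (stringarg - strbeg);
}
```

With the quoted values, copy_len(15380736, 11437488, 11437488, 11437488) gives 3943248, one of the large values reported for i. Substituting strend (11437502) for the hijacked PL_regeol gives 14, the length of "<foo>bar</foo>".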
Curiosity kept me going a bit farther, so here's some more analysis
with blead. I'm still nowhere near a solution. Here is what the call
stack looks like when the value of PL_regeol gets hijacked:
DBG> sh calls
module name    routine name            line   rel PC            abs PC
*REGEXEC       Perl_re_intuit_start    85298  0000000000002001  000000000049F821
*PP_HOT        Perl_pp_match           84290  0000000000010360  0000000000346890
*DUMP          Perl_runops_debug       93949  0000000000014190  00000000002B9A50
*REGEXEC       S_regmatch              88644  000000000002CA40  00000000004CA260
*REGEXEC       S_regtry                87270  000000000001CE00  00000000004BA620
*REGEXEC       Perl_regexec_flags      87069  0000000000019660  00000000004B6E80
*PP_HOT        Perl_pp_match           84303  0000000000010A60  0000000000346F90
*DUMP          Perl_runops_debug       93949  0000000000014190  00000000002B9A50
*PERL          S_run_body              85382  000000000000B3B0  00000000000730C0
*PERL          perl_run                85313  000000000000A930  0000000000072640
The noteworthy bit is that Perl_pp_match appears twice, so we are
doing a regex op inside of a regex op. In S_regmatch, we are in a
section that starts with:
case EVAL: /* /(?{A})B/ /(??{A})B/ and /(?(?{A})X|Y)B/ */
and does:
CALLRUNOPS(aTHX); /* Scalar context. */
which in turn initiates another match operation. So we have an inner
and an outer regex match going on at once, but they share global
variables like PL_regeol and so stomp on each other.
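The stomping can be sketched in a few lines of C. This is a toy model under stated assumptions, not perl's code: g_eol stands in for a shared variable like PL_regeol, inner_match stands in for the match started from the code block, and the save/restore variant only shows one conventional remedy, not necessarily the right fix for perl.

```c
#include <assert.h>
#include <string.h>

/* Toy stand-in for an interpreter-wide variable such as PL_regeol. */
static const char *g_eol;

/* Inner "match": points the shared end-of-string marker at its own string. */
static void inner_match(const char *s)
{
    g_eol = s + strlen(s);
}

/* Outer "match" without protection. The nested call overwrites g_eol,
 * so afterwards it no longer marks the end of the outer string.
 * Returns 1 if g_eol still belongs to the outer string, 0 if clobbered. */
static int outer_match_unsafe(const char *s, const char *inner)
{
    const char *my_end = s + strlen(s);
    g_eol = my_end;
    inner_match(inner);          /* plays the role of the nested match */
    return g_eol == my_end;
}

/* Same outer "match", but the shared value is saved before the nested
 * call and restored after it. */
static int outer_match_saved(const char *s, const char *inner)
{
    const char *my_end = s + strlen(s);
    const char *saved;

    assert(s != NULL && inner != NULL);
    g_eol = my_end;
    saved = g_eol;               /* save before re-entering */
    inner_match(inner);
    g_eol = saved;               /* restore afterwards */
    return g_eol == my_end;
}
```

Called with "<foo>bar</foo>" as the outer string and "CODE(0xb18940)" as the inner one, outer_match_unsafe returns 0 because the inner call has left g_eol pointing at the end of the other string, while outer_match_saved returns 1.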
This may not be a general problem with evals in regexen; it may
instead be down to the particular extreme to which evals are taken in
the test case. If the one-line sub declaration in the test case is
moved outside of the string eval, as below, everything looks peachy.
--- xpl-testcase.pl;-0 2009-05-28 06:59:18 -0500
+++ xpl-testcase.pl 2009-05-31 10:29:23 -0500
@@ -61,7 +61,7 @@ sub regexp {
sub compile { local $^W;
# try regexp as it should be, apply patch if doesn't work
foreach (regexp(), regexp('??')) {
- eval qq{sub parse_re { use re "eval"; 1 while \$_[0] =~ m{$_}go }; 1} or die;
+ sub parse_re { use re "eval"; 1 while \$_[0] =~ m{$_}go }
last if eval { parse_re('<foo>bar</foo>'); 1 }
};
[end]
I don't know why that sub declaration was inside an eval in the first place.