Re: [perl #66110] Perl debugger runs out of memory, hangs or segfaults on XML::Parser::Lite

Craig A. Berry
May 30, 2009 07:39
On Thu, May 28, 2009 at 10:14 AM, Nicholas Clark wrote:
> # New Ticket Created by  Nicholas Clark
> # Please include the string:  [perl #66110]
> # in the subject line of all future correspondence about this issue.
> # <URL: >
> Avar mailed p5p in
> perl 5.10 and blead will do various combinations of running out of
> memory, hanging or segfaulting when running on a program using
> XML::Parser::Lite, attached is a stripped down version of X::P::L
> which demonstrates the problem:
> Dave notes:
> possibly a 5.10.0 regression

In Perl_regexec_flags (called from Perl_pp_match), after the got_it:
label, we end up calling savepvn repeatedly with the same large value
for the length.  When we hit the code below, i holds rather large
values, like 3,938,416 or 3,943,248 (it seems to keep growing).

        if (flags & REXEC_COPY_STR) {
            const I32 i = PL_regeol - startpos + (stringarg - strbeg);

< ifdef snipped>
                s = savepvn(strbeg, i);
                prog->subbeg = s;
            prog->sublen = i;

Some values of interest, including those that make up i, are as follows:

REGEXEC\Perl_regexec_flags\my_perl->Ireg_state.re_state_regeol: 15380736
REGEXEC\Perl_regexec_flags\startpos:    11437488
REGEXEC\Perl_regexec_flags\stringarg:   11437488
REGEXEC\Perl_regexec_flags\strbeg:      11437488
REGEXEC\Perl_regexec_flags\strend:      11437502
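
As a quick check of the arithmetic, the sketch below plugs the debugger
values above into the same expression that computes i. The helper name
copy_len is hypothetical, the addresses are treated as plain integers,
and the assert only encodes the assumption that stringarg never precedes
strbeg; this is not code from regexec.c.

```c
#include <assert.h>

/* Hypothetical helper: the expression that computes i in the excerpt,
 * applied to addresses taken as plain integers from the debugger output. */
static long copy_len(long regeol, long startpos, long stringarg, long strbeg)
{
    /* Assumption: stringarg points at or after the start of the string. */
    assert(stringarg >= strbeg);
    return regeol - startpos + (stringarg - strbeg);
}
```

With regeol = 15380736 and the other three values all 11437488, copy_len
returns 3943248, one of the sizes reported above. Substituting strend
(11437502) for regeol gives 14.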

The string we are currently matching is:

*PP_HOT\Perl_pp_match\s:        "<foo>bar</foo>"

To me that looks just a tad less than 3.9 million bytes :-), but it is
in fact the string that ends at the current value of strend.  The
string that ends at the current value of PL_regeol (aka
my_perl->Ireg_state.re_state_regeol) is "CODE(0xb18940)".  If I set a
watchpoint for PL_regeol, it keeps toggling back and forth between the
ends of these two different strings.  So it really looks as though two
different regex operations are going on at once in interleaved fashion
and keep hijacking the value of PL_regeol from each other.
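
The kind of interference described above can be sketched in isolation.
Everything here is hypothetical: shared_eol merely stands in for
PL_regeol, inner_match and outer_match are made-up names, and both
strings are placed in one buffer so the pointer subtraction is well
defined. The restore flag shows a generic save/restore remedy for
shared state, not a claim about how Perl does or should handle this.

```c
#include <stddef.h>
#include <string.h>

/* Both strings live in one buffer: the first occupies offsets 0..13,
 * a NUL sits at offset 14, and the second starts at offset 15. */
static const char arena[] = "<foo>bar</foo>\0CODE(0xb18940)";

/* Hypothetical stand-in for PL_regeol: one end pointer shared by all matches. */
static const char *shared_eol;

/* Inner operation: points the shared end pointer at the end of its own string. */
static void inner_match(const char *str)
{
    shared_eol = str + strlen(str);
}

/* Outer operation: sets the shared end pointer, lets the inner operation
 * run, then derives a length from the shared pointer. */
static ptrdiff_t outer_match(const char *str, const char *inner, int restore)
{
    const char *saved;

    shared_eol = str + strlen(str);
    saved = shared_eol;

    inner_match(inner);          /* overwrites shared_eol */

    if (restore)
        shared_eol = saved;      /* undo the inner operation's change */

    return shared_eol - str;
}
```

Here outer_match(arena, arena + 15, 0) returns 29 because the length is
measured to the end of the inner string, while
outer_match(arena, arena + 15, 1) returns 14, the real length of
"<foo>bar</foo>".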

I think that's about as far as I'm going to get with this but thought
I'd pass along my observations.
