perl.perl5.porters | Postings from August 2019

[perl #134325] Heap buffer overflow

From: Karl Williamson via RT
Date: August 23, 2019 20:38
Subject: [perl #134325] Heap buffer overflow
Message ID: rt-4.0.24-25997-1566592675-972.134325-15-0@perl.org

On Mon, 12 Aug 2019 07:49:58 -0700, tonyc wrote:
>
> After some discussion with Karl in IRC the bytes vs regnodes shouldn't
> be a problem, since RExC_size is set to zero almost immediately after
> this, the program is reallocated in regnodes before anything else much
> happens.
> 
> Tony

No, this isn't the cause of the problems.  I had thought the comments explained it adequately:

        /* On the first pass of the parse, we guess how big this will be.  Then
         * we grow in one operation to that amount and then give it back.  As
         * we go along, we re-allocate what we need.
         *
         * XXX Currently the guess is essentially that the pattern will be an
         * EXACT node with one byte input, one byte output.  This is crude, and
         * better heuristics are welcome.
         *
         * On any subsequent passes, we guess what we actually computed in the
         * latest earlier pass.  Such a pass probably didn't complete so is
         * missing stuff.  We could improve those guesses by knowing where the
         * parse stopped, and use the length so far plus apply the above
         * assumption to what's left. */
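
To make the two guesses in that comment concrete, here is a minimal sketch in C. It is not the code in regcomp.c: the function names, the GUESS_OVERHEAD constant, and the treatment of the overhead as a single fixed amount are all assumptions made for illustration. The second function implements the improvement the comment only suggests (size so far plus the first-pass assumption applied to what is left), not what the core currently does.

```c
#include <stddef.h>

/* Hypothetical fixed overhead in bytes; not the real regnode overhead. */
#define GUESS_OVERHEAD 8

/* First pass: assume the pattern compiles to roughly one output byte
 * per input byte, plus some fixed overhead. */
static size_t first_pass_guess(size_t pattern_len)
{
    return pattern_len + GUESS_OVERHEAD;
}

/* Later pass, as suggested (not as currently implemented): take what the
 * interrupted pass had emitted so far and add one byte for every input
 * byte that was not yet parsed. */
static size_t later_pass_guess(size_t emitted_so_far,
                               size_t pattern_len,
                               size_t parse_stopped_at)
{
    size_t remaining = 0;

    if (parse_stopped_at < pattern_len)
        remaining = pattern_len - parse_stopped_at;

    return emitted_so_far + remaining;
}
```

For example, a 100-byte pattern whose earlier pass emitted 40 bytes before stopping at input offset 30 would get a guess of 40 + 70 bytes under this sketch.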

The point of this is to allocate a bunch of memory in one gulp.  Then we give it back and allocate what we actually need as we go along parsing, hoping that the system isn't so busy that other processes gobble it up in the meantime.

I used the guess of one output byte + overhead per one input byte.  That's the only thing we know about the string at this point: its length.  I have changed other places in the core to do the same.  But better heuristics are welcome.  On small patterns, whatever we do won't matter much, but on larger patterns, my guess is that much of that space is literal characters that need to be matched, like in DNA sequences.  So I think this is a reasonable guess.  We could omit this entirely, of course.  But by doing it, we make sure that the system has a sufficient amount of memory available at the beginning, and I think we reduce the number of mallocs that actually go out and have to consolidate free space.
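
The allocate-then-give-back idea described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the core's implementation: program_buf, probe_allocation, and program_append are hypothetical names, and the doubling growth policy is a choice made for the sketch rather than anything the regex compiler is known to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer standing in for the compiled program. */
typedef struct {
    unsigned char *data;
    size_t used;
    size_t capacity;
} program_buf;

/* Ask for the guessed amount in one request, then give it straight back.
 * Returns 1 if the allocator could satisfy the request, 0 otherwise. */
static int probe_allocation(size_t guess)
{
    void *probe = malloc(guess);

    if (probe == NULL)
        return 0;
    free(probe);
    return 1;
}

/* Append n bytes, growing the buffer only when it is too small.
 * Returns 0 on success, -1 if the reallocation fails. */
static int program_append(program_buf *p, const unsigned char *src, size_t n)
{
    assert(p != NULL);

    if (n == 0)
        return 0;

    if (p->used + n > p->capacity) {
        /* Doubling is an assumption of this sketch. */
        size_t new_cap = p->capacity ? p->capacity : 16;
        unsigned char *grown;

        while (new_cap < p->used + n)
            new_cap *= 2;

        grown = realloc(p->data, new_cap);
        if (grown == NULL)
            return -1;
        p->data = grown;
        p->capacity = new_cap;
    }

    memcpy(p->data + p->used, src, n);
    p->used += n;
    return 0;
}
```

A caller would run probe_allocation() once with the size guess before parsing, then call program_append() for each piece of output, starting from a zero-initialized program_buf and freeing its data member at the end.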

-- 
Karl Williamson

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=134325
