develooper Front page | perl.perl5.porters | Postings from August 2019

[perl #134325] Heap buffer overflow

Thread Previous | Thread Next
From:
Tony Cook via RT
Date:
August 26, 2019 06:29
Subject:
[perl #134325] Heap buffer overflow
Message ID:
rt-4.0.24-21135-1566800928-1837.134325-15-0@perl.org
On Fri, 23 Aug 2019 13:37:55 -0700, khw wrote:
> On Mon, 12 Aug 2019 07:49:58 -0700, tonyc wrote:
> >
> > After some discussion with Karl in IRC the bytes vs regnodes
> > shouldn't
> > be a problem, since RExC_size is set to zero almost immediately after
> > this, the program is reallocated in regnodes before anything else
> > much
> > happens.
> >
> > Tony
> 
> No this isn't the cause of the problems.  I had thought the comments
> explained it adequately:
> 
> /* On the first pass of the parse, we guess how big this will be.
> Then
>  * we grow in one operation to that amount and then give it back.  As
>  * we go along, we re-allocate what we need.
>  *
>  * XXX Currently the guess is essentially that the pattern will be an
>  * EXACT node with one byte input, one byte output.  This is crude,
> and
>  * better heuristics are welcome.
>  *
>  * On any subsequent passes, we guess what we actually computed in the
>  * latest earlier pass.  Such a pass probably didn't complete so is
>  * missing stuff.  We could improve those guesses by knowing where the
>  * parse stopped, and use the length so far plus apply the above
>  * assumption to what's left. */
> 
> The point of this is to allocate a bunch of memory in one gulp.  Then
> we give it back, and hope that the system isn't so busy that other
> processes gobble it up, as we allocate what we actually need as we go
> along parsing.
> 
> I used the guess of one output byte + overhead per one input byte.
> That's the only thing we know about the string at this point: its
> length.  I have changed other places in the core to do the same.  But
> better heuristics are welcome.

If I understand a minimal exact match regexp will have 3 nodes:

- the REG_MAGIC node at the front (regcomp.c:7717), but this is included in the size of the regexp_internal structure, so we don't need to count it)
- an EXACT node followed by the match text
- an END node

so a better initial allocation might be to allocate 2 nodes plus the nodes needed for the text.

Right now for a 1 to 4 length input we allocate only enough for regexp_internal with it's built-in zeroth node, and 1 byte more.

As soon as we try to emit something that will be reallocated, and if the end of program[] is on an allocation boundary that requires a new block to be allocated (as opposed to there being enough free space in the block).

All that said, the performance impact is a single realloc() per compiled regexp, so it shouldn't be a big deal.

I suspect the structure of change_engine_size() is more of a problem, since it (and its callers) mostly seem to allocate what's needed right now, so emitting 100 nodes might end up doing 100 realloc calls.

$ gdb --args ./miniperl -Dr -e '/\w\W+\w+/'
...
(gdb) b S_change_engine_size
Breakpoint 1 at 0x178bcb: file regcomp.c, line 19654.
(gdb) r
Starting program: /home/tony/dev/perl/git/perl/miniperl -Dr -e /\\w\\W+\\w+/
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Compiling REx "\w\W+\w+"

Breakpoint 1, S_change_engine_size (pRExC_state=0x7fffffffdd10, size=1)
    at regcomp.c:19654
19654       PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;
(gdb) c
Continuing.

Breakpoint 1, S_change_engine_size (pRExC_state=0x7fffffffdd10, size=1)
    at regcomp.c:19654
19654       PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;
(gdb) c
Continuing.

Breakpoint 1, S_change_engine_size (pRExC_state=0x7fffffffdd10, size=1)
    at regcomp.c:19654
19654       PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;
(gdb) c
Continuing.

Breakpoint 1, S_change_engine_size (pRExC_state=0x7fffffffdd10, size=1)
    at regcomp.c:19654
19654       PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;
(gdb) c
Continuing.

Breakpoint 1, S_change_engine_size (pRExC_state=0x7fffffffdd10, size=1)
    at regcomp.c:19654
19654       PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;
(gdb) c
Continuing.

Breakpoint 1, S_change_engine_size (pRExC_state=0x7fffffffdd10, size=1)
    at regcomp.c:19654
19654       PERL_ARGS_ASSERT_CHANGE_ENGINE_SIZE;
(gdb) c
Continuing.
~ tying lastbr POSIXD[\w] (1) to ender END (6) offset 5

(exactish nodes allocate 64 nodes to start with and cause less churn.)

>  On small patterns, whatever we do
> won't matter much, but on larger patterns, my guess is that much of
> that space is literal characters that need to be matched, like in DNA
> sequences.  So I think this is a reasonable guess.  We could omit this
> entirely, of course.
> 
> But by doing it, we make sure that the system
> has a sufficient amount of memory available at the beginning, and I
> think we reduce the number of mallocs that actually go out and have to
> consolidate free space.

I think allocating a bit more space wouldn't hurt, any extra space is returned if the realloc implementation supports it.

Tony

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=134325

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About