develooper Front page | perl.perl5.porters | Postings from July 2014

Re: [perl #122283] Possible regexp memory explosion in 5.20.0

Thread Previous | Thread Next
From:
Dave Mitchell
Date:
July 14, 2014 10:14
Subject:
Re: [perl #122283] Possible regexp memory explosion in 5.20.0
Message ID:
20140714101358.GS15971@iabyn.com
On Mon, Jul 14, 2014 at 12:34:06AM +0100, Aaron Crane wrote:
> Damian Conway via RT <perlbug-followup@perl.org> wrote:
> > I have now had the opportunity to investigate the problem, and have concluded that this has nothing to do with Regexp::Grammars per se, except that R::G is generating the enormous regex that 5.20 is failing to compile.
> 
> Yes, it looks that way to me too. Thanks for supplying that reduction.
> 
> The file attached cuts down this regex further still, by removing all
> the embedded code blocks, and the various DEFINEs whose names begin
> "______0_88".
> 
> The symptoms I observed seem to be the same, though I also get a
> "panic: memory wrap" error (apparently when passing 4 GiB of allocated
> memory).
> 
> In blead, it looks like the immediate culprit is study_chunk() — it
> starts 185 ms after the start of the sizing pass, and I haven't yet
> had the patience to let it run long enough to finish.

It bisects to the following. Yves...?

commit 099ec7dcf9e085a650e6d9010c12ad9649209bf4
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Nov 22 01:08:39 2013 +0100

    Fix RT #120600: Variable length lookbehind is not variable
    
    Inside of study_chunk() we have to guard against infinite
    recursion with recursive subpatterns. The existing logic
    sort of worked, but didn't address all cases properly.
    
      qr/
        (?<W>a)
        (?<BB>
          (?=(?&W))(?<=(?&W))
        )
        (?&BB)
      /x;
    
    The pattern in the test would fail when the optimizer
    was expanding (&BB). When it recursed, it creates a bitmap
    for the recursion it performs, it then jumps back to
    the BB node and then eventually does the first (&W) call.
    At this point the bit for (&W) would be set in the bitmask.
    When the recursion for the (&W) exited (fake exit through
    the study frame logic) the bit was not /unset/. When the parser
    then entered the (&W) again it was treated as a nested and
    potentially infinite length pattern.
    
    The fake-recursion in study-chunk made it little less obvious
    what was going on in the debug output.
    
    By reorganizing the code and adding logic to unset the bitmap
    when exiting this bug was fixed. Unfortunately this also revealed
    another little issue with patterns like this:
    
      qr/x|(?0)/
      qr/(x|(?1))/
    
    which forced the creation of a new bitmask for each branch.
    Effectively study_chunk treats each branch as an independent
    pattern, so when we are expanding (?1) via the 'x' branch
    we dont want that to prevent us from detecting the infinite recursion
    in the (?1) branch. If you were to think of trips through study_chunk
    as paths, and [] as recursive processing you would get something like:
    
      BRANCH 'x' END
      BRANCH (?0) [ 'x' END ]
      BRANCH (?0) [ (?0) [ 'x' END ] ]
      ...
    
    When we want something like:
    
      BRANCH 'x' END
      BRANCH (?0) [ 'x' END ]
      BRANCH (?0) [ (?0) INFINITE_RECURSION ]
    
    So when we deal with a branch we need to make a new recursion bitmask.



-- 
"Foul and greedy Dwarf - you have eaten the last candle."
    -- "Hordes of the Things", BBC Radio.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About