On Mon, Jul 14, 2014 at 12:34:06AM +0100, Aaron Crane wrote: > Damian Conway via RT <perlbug-followup@perl.org> wrote: > > I have now had the opportunity to investigate the problem, and have concluded that this has nothing to do with Regexp::Grammars per se, except that R::G is generating the enormous regex that 5.20 is failing to compile. > > Yes, it looks that way to me too. Thanks for supplying that reduction. > > The file attached cuts down this regex further still, by removing all > the embedded code blocks, and the various DEFINEs whose names begin > "______0_88". > > The symptoms I observed seem to be the same, though I also get a > "panic: memory wrap" error (apparently when passing 4 GiB of allocated > memory). > > In blead, it looks like the immediate culprit is study_chunk() — it > starts 185 ms after the start of the sizing pass, and I haven't yet > had the patience to let it run long enough to finish. It bisects to the following. Yves...? commit 099ec7dcf9e085a650e6d9010c12ad9649209bf4 Author: Yves Orton <demerphq@gmail.com> Date: Fri Nov 22 01:08:39 2013 +0100 Fix RT #120600: Variable length lookbehind is not variable Inside of study_chunk() we have to guard against infinite recursion with recursive subpatterns. The existing logic sort of worked, but didn't address all cases properly. qr/ (?<W>a) (?<BB> (?=(?&W))(?<=(?&W)) ) (?&BB) /x; The pattern in the test would fail when the optimizer was expanding (&BB). When it recursed, it creates a bitmap for the recursion it performs, it then jumps back to the BB node and then eventually does the first (&W) call. At this point the bit for (&W) would be set in the bitmask. When the recursion for the (&W) exited (fake exit through the study frame logic) the bit was not /unset/. When the parser then entered the (&W) again it was treated as a nested and potentially infinite length pattern. The fake-recursion in study-chunk made it little less obvious what was going on in the debug output. By reorganizing the code and adding logic to unset the bitmap when exiting this bug was fixed. Unfortunately this also revealed another little issue with patterns like this: qr/x|(?0)/ qr/(x|(?1))/ which forced the creation of a new bitmask for each branch. Effectively study_chunk treats each branch as an independent pattern, so when we are expanding (?1) via the 'x' branch we dont want that to prevent us from detecting the infinite recursion in the (?1) branch. If you were to think of trips through study_chunk as paths, and [] as recursive processing you would get something like: BRANCH 'x' END BRANCH (?0) [ 'x' END ] BRANCH (?0) [ (?0) [ 'x' END ] ] ... When we want something like: BRANCH 'x' END BRANCH (?0) [ 'x' END ] BRANCH (?0) [ (?0) INFINITE_RECURSION ] So when we deal with a branch we need to make a new recursion bitmask. -- "Foul and greedy Dwarf - you have eaten the last candle." -- "Hordes of the Things", BBC Radio.Thread Previous | Thread Next