On 14 July 2014 12:13, Dave Mitchell <davem@iabyn.com> wrote: > On Mon, Jul 14, 2014 at 12:34:06AM +0100, Aaron Crane wrote: > > Damian Conway via RT <perlbug-followup@perl.org> wrote: > > > I have now had the opportunity to investigate the problem, and have > concluded that this has nothing to do with Regexp::Grammars per se, except > that R::G is generating the enormous regex that 5.20 is failing to compile. > > > > Yes, it looks that way to me too. Thanks for supplying that reduction. > > > > The file attached cuts down this regex further still, by removing all > > the embedded code blocks, and the various DEFINEs whose names begin > > "______0_88". > > > > The symptoms I observed seem to be the same, though I also get a > > "panic: memory wrap" error (apparently when passing 4 GiB of allocated > > memory). > > > > In blead, it looks like the immediate culprit is study_chunk() — it > > starts 185 ms after the start of the sizing pass, and I haven't yet > > had the patience to let it run long enough to finish. > > It bisects to the following. Yves...? > > commit 099ec7dcf9e085a650e6d9010c12ad9649209bf4 > Author: Yves Orton <demerphq@gmail.com> > Date: Fri Nov 22 01:08:39 2013 +0100 > > Fix RT #120600: Variable length lookbehind is not variable > > Inside of study_chunk() we have to guard against infinite > recursion with recursive subpatterns. The existing logic > sort of worked, but didn't address all cases properly. > > qr/ > (?<W>a) > (?<BB> > (?=(?&W))(?<=(?&W)) > ) > (?&BB) > /x; > > The pattern in the test would fail when the optimizer > was expanding (&BB). When it recursed, it creates a bitmap > for the recursion it performs, it then jumps back to > the BB node and then eventually does the first (&W) call. > At this point the bit for (&W) would be set in the bitmask. > When the recursion for the (&W) exited (fake exit through > the study frame logic) the bit was not /unset/. When the parser > then entered the (&W) again it was treated as a nested and > potentially infinite length pattern. > > The fake-recursion in study-chunk made it little less obvious > what was going on in the debug output. > > By reorganizing the code and adding logic to unset the bitmap > when exiting this bug was fixed. Unfortunately this also revealed > another little issue with patterns like this: > > qr/x|(?0)/ > qr/(x|(?1))/ > > which forced the creation of a new bitmask for each branch. > Effectively study_chunk treats each branch as an independent > pattern, so when we are expanding (?1) via the 'x' branch > we dont want that to prevent us from detecting the infinite recursion > in the (?1) branch. If you were to think of trips through study_chunk > as paths, and [] as recursive processing you would get something like: > > BRANCH 'x' END > BRANCH (?0) [ 'x' END ] > BRANCH (?0) [ (?0) [ 'x' END ] ] > ... > > When we want something like: > > BRANCH 'x' END > BRANCH (?0) [ 'x' END ] > BRANCH (?0) [ (?0) INFINITE_RECURSION ] > > So when we deal with a branch we need to make a new recursion bitmask. I will try to find some time for this. YvesThread Previous | Thread Next