On 13 July 2014 16:27, Hugo van der Sanden <perlbug-followup@perl.org> wrote:
> # New Ticket Created by Hugo van der Sanden
> # Please include the string: [perl #122283]
> # in the subject line of all future correspondence about this issue.
> # <URL: https://rt.perl.org/Ticket/Display.html?id=122283 >
>
>
>
> This is a bug report for perl from hv@crypt.org,
> generated with the help of perlbug 1.40 running under perl 5.20.0.
>
>
> -----------------------------------------------------------------
> [Please describe your issue here]
>
> I've been experimenting with an attempt to take a SQL grammar expressed
> in BNF and convert it (programmatically) into something that can parse
> SQL with it as a Regexp::Grammars (v1.035) grammar.
>
> The code below is (60%) cut down from an interim stage in that process;
> this reaches about 10MB process size under perl-5.16.3; under perl-5.20.0
> it grows to over 1GB. Cutting down the grammar rule by rule does gradually
> reduce the memory use, but it remains a high multiple of the memory use
> under perl-5.16.3, and I've not yet found any smoking gun; I've included
> the full 200-odd lines here rather than risk eliding something important.
>
> Damain and I are looking into it, but he suggested I perlbug it as a
> heads-up of a possible problem in 5.20, likely of interest to davem
> as potentially relating to regexp engine changes.
>
> zen% ulimit -v # I've set a 1GB process-size limit
> 1000000
> zen% /usr/bin/time /opt/perl-5.16.3/bin/perl ./t0 # top(1) shows peak 10MB
> VIRT
> ok
> 8.52user 0.01system 0:08.54elapsed 99%CPU (0avgtext+0avgdata
> 34816maxresident)k
> 0inputs+0outputs (0major+2331minor)pagefaults 0swaps
> zen% /usr/bin/time /opt/perl-5.20.0/bin/perl ./t0
> Out of memory!
> Command exited with non-zero status 1
> 41.59user 2.10system 0:43.83elapsed 99%CPU (0avgtext+0avgdata
> 3641344maxresident)k
> 0inputs+0outputs (0major+228082minor)pagefaults 0swaps
> zen% cat t0
> #!/opt/perl-5.20.0/bin/perl
> use strict;
> use warnings;
> use Regexp::Grammars;
>
> my $g = qr{
> ^ <query_specification> $
>
> <rule: simple_Latin_letter> <simple_Latin_upper_case_letter> |
> <simple_Latin_lower_case_letter>
> <token: simple_Latin_upper_case_letter> A | B | C | D | E | F | G | H | I
> | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z
> <token: simple_Latin_lower_case_letter> a | b | c | d | e | f | g | h | i
> | j | k | l | m | n | o | p | q | r | s | t | u | v | w | x | y | z
> <token: digit> 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
>

You really should use character classes here, and not use regex subs for
insertable literals. IOW, (?&digit) should be replaced with $digit, which
would be defined as:

$digit= "[0-9]"

The same goes for (?&ws) and similar patterns.

Anyway, I have pushed the following commit, which should fix this. Please
test.

commit a51d618a82a7057c3aabb600a7a8691d27f44a34
Author: Yves Orton <demerphq@gmail.com>
Date:   Fri Sep 19 19:57:34 2014 +0200

    rt 122283 - do not recurse into GOSUB/GOSTART when not SCF_DO_SUBSTR

    See also comments in patch. A complex regex "grammar" like that in
    RT 122283 causes perl to take literally forever, and exhaust all
    memory during the pattern optimization phase.

    Unfortunately I could not track down exactly why this occurred, but it
    was very clear that the recursion was unnecessary and excessive.
    By simply eliminating the unnecessary recursion, performance goes back
    to being acceptable.

    I have not thought of a good way to test this change, so this patch
    does not include any tests. Perhaps we can test it using alarm, but
    I will follow up on that later.

Ticket closers: please don't close the ticket until I have reported that I
have applied tests for this.
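
As a rough illustration of the character-class suggestion above, here is a
minimal sketch. It uses plain regexes rather than Regexp::Grammars, and the
variable names and the identifier pattern are made up for the example; they
are not taken from the grammar in the ticket.

```perl
use strict;
use warnings;

# Hypothetical fragments: plain strings holding character classes.
my $digit  = "[0-9]";
my $letter = "[A-Za-z]";

# Interpolating the fragments yields one flat pattern that contains
# no (?&name) calls at all.
my $identifier = qr/\A $letter (?: $letter | $digit | _ )* \z/x;

# The same pattern written with named subpatterns, for comparison.
my $with_subs = qr/
    \A (?&letter) (?: (?&letter) | (?&digit) | _ )* \z
    (?(DEFINE)
        (?<letter> [A-Za-z] )
        (?<digit>  [0-9] )
    )
/x;

# Both forms should accept and reject the same strings.
for my $s (qw(abc1 x_9 9lives)) {
    my $flat = $s =~ $identifier ? 1 : 0;
    my $subs = $s =~ $with_subs  ? 1 : 0;
    print "$s: flat=$flat subs=$subs\n";
}
```

Both forms match the same strings; the interpolated one simply never asks
the regex compiler to follow a subpattern call.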

cheers,
Yves

--
perl -Mre=debug -e "/just|another|perl|hacker/"