On Sat, May 22, 2010 at 01:24:10AM -0700, Terada Minoru wrote:
> # New Ticket Created by Terada Minoru
> # Please include the string: [perl #75258]
> # in the subject line of all future correspondence about this issue.
> # <URL: http://rt.perl.org/rt3/Ticket/Display.html?id=75258 >
>
>
> To: perlbug@perl.org
> Subject: regex match fails for long input (> 32768? chars)
> Reply-To: terada@ice.uec.ac.jp
> Message-Id: <5.8.8_14971_1274515855@kirin.pr.ice.uec.ac.jp>
>
> This is a bug report for perl from terada@ice.uec.ac.jp,
> generated with the help of perlbug 1.35 running under perl v5.8.8.
>
>
> -----------------------------------------------------------------
> [Please enter your report here]
>
> Here is a test case:
> ========
> #! /usr/bin/perl
>
> $n = 32768;
>
> $str = "{";
> for($i=0; $i<$n; $i++){
>     $str = $str . "X";
> }
> $str = $str . "}";
>
> if($str =~ /^{((E.|[^}E])*)}/){
>     print "$2\n";
> } else {
>     print "MATCH FAILED\n";
> }
>
> 1;
> ========
> I tried perl 5.8.8 and 5.13.1.
>
> If $n <= 32767, the result is correct for both versions.
> (The match succeeds.)
>
> For larger $n (>= 32768), the bug appears:
> On 5.8.8, Segmentation fault occurs.
> On 5.13.1, the match fails.
>

Thank you for your report. This is a known bug. Patterns of the
form /(A|B)*/ are really /(A|B){0,32767}/.

One might argue that not giving a segfault is an improvement, but
I'm not so sure if silently giving the wrong answer is better.

    $ perl -Mre=debug -wce '/(A|B)*/'
    Compiling REx "(A|B)*"
    Final program:
       1: CURLYM[1] {0,32767} (15)
       5:   TRIE-EXACT[AB] (13)
            <A>
            <B>
      13:   SUCCEED (0)
      14: NOTHING (15)
      15: END (0)
    minlen 0
    -e syntax OK
    Freeing REx: "(A|B)*"
    $


Abigail
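
For illustration only, and not something proposed in this thread: since the
limit applies to the number of iterations of the quantified group, one
possible way to sidestep it for the test case above is to let each iteration
consume a whole run of ordinary characters. The sketch below assumes that "E"
is meant as an escape character, as the reporter's pattern suggests; the
rewritten pattern and the variable names are assumptions. A simple quantifier
on a single character class, such as [^}E]+, is not compiled into a counted
group, so the outer group iterates only once for this input.

```perl
use strict;
use warnings;

# Same input as the reported test case: "{" followed by 32768 "X" and "}".
my $n   = 32768;
my $str = "{" . ("X" x $n) . "}";

# Hypothetical rewrite of the reporter's pattern: [^}E]+ swallows a whole
# run of ordinary characters in one iteration of the outer group, so the
# group repeats once per run rather than once per character.
my $matched = $str =~ /^\{((?:[^}E]+|E.)*)\}/;

# $1 holds everything between the braces when the match succeeds.
my $captured_length = $matched ? length($1) : -1;

print $matched
    ? "matched, captured $captured_length chars\n"
    : "MATCH FAILED\n";
```

This only reduces the number of group iterations; it does not remove the
limit. An input with more than 32767 separate runs or escapes inside the
braces could still run into the same bound.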