
Re: regexp iteration limits

From: Bram
Date: February 11, 2009 16:31
Subject: Re: regexp iteration limits
Message ID: 20090212013109.oc2f25tan0g0wgkg@horde.wizbit.be

Quoting David Nicol <davidnicol@gmail.com>:

> Does the following qualify as a "sane" workaround?  It is still
> clearly inferior to the limit going away.
>
>
> $ perl -lwe '$a="xyzt"x10000; utf8::upgrade($a);print $a =~
> /\A(?>[a-z])*\z/ ? "ok" : "bug"'
> bug

It doesn't look like a sane regex to me...
Why use (?>[a-z])* ? Is it any different from ([a-z])* ?
Or did you mean to write (?>[a-z]*) ?
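
To make the question concrete, here is a minimal sketch (not taken from the original report) of where the position of the quantifier relative to the atomic group changes the result. The sub names are invented for this illustration, and the pattern uses a plain "a" followed by one more "a" so that the match depends on giving an iteration back.

```perl
use strict;
use warnings;

# Hypothetical helper names, invented for this illustration only.

# Quantifier outside the atomic group: each iteration is atomic, but the
# star itself may still give back iterations one at a time.
sub quantifier_outside { my ($s) = @_; return $s =~ /\A(?>a)*a\z/ ? 1 : 0 }

# Quantifier inside the atomic group: the whole run of "a" is one unit
# and nothing can be given back once the group has matched.
sub quantifier_inside  { my ($s) = @_; return $s =~ /\A(?>a*)a\z/ ? 1 : 0 }

# Plain capturing group: backtracks like the first form, and also sets $1.
sub capturing_group    { my ($s) = @_; return $s =~ /\A(a)*a\z/ ? 1 : 0 }

my $str = "aaa";
printf "%-12s %s\n", '(?>a)*a', quantifier_outside($str) ? "match" : "no match";
printf "%-12s %s\n", '(?>a*)a', quantifier_inside($str)  ? "match" : "no match";
printf "%-12s %s\n", '(a)*a',   capturing_group($str)    ? "match" : "no match";
```

The first and third forms match because the star can release its last iteration for the trailing "a"; the second cannot, since the atomic group has already consumed the whole string.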

>
> $ perl -lwe '$a="xyzt"x10000; utf8::upgrade($a);print $a =~
> /\A(?>(?>[a-z])*)*\z/ ? "ok" : "bug"'
> ok

For a small regex or a short string this might be a workaround, but as
the re grows bigger and bigger it can seriously slow matching down
(in the case of an unsuccessful match).


Some counts:

First re:

$ perl -wle '$a="xyzt"x5000; utf8::upgrade($a);$a =~ /\A(?>[a-z])*(?{ $count++})(?!)/;print $count;'
20001

=> 20001 different combinations were tried (lots of backtracking)

Modified first re:

$ perl -wle '$a="xyzt"x5000; utf8::upgrade($a);$a =~ /\A(?>[a-z]*)(?{ $count++})(?!)/;print $count;'
1

=> Only 1 combination tried (good)
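
The same counting trick can be wrapped in a small script so the two forms can be compared on a string of any length. This is only a sketch: the sub names are made up, and the 12-character test string is my own choice so that it stays far below any iteration limit.

```perl
use strict;
use warnings;

# Package variable so the (?{ ... }) code blocks can see it.
our $count;

# Hypothetical helper: counts how often the engine gets past the
# quantified part when the quantifier sits outside the atomic group.
sub attempts_quantifier_outside {
    my ($str) = @_;
    local $count = 0;
    # (?!) always fails, forcing the engine to try every alternative.
    $str =~ /\A(?>[a-z])*(?{ $count++ })(?!)/;
    return $count;
}

# Hypothetical helper: same count, with the quantifier inside the group.
sub attempts_quantifier_inside {
    my ($str) = @_;
    local $count = 0;
    $str =~ /\A(?>[a-z]*)(?{ $count++ })(?!)/;
    return $count;
}

my $str = "xyzt" x 3;   # 12 characters
print "outside: ", attempts_quantifier_outside($str), "\n";   # 13 (length + 1)
print "inside:  ", attempts_quantifier_inside($str),  "\n";   # 1
```

With the quantifier outside, the count grows as length + 1, which agrees with the 20001 reported above for a 20000-character string; with the quantifier inside it stays at 1 regardless of length.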


Second re:

$ perl -wle '$a="xyzt"x5000; utf8::upgrade($a);$a =~ /\A(?>(?>[a-z])*)*(?{ $count++})(?!)/;print $count;'
3

While this may look sane, see what happens when we increase the length
of the string:

$ perl -wle '$a="xyzt"x5_000_000; utf8::upgrade($a);$a =~ /\A(?>(?>[a-z])*)*(?{ $count++})(?!)/;print $count;'
613


Modified re:

$ perl -wle '$a="xyzt"x5_000_000; utf8::upgrade($a);$a =~ /\A(?>(?>[a-z]*)*)(?{ $count++})(?!)/;print $count;'
1



Kind regards,

Bram




