develooper Front page | perl.perlfaq.workers | Postings from January 2005

perlfaq6: How do I efficiently match many regular expressions at once?

From:
_brian_d_foy
Date:
January 26, 2005 10:52
Subject:
perlfaq6: How do I efficiently match many regular expressions at once?
Message ID:
260120051252012089%comdog@panix.com

* updated the answer so that we don't think perl5.005 is the
newest version of perl

* cleaned up the examples to give them parallel structure and
to generalize the method.  (What was popstates anyway?)

* added another example

* mention Mastering Regular Expressions to motivate people to
find out how regexes work and that they can be tuned.




diff -u -d -r1.28 perlfaq6.pod
--- perlfaq6.pod        3 Jan 2005 18:43:37 -0000       1.28
+++ perlfaq6.pod        26 Jan 2005 18:48:13 -0000
@@ -518,32 +518,58 @@
 
 =head2 How do I efficiently match many regular expressions at once?
 
-The following is extremely inefficient:
+( contributed by brian d foy )
 
-    # slow but obvious way
-    @popstates = qw(CO ON MI WI MN);
-    while (defined($line = <>)) {
-       for $state (@popstates) {
-           if ($line =~ /\b$state\b/i) {
-               print $line;
-               last;
-           }
-       }
-    }
+Avoid asking Perl to compile a regular expression every time 
+you want to match it.  In this example, perl must recompile
+the regular expression for every iteration of the foreach()
+loop since it has no way to know what $state will be.
 
-That's because Perl has to recompile all those patterns for each of
-the lines of the file.  As of the 5.005 release, there's a much better
-approach, one which makes use of the new C<qr//> operator:
+    @patterns = qw( foo bar baz );
+    
+    LINE: while( <> ) 
+       {
+               foreach $pattern ( @patterns ) 
+                       {
+               print if /\b$pattern\b/i;
+               next LINE;
+                       }
+               }
 
-    # use spiffy new qr// operator, with /i flag even
-    use 5.005;
-    @popstates = qw(CO ON MI WI MN);
-    @poppats   = map { qr/\b$_\b/i } @popstates;
-    while (defined($line = <>)) {
-       for $patobj (@poppats) {
-           print $line if $line =~ /$patobj/;
-       }
-    }
+The qr// operator showed up in perl 5.005.  It compiles a
+regular expression, but doesn't apply it.  When you use the
+pre-compiled version of the regex, perl does less work. In
+this example, I inserted a map() to turn each pattern into
+its pre-compiled form.  The rest of the script is the same,
+but faster.
+
+    @patterns = map { qr/\b$_\b/i } qw( foo bar baz );
+
+    LINE: while( <> ) 
+       {
+               foreach $pattern ( @patterns ) 
+                       {
+               print if /\b$pattern\b/i;
+               next LINE;
+                       }
+               }
+
+In some cases, you may be able to make several patterns into
+a single regular expression.  Beware of situations that require
+backtracking though.
+
+       $regex = join '|', qw( foo bar baz );
+
+    LINE: while( <> ) 
+       {
+               print if /\b(?:$regex)\b/i;
+               }
+
+For more details on regular expression efficiency, see Mastering
+Regular Expressions by Jeffrey Freidl.  He explains how regular
+expressions engine work and why some patterns are surprisingly
+inefficient.  Once you understand how perl applies regular 
+expressions, you can tune them for individual situations.
 
 =head2 Why don't word-boundary searches with C<\b> work for me?

-- 
brian d foy, comdog@panix.com



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About