Front page | perl.perl6.internals |
Postings from May 2002
Regex timing and runtime options
From:
Mark Kvale
Date:
May 14, 2002 22:07
Subject:
Regex timing and runtime options
Message ID:
Pine.OSF.4.21.0205142206390.595182-100000@lehrer.ucsf.edu
Steve Fink recently made it easy to test parrot under various runtime
options:
-g - suppress use of computed goto
-P - use prederef
-j - use JIT compiler
I was curious to see what effects these would have on regex timings
for the various schemes I cooked up. As before, I am using the timing
model
t[i] = co + n * (lo + m*ro[i])
with t[i] the time taken for regex i
co the regex independent constant overhead
n the number of matches attempted
lo the regex independent per match overhead
m the length of regex and string
ro[i] the per character matching time
I am running these tests on a lintel notebook with gcc 2.95, so have
the computed goto enabled by default. In what follows, 'reg'
represents the regex compiled into core ops only, 'rx' represents the
regex compiled into core ops and rx ops, 'spen' is the regex executed
on the Spencer regex engine, and 'perl' is the match in perl5.6.0. I
used the parrot_2002-05-15_010000.tar.gz snapshot.
For the simplest task of matching 'a' against 'a', the results are
"a"x$m =~ /"a"x$m/:
reg: lo = 5.588e-06 ro = 2.255e-07
reg -g: lo = 6.019e-06 ro = 3.44e-07
reg -P: lo = 5.967e-06 ro = 3.141e-07
reg -j: lo = 5.81e-06 ro = 2.429e-07
rx: lo = 2.187e-06 ro = 5.644e-07
rx -g: lo = 2.877e-06 ro = 5.657e-07
rx -P: lo = 2.705e-06 ro = 5.446e-07
rx -j: lo = 2.723e-06 ro = 5.641e-07
perl: lo = 7.915e-07 ro = 1.273e-08
spen: lo = 5.431e-07 ro = 1.51e-08
For reg, turning on the switches imposes a 10% penalty in per match
overhead; for rx that is a 20-30% penalty. For reg, the default
computed goto is fastest, with JIT 7% slower and -g and -P 40-50%
slower in match time per character. For rx there is little variation,
with the prederef option coming in 4% faster than default.
For a match loop we have
"a"x1000 =~ /a+/
lo + r
reg: 0.000752
reg -g: 0.001084
reg -P: 0.001021
reg -j: 0.000757
rx: 0.001035
rx -g: 0.001165
rx -P: 0.001095
rx -j: 0.001033
spen: 3.92e-06
perl: 7.56e-06
Again for reg, computed goto and JIT are evenly matched, while
prederef and cgoto suppression impose 30% penalties. For rx, JIT and
cgoto are fastest, with prederef and suppression of cgoto lagging by
7%.
For a moderate amount of backtracking, we get
("ab"x10000)."c" =~ /abc/
lo + r
reg: 0.0193
reg -g: 0.0285
reg -P: 0.0261
reg -j: 0.0181
rx: 0.0287
rx -g: 0.0299
rx -P: 0.0284
rx -j: 0.0284
spen: 0.00735
perl: 0.000269
For reg, the JIT compiler is faster the cgoto by 6% and prederef and
cgoto suppression are a good deal slower. For rx, cgoto, JIT and
prederef are all similar, with cgoto suppression only 5% slower.
Conclusions:
1) None of the runtime options make a huge difference in regex times,
but reg seems more susceptible to optimization than rx.
2) The two consistently fastest schemes are computed goto and the JIT
compiler. Suppressing computed goto or using prederef slows down
reg by 20-50%. rx is less affected, with a -4-10% slowdown for -g or
-P.
3) In retrospect, the timing results quoted in my 'bakeoff' were near
optimal for the reg and rx schemes, as they all used the default
computed goto option.
-Mark