2011/4/1 demerphq <demerphq@gmail.com>: > On 31 March 2011 12:05, Alex Efros via RT <perlbug-followup@perl.org> wrote: >> Чтв. Мар. 31 01:16:36 2011, demerphq писал: >>> We never fixed the superlinear cache bug that was broken some time >>> around when the engine was derecursivized. >>> >>> Id bet a dollar this is related. >> >> BTW, your YAPH signature remind me about -Mre=debug, and I tried to >> compare these regexps this way. Results are interesting, but, sadly, I >> don't know perl internals good enough to understand them: >> >> $ perl -Mre=debug -e "/<X[^>]*>(.*?)(?=<a)/gis" >> and >> $ perl -Mre=debug -e "/<X[^>]*(?i-:>)(.*?)(?=<a)/gis" >> produce same output and work fast, but this one differs and slow: >> $ perl -Mre=debug -e "/<X[^>]*(?-i:>)(.*?)(?=<a)/gis" >> >> Last line in first two (fast) regex was: >> stclass EXACTF <<X> minlen 3 >> but in last (slow) regex it was: >> floating ">" at 2..2147483647 (checking floating) stclass EXACTF <<X> >> minlen 3 >> >> Right now we've huge amount of parsers working 24x7 with thousands of >> different regexps, and it's SLOW. Can anyone give some recommendations >> about how to work around this issue - like "add /i flag to every regex >> whenever possible", etc.? Because for now I'm even doesn't sure adding >> this redundant /i doesn't slowdown one regexp while speeding up another... > > You are encountering an optimisation that normally speeds things up by > allowing perl to fail fast. > > This is called the "longest constant string" optimization. During the > optimization phase we scan the pattern and calculate the position, and > value of the longest constant strings which are at a constant and > floating offset. > > So in your "slow" pattern that would be for constant offset '<X' and > for floating offset ' >'. > > For various reasons we prefer the floating string for this check. > > My guess is that the performance impact here is that your data is long > has a LOT of '>' in it, which would then result in perl scanning to > find the _last_ (iirc). Which is probably not the right thing. > > Now for various reasons this optimisation is disabled with a case > insensitive match. Which means that perl no longer sees the ">" and no > longer considers it a viable optimisation target, and therfore goes > with a different strategy which is to look for the "<X" and then match > from there. Your data probably makes this a much better strategy. > > Its hard to say what to do here. *Most* times this optimization is > highly useful. In your case clearly not. > > There certainly is not a generic solution here. For instance enabling > /i might make a few of your patterns faster, but likely will slow all > the rest down. Just in case someone else wants to add /i to make regex search faster: DO NOT add /i in the general case. In the general case and esp. for short scripts adding /i makes regex searches slower. It has to demand-load the whole unicore/To/Fold case tables (that's 1.6MB on 32bit), and searches for all unicode cases also. utf8_heavy.pl is demand-loaded, also Carp is loaded. Carp loads warnings which in turn loads all warnings::register_categories which is about 68kB on 32-bit. -- Reini Urban http://phpwiki.org/ http://murbreak.at/Thread Previous | Thread Next