develooper Front page | perl.perl5.porters | Postings from December 2004

Re: optimization idea

Thread Previous | Thread Next
From:
David Landgren
Date:
December 10, 2004 07:12
Subject:
Re: optimization idea
Message ID:
41B9BCD6.7040300@landgren.net
Steve Peters wrote:
> On Wednesday 27 October 2004 03:17 pm, David Nicol wrote:
> 
>>trie optimization for regexp groups means optimizing things like
>>
>>
>>       m/(flash|flat|flatulent|flabby)\b/
>>
>>into things like
>>
>>        m/fla(sh|bby|(t(|ulent)))\b/
>>
>>right?
>>
>>I wonder if it would be possible to do it with a regex preprocessor.
>>
> 
> 
> Dan Kogai released the very handy Regexp::Optimizer earlier this year to 
> perform this very task.  

I was going through the older messages (my apologies, I'm only an 
occasional p5p reader) and came across the above thread.

Apologies for tooting my own horn, but I have released Regexp::Assemble, 
which does much the same thing, but has slightly different behaviour, it 
merges common tails as well: fish, flash -> f(la|i)sh. The above-cited 
pattern becomes:

   fla(?:t(?:ulent)?|bby|sh)

Anyway, the point is that this result also neatly illustrates bug 
#32840. Suppose I want to know which original RE matched the target. My 
idea was to use (?:{...}) to leave a trail of breadcrumbs, and then use 
$^R to find out where I had been. It could then just read off $^R in the 
case of a match, index an appropriately-constructed list, and return the 
pattern. (You need to do this when metachars are involved, otherwise a 
hash lookup using the result would suffice).

my $r = qr/fla(?:t(?{1})(?:ulent(?{2}))?|bby(?{3})|sh(?{4}))/;
/$r/ and print "path=$^R\n" while( <> );

Running this gives:

% ./rmatch
flat
path=1
flop
flabby
path=3
flatulent
path=1

Bummer. flatulent returns 1 instead of 2. Furthermore, the following 
example:

my $m;
my $r = qr/fla(?:t(?{$m=1})(?:ulent(?{$m=2}))?|bby(?{$m=3})|sh(?{$m=4}))/;
/$r/ and print "path=$m\n" while( <> );

... does the right thing:

% ./rmatch2
flat
path=1
flabby
path=3
fish
flatulent
path=2

So if someone can fill me in with a couple of clues on how to start to 
understand regcomp.c/regexec.c I'd be most appreciative, because I'm 
finding it rather difficult to follow what's going on.

Thanks,
David


Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About