develooper Front page | perl.perl6.internals | Postings from January 2002

Re: The dreaded regex patch

Thread Previous | Thread Next
From:
Steve Fink
Date:
January 9, 2002 12:12
Subject:
Re: The dreaded regex patch
Message ID:
20020109201159.GD26868@foxglove.digital-integrity.com
On Wed, Jan 09, 2002 at 03:16:40AM -0800, Brent Dax wrote:
> Okay, here it is.  Attached is the regular expression patch.  It

Is rx_advance necessary? What's the difference between

  /R/

and

  /^(.*?R)/

if you count the parens $& $1 $2 ... instead of $1 $2 $3 ...?

Speed, I guess?

(Okay, I really meant /\A([\w\W]*?R)/ )

Also, why specific opcodes for \w etc.? I would think you'd do those
as constants. (Constant bitmaps for 8-bit, or constant
MultiLevelFunkyThings for Unicode.)

I have a barely begun parrot RE implementation of my own, and a much
further along RE opcode generator & optimizer, but no time to work on
either. Maybe I should dust off the optimizer.

The main thrust of the optimizer is to try to find portions of the RE
to collapse into a DFA (or DFA-like thing). For example,
/(?:foo|bar)d/ is matched identically with a DFA or NFA, while
/(?:need|needle)le(ss)?/ is not. But this quickly makes me wonder: are
optimizations like finding a fixed "abc" substring within the input
really any faster than walking through a lookup table-based state
machine? I don't buy the Boyer-Moore argument that it's faster because
you can advance by more than one character at a time -- you're
stuffing the input into the cache either way, you may as well look at
all of it. I suppose if you're doing funky backwards matching for
a+bc+ it cuts out some backtracking, but that seems tricky to ensure
you're matching the right stuff. /a+bc+/ works well, but
/grand(ma|mother)a+bc+/ had better be darn sure it doesn't bite off
grandma's toe.

Not sure why I'm writing up these musings, since they don't apply to
the opcode engine.

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About