perl.perl5.porters | Postings from December 2016

Re: some thoughts on 8-bytes-at-a-time in the regex engine

From: hv
Date: December 30, 2016 18:19
Subject: Re: some thoughts on 8-bytes-at-a-time in the regex engine
Message ID: 201612301816.uBUIG6P09418@crypt.org
Dave Mitchell <davem@iabyn.com> wrote:
:I've recently been giving some thought to making parts of the regex engine
:use the ability to treat 8 bytes of input string as a single UV to make
:(for example) searching for a character class faster.
[..lots of cool stuff..]
:Any thoughts?
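
For readers unfamiliar with the trick Dave describes, here is a minimal C sketch of 8-bytes-at-a-time scanning. It assumes a 64-bit word (`uint64_t` standing in for perl's UV) and uses the well-known SWAR "word has a zero byte" test. The names `find_byte_8aat` and `find_high_byte_8aat` are hypothetical illustrations, not code from Dave's POC or from perl's regexec.c. The first searches for a single byte; the second searches for the character class `[\x80-\xff]`, which reduces to testing the top bit of every byte at once.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 0x01 and 0x80 broadcast into all eight byte lanes of a word. */
#define ONES  UINT64_C(0x0101010101010101)
#define HIGHS UINT64_C(0x8080808080808080)

/* Nonzero iff at least one byte of v is 0x00 (classic SWAR bit trick). */
static uint64_t has_zero_byte(uint64_t v) {
    return (v - ONES) & ~v & HIGHS;
}

/* Index of the first byte equal to c in s[0..len), or -1 if none. */
static ptrdiff_t find_byte_8aat(const unsigned char *s, size_t len,
                                unsigned char c) {
    const uint64_t pattern = ONES * c;   /* c copied into every lane */
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, s + i, 8);         /* alignment-safe 8-byte load */
        if (has_zero_byte(word ^ pattern))
            break;                       /* a match is somewhere in this block */
    }
    /* Byte-at-a-time: locate the match in the block, or scan the short tail. */
    for (; i < len; i++)
        if (s[i] == c)
            return (ptrdiff_t)i;
    return -1;
}

/* Index of the first byte in the class [\x80-\xff], or -1 if none. */
static ptrdiff_t find_high_byte_8aat(const unsigned char *s, size_t len) {
    size_t i = 0;
    for (; i + 8 <= len; i += 8) {
        uint64_t word;
        memcpy(&word, s + i, 8);
        if (word & HIGHS)                /* some byte has its top bit set */
            break;
    }
    for (; i < len; i++)
        if (s[i] & 0x80)
            return (ptrdiff_t)i;
    return -1;
}
```

The word-level test only says whether some byte in the block matches, not which one, so the byte loop still picks out the exact position. Those two paths must agree exactly, which is the kind of added complexity the reply below is wary of.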

I like the idea, I think there's every prospect it could speed things up.

It also scares me, since it seems likely also to shovel a heap more
complexity into the regexp engine, and likely to have bugs that are
hard to find and hard to fix.

I've been thinking for some time that I'd like to have a mechanism to
gain finer control over regexp optimization, and if we were able to
introduce such a mechanism and make the 8AAT approach subject to it
I'd be a lot less nervous about the potential for instability.

I'm imagining an interface via re.pm, probably modelled on warnings,
that would allow you to enable or disable individual optimization
strategies by name; I haven't particularly attempted to work out how
to implement such a thing beyond assuming that copying the warnings.pm
model should be relatively easy and appropriately efficient.
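
As a rough illustration of the warnings-style model, the C sketch below keeps one bit per named optimization in a mask. warnings.pm tracks enabled categories as a per-lexical-scope bitmask; the assumption here is that a similar mask would be captured when a pattern is compiled, and that each optimization checks its bit before firing. Every name in it (`RE_OPT_8AAT`, `RE_OPT_MINLEN`, `re_opt_disable`, and so on) is hypothetical and does not correspond to anything in re.pm or the current engine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical optimization bits; a real list would come from the engine. */
enum re_opt {
    RE_OPT_8AAT        = 1u << 0,  /* 8-bytes-at-a-time class scanning */
    RE_OPT_ANCHOR_SCAN = 1u << 1,  /* jump ahead to an anchored substring */
    RE_OPT_MINLEN      = 1u << 2   /* reject strings shorter than minimum length */
};

#define RE_OPT_ALL (RE_OPT_8AAT | RE_OPT_ANCHOR_SCAN | RE_OPT_MINLEN)

/* Name-to-bit table, in the spirit of warnings.pm's category names. */
static const struct { const char *name; uint32_t bit; } re_opt_names[] = {
    { "8aat",        RE_OPT_8AAT },
    { "anchor_scan", RE_OPT_ANCHOR_SCAN },
    { "minlen",      RE_OPT_MINLEN },
};

/* Bit for a named optimization, or 0 if the name is unknown. */
static uint32_t re_opt_lookup(const char *name) {
    for (size_t i = 0; i < sizeof re_opt_names / sizeof re_opt_names[0]; i++)
        if (strcmp(re_opt_names[i].name, name) == 0)
            return re_opt_names[i].bit;
    return 0;
}

/* Clear the named bit in a scope's mask; returns 0 for an unknown name. */
static int re_opt_disable(uint32_t *mask, const char *name) {
    uint32_t bit = re_opt_lookup(name);
    if (!bit)
        return 0;
    *mask &= ~bit;
    return 1;
}

/* Set the named bit in a scope's mask; returns 0 for an unknown name. */
static int re_opt_enable(uint32_t *mask, const char *name) {
    uint32_t bit = re_opt_lookup(name);
    if (!bit)
        return 0;
    *mask |= bit;
    return 1;
}

/* The check an optimization would make before it is applied. */
static int re_opt_on(uint32_t mask, uint32_t bit) {
    return (mask & bit) != 0;
}
```

Under this sketch, working around a buggy optimization is a single bit-clear for the affected scope, and a stream-matching mode could clear whichever bits depend on knowing the string length up front.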

Potential benefits could include:
- giving developers an easy workaround if their particular regexp tickles
a bug in some optimization;
- giving a sane route to introduce aggressive optimizations that are
known to be only sometimes faster, or only sometimes safe;
- giving a direction and motivation towards untangling the existing
code, and understanding what our existing optimizations are;
- giving the opportunity to introduce new features currently impossible
due to optimizations, such as pattern-matching against a stream (which
mainly, I think, needs to skip all checks against string length).

I should have time to look at the POC code in detail over the next few
days; I'll be interested to see whether that makes me more nervous or
less. :)

Hugo
