develooper Front page | perl.perl5.porters | Postings from December 2016

Re: some thoughts on 8-bytes-at-a-time in the regex engine

Thread Previous | Thread Next
From:
demerphq
Date:
December 31, 2016 15:30
Subject:
Re: some thoughts on 8-bytes-at-a-time in the regex engine
Message ID:
CANgJU+VVmXvcXszK7RVmt0c2jZrzLwu86Sqy-xh0EAXAn0cJ1Q@mail.gmail.com
On 30 December 2016 at 19:16,  <hv@crypt.org> wrote:
> Dave Mitchell <davem@iabyn.com> wrote:
> :I've recently been giving some thought to making parts of the regex engine
> :use the ability to treat 8 bytes of input string as a single UV to make
> :(for example) searching for a character class faster.
> [..lots of cool stuff..]
> :Any thoughts?
>
> I like the idea, I think there's every prospect it could speed things up.
>
> It also scares me, since it seems likely also to shovel a heap more
> complexity into the regexp engine, and likely to have bugs that are
> hard to find and hard to fix.
>
> I've been thinking for some time that I'd like to have a mechanism to
> gain finer control over regexp optimization, and if we were able to
> introduce such a mechanism and make the 8AAT approach subject to it
> I'd be a lot less nervous about the potential for instability.

We need to produce a proper optree to do the kind of optimisations we need.

The whole way we compile makes life extremely difficult.

I have an experimental optimising regex engine implemented in perl
which is able to avoid many degenerate patterns, but it requires a
proper optree to be constructed.

An example is that the following pattern:

/a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?/

should be optimised into

/a{0,16}/

and that optimisation should be done at compile time. Currently we do
part of this optimisation but at run time.

Eliminating degenerate patterns IMO has a potentially huge impact.

> I'm imagining an interface via re.pm, probably modelled on warnings,
> that would allow you to enable or disable individual optimization
> strategies by name; I haven't particularly attempted to work out how
> to implement such a thing beyond assuming that copying the warnings.pm
> model should be relatively easy and appropriately efficient.
>
> Potential benefits could include:
> - giving developers an easy workaround if their particular regexp tickles
> a bug in some optimization;
> - giving a sane route to introduce aggressive optimizations that are
> known to be only sometimes faster, or only sometimes safe;
> - giving a direction and motivation towards untangling the existing
> code, and understanding what our existing optimizations are;
> - giving the opportunity to introduce new features currently impossible
> due to optimizations, such as pattern-matching against a stream (which
> mainly, I think, needs to skip all checks against string length).

Yes streaming is interesting. I dont think it is possible with our
regex engine. Although it would be possible to use a pattern of a
bounded length. I hacked proper detection of infinite patterns into
the regex engine some time  back to be able to use regexes for $/, but
never got around to finishing it. There is some kind of overlap
between streaming regexes and using regexps for $/, and there I am
quite sure you *must* have length data, and a bounded pattern size, or
it is not doable.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About