On 30 December 2016 at 19:16, <hv@crypt.org> wrote:
> Dave Mitchell <davem@iabyn.com> wrote:
> :I've recently been giving some thought to making parts of the regex engine
> :use the ability to treat 8 bytes of input string as a single UV to make
> :(for example) searching for a character class faster.
> [..lots of cool stuff..]
> :Any thoughts?
>
> I like the idea, I think there's every prospect it could speed things up.
>
> It also scares me, since it seems likely also to shovel a heap more
> complexity into the regexp engine, and likely to have bugs that are
> hard to find and hard to fix.
>
> I've been thinking for some time that I'd like to have a mechanism to
> gain finer control over regexp optimization, and if we were able to
> introduce such a mechanism and make the 8AAT approach subject to it
> I'd be a lot less nervous about the potential for instability.

We need to produce a proper optree to do the kind of optimisations we
need. The whole way we compile makes life extremely difficult. I have an
experimental optimising regex engine implemented in perl which is able
to avoid many degenerate patterns, but it requires a proper optree to be
constructed.

An example is that the following pattern:

  /a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?/

should be optimised into

  /a{0,16}/

and that optimisation should be done at compile time. Currently we do
part of this optimisation, but at run time. Eliminating degenerate
patterns IMO has a potentially huge impact.

> I'm imagining an interface via re.pm, probably modelled on warnings,
> that would allow you to enable or disable individual optimization
> strategies by name; I haven't particularly attempted to work out how
> to implement such a thing beyond assuming that copying the warnings.pm
> model should be relatively easy and appropriately efficient.
>
> Potential benefits could include:
> - giving developers an easy workaround if their particular regexp tickles
>   a bug in some optimization;
> - giving a sane route to introduce aggressive optimizations that are
>   known to be only sometimes faster, or only sometimes safe;
> - giving a direction and motivation towards untangling the existing
>   code, and understanding what our existing optimizations are;
> - giving the opportunity to introduce new features currently impossible
>   due to optimizations, such as pattern-matching against a stream (which
>   mainly, I think, needs to skip all checks against string length).

Yes, streaming is interesting. I don't think it is possible with our
regex engine as it stands, although it would be possible with a pattern
of bounded length. I hacked proper detection of infinite patterns into
the regex engine some time back, to be able to use regexes for $/, but
never got around to finishing it. There is some overlap between
streaming regexes and using regexes for $/, and there I am quite sure
you *must* have length data and a bounded pattern size, or it is not
doable.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"