Dave Mitchell <davem@iabyn.com> wrote:
:I've recently been giving some thought to making parts of the regex engine
:use the ability to treat 8 bytes of input string as a single UV to make
:(for example) searching for a character class faster.
[..lots of cool stuff..]
:Any thoughts?

I like the idea; I think there's every prospect it could speed things up.
It also scares me, since it seems likely to shovel a heap more
complexity into the regexp engine, and likely to have bugs that are hard
to find and hard to fix.

I've been thinking for some time that I'd like to have a mechanism to gain
finer control over regexp optimization. If we were able to introduce such
a mechanism and make the 8AAT approach subject to it, I'd be a lot less
nervous about the potential for instability.

I'm imagining an interface via re.pm, probably modelled on warnings, that
would allow you to enable or disable individual optimization strategies by
name. I haven't particularly attempted to work out how to implement such a
thing, beyond assuming that copying the warnings.pm model should be
relatively easy and appropriately efficient.

Potential benefits could include:
- giving developers an easy workaround if their particular regexp tickles
  a bug in some optimization;
- giving a sane route to introduce aggressive optimizations that are known
  to be only sometimes faster, or only sometimes safe;
- giving a direction and motivation towards untangling the existing code,
  and understanding what our existing optimizations are;
- giving the opportunity to introduce new features currently impossible
  due to optimizations, such as pattern-matching against a stream (which
  mainly, I think, needs to skip all checks against string length).

I should have time to look at the POC code in detail over the next few
days; I'll be interested to see whether that makes me more nervous or
less. :)

Hugo
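
For anyone following the thread without the POC code to hand, the quoted "8 bytes as a single UV" idea can be illustrated with a minimal sketch in C. This is not Dave's code and not anything from the perl source: the function name find_byte_8aat is hypothetical, uint64_t stands in for a 64-bit UV, and it handles only the simplest case (one literal byte rather than a character class). It uses the well-known "does this word contain a zero byte" bit trick to reject 8 bytes per iteration, then falls back to a byte loop to pinpoint the match.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the perl source): return the index of the
 * first occurrence of byte c in s[0..len), or len if c does not occur. */
static size_t find_byte_8aat(const unsigned char *s, size_t len, unsigned char c)
{
    const uint64_t ones    = UINT64_C(0x0101010101010101);
    const uint64_t highs   = UINT64_C(0x8080808080808080);
    const uint64_t pattern = ones * c;    /* c replicated into all 8 bytes */
    size_t i = 0;

    while (i + 8 <= len) {
        uint64_t word, x;
        memcpy(&word, s + i, 8);          /* safe for unaligned input */
        x = word ^ pattern;               /* bytes equal to c become 0x00 */
        /* Nonzero if and only if at least one byte of x is zero. */
        if ((x - ones) & ~x & highs)
            break;                        /* a match lies within this word */
        i += 8;
    }

    /* Pinpoint the match, or scan the final partial word, bytewise. */
    for (; i < len; i++)
        if (s[i] == c)
            return i;
    return len;
}
```

Finishing with a byte loop keeps the sketch independent of endianness; a real implementation would presumably locate the matching byte within the word directly, and that extra bookkeeping is the sort of complexity the reply above is wary of.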