On 3 July 2011 09:32, <hv@crypt.org> wrote:
> Karl Williamson <public@khwilliamson.com> wrote:
> :I know Yves has said that he hates this aspect of this. It looks to me
> :that it should be easy to remove this pass, and treat regex compilation
> :similarly to string parsing: if you run out of room from your initial
> :guess, you just realloc more space, trimming at the end.
> :
> :Is there something people are aware of that I'm not seeing here?
>
> As far as I remember, you don't know what size certain ops are going to
> be (those that reference another location in the compiled regexp) until
> you know whether the reference is close enough (?early enough) to use
> a 1 (?2) byte offset (?absolute location) rather than a 2 (?4) byte one.
>
> If the size of one such reference changes, it pushes everything else
> around, and I suspect you could very easily end up with quadratic or
> worse behaviour trying to do the fixups.

Yes, this is my understanding as well. It is a similar story for the
case where we upgrade the pattern to utf8. The size of all the strings
might change, so we have to restart the calculation of how long the
compiled program will be.

> Given that such large regexps are relatively rare [1], this may well not
> be the ideal trade-off - if we know the pattern length, maybe we can decide
> in advance the minimum length that can possibly require the larger
> references, and just force all jumps large in that case (A); or attempt to
> build with small jump references, and as soon as you find you can't,
> throw that away and do it again with large ones (B).

An alternative would be to construct the optree as freely allocated
nodes in a first pass. Then the optimisation step would rewrite the
optree as needed. Then we would transcribe the optree into the
railway-normal-form structure that the engine uses (or not... ;-)

When I was actively tinkering with the engine, the whole "you must know
the end before you get there" stuff caused a lot of trouble.
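
To make the tree-first alternative concrete, here is a rough toy sketch in Python (not the engine's C, and not how regcomp.c works today). Every name in it (Node, merge_exact, flatten) is hypothetical and invented for illustration. It assumes ops are allocated as free-standing nodes, that an optimisation pass may rewrite them at will, and that offsets between ops are only computed when the nodes are finally transcribed into a flat program.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """One freely allocated regex op (hypothetical toy structure)."""
    op: str                          # e.g. "EXACT", "BRANCH", "END"
    arg: str = ""                    # literal text for EXACT nodes
    next: Optional["Node"] = None    # successor in program order
    jump: Optional["Node"] = None    # other node this op refers to, if any


def merge_exact(head: Node) -> Node:
    """Optimisation pass: fuse runs of adjacent EXACT nodes.

    No node has a size or position yet, so there is no
    "will the replacement fit?" question to answer here.
    """
    # Nodes that something jumps to must survive as separate nodes.
    targets = set()
    n = head
    while n is not None:
        if n.jump is not None:
            targets.add(id(n.jump))
        n = n.next

    n = head
    while n is not None:
        while (n.op == "EXACT" and n.next is not None
               and n.next.op == "EXACT" and id(n.next) not in targets):
            n.arg += n.next.arg      # absorb the following literal
            n.next = n.next.next     # unlink the absorbed node
        n = n.next
    return head


def flatten(head: Node) -> List[Tuple[str, str, int]]:
    """Transcribe the nodes into a flat program with relative offsets."""
    order = []
    n = head
    while n is not None:
        order.append(n)
        n = n.next
    # Final positions are known only now, after all rewriting is done.
    pos = {id(node): i for i, node in enumerate(order)}
    return [(node.op, node.arg,
             pos[id(node.jump)] - i if node.jump is not None else 0)
            for i, node in enumerate(order)]


# Toy program loosely shaped like /ab|c/: BRANCH, "a", "b", BRANCH, "c", END.
end = Node("END")
c = Node("EXACT", "c", next=end)
br2 = Node("BRANCH", next=c)
b = Node("EXACT", "b", next=br2)
a = Node("EXACT", "a", next=b)
br1 = Node("BRANCH", next=a, jump=br2)

program = flatten(merge_exact(br1))
```

In this toy, the first BRANCH ends up with an offset of 2 rather than 3, because the two EXACT nodes were fused before any offset existed. A real transcription step would also have to pick the width of each offset, which this sketch ignores.
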
You can't simply replace a set of opcodes with something else. You have
to worry about whether it will fit, and things like that.

I think a bunch of optimisations could be made
faster/better/easier/possible if we were able to free the
compile/optimise steps from the final representation.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"