develooper Front page | perl.perl5.porters | Postings from July 2011

Re: Why is there a sizing pass in regex compilation

Thread Previous | Thread Next
From:
hv
Date:
July 3, 2011 01:29
Subject:
Re: Why is there a sizing pass in regex compilation
Message ID:
201107030732.p637WTf31630@crypt.org
Karl Williamson <public@khwilliamson.com> wrote:
:I know Yves has said that he hates this aspect of this.  It looks to me 
:that it should be easy to remove this pass, and treat regex compilation 
:similarly to string parsing: if you run out of room from your initial 
:guess, you just realloc more space, trimming at the end.
:
:Is there something people are aware of that I'm not seeing here?

As far as I remember, you don't know what size certain ops are going to
be (those that reference another location in the compiled regexp) until
you know whether the reference is close enough (?early enough) to use
a 1 (?2) byte offset (?absolute location) rather than a 2 (?4) byte one.

If the size of one such reference changes, it pushes everything else
around, and I suspect you could very easily end up with quadratic or
worse behaviour trying to do the fixups.

Given that such large regexps are relatively rare [1], this may well not
be the ideal trade-off - if we know the pattern length, maybe we can decide
in advance the minimum length that can possibly require the larger
references, and just force all jumps large in that case (A); or attempt to
build with small jump references, and as soon as you find you can't,
throw that away and do it again with large ones (B).

(I haven't looked at this code in a while, apologies for vagueness.)

Obviously realloc costs too, but I imagine the cost of having to do
the sizing pass every time (instead of never (A), or rarely (B)) would
be at worst comparable.

Hugo

[1] citation needed

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About