develooper Front page | perl.perl5.porters | Postings from May 2012

[perl #76546] regex engine slowdown bug

From:
Father Chrysostomos via RT
Date:
May 21, 2012 14:41
Subject:
[perl #76546] regex engine slowdown bug
Message ID:
rt-3.6.HEAD-4610-1337636487-966.76546-15-0@perl.org
On Sun Apr 22 05:26:30 2012, demerphq wrote:
> On 22 April 2012 14:15, demerphq <demerphq@gmail.com> wrote:
> > On 22 April 2012 13:24, demerphq <demerphq@gmail.com> wrote:
> >> On 22 April 2012 13:03, demerphq <demerphq@gmail.com> wrote:
> >>> On 22 April 2012 12:37, demerphq <demerphq@gmail.com> wrote:
> >>>> [4] regcomp.c
> >>>> /* A temporary algorithm prefers floated substr to fixed one to
> dig
> >>>> more info. */
> >>>> � � � �if (longest_fixed_length > longest_float_length) {
> >>>> � � � � � �r->check_end_shift = r->anchored_end_shift;
> >>>> � � � � � �r->check_substr = r->anchored_substr;
> >>>> � � � � � �r->check_utf8 = r->anchored_utf8;
> >>>> � � � � � �r->check_offset_min = r->check_offset_max = r-
> >anchored_offset;
> >>>> � � � � � �if (r->extflags & RXf_ANCH_SINGLE)
> >>>> � � � � � � � �r->extflags |= RXf_NOSCAN;
> >>>> � � � �}
> >>>> � � � �else {
> >>>> � � � � � �r->check_end_shift = r->float_end_shift;
> >>>> � � � � � �r->check_substr = r->float_substr;
> >>>> � � � � � �r->check_utf8 = r->float_utf8;
> >>>> � � � � � �r->check_offset_min = r->float_min_offset;
> >>>> � � � � � �r->check_offset_max = r->float_max_offset;
> >>>> � � � �}
> >>>
> >>> I should read more carefully. The code and comment don't match.
> And
> >>> the reason we dont use "<X" as the start of the pattern is because
> it
> >>> is not a required substring as it is case insensitive, it is
> actually
> >>> a stclass optimization which is separate from the required
> substring
> >>> optimization. Sorry for the confusion.
> >>
> >> The output looks like this:
> >>
> >> Found floating substr ">" at offset 34...
> >> start_shift: 21 check_at: 34 s: 0 endpos: 35
> >> This position contradicts STCLASS...
> >> Looking for floating substr starting at offset 35...
> >> Found floating substr ">" at offset 52...
> >> start_shift: 21 check_at: 52 s: 0 endpos: 53
> >> This position contradicts STCLASS...
> >> Looking for floating substr starting at offset 53...
> >> Found floating substr ">" at offset 70...
> >> start_shift: 21 check_at: 70 s: 0 endpos: 71
> >> This position contradicts STCLASS...
> >> Looking for floating substr starting at offset 71...
> >> Found floating substr ">" at offset 88...
> >> start_shift: 21 check_at: 88 s: 0 endpos: 89
> >> This position contradicts STCLASS...
> >> Looking for floating substr starting at offset 89...
> >> Did not find floating substr ">"...
> >>
> >> The quadratic nature comes from the STCLASS optimisation running
> from
> >> 0 each time (s:0). It doesnt remember what it already searched.
> >
> > This patch fixes it. It is a little naive and I cant decide if it is
> > safe to apply at all.
> >
> > The old code was slow because when it had a floating required string
> > and a stclass item it would basically fall into a loop (via goto)
> > doing
> >
> > while not found:
> > �search for floating string (incrementally)
> > �search for stclass via find_byclass() from the start of the string
> > to the position of the floating string
> >
> > So the search for the stclass would repeatedly check the same
> ground,
> > producing the quadratic behavior. My patch changes this so its:
> >
> > while not found:
> > �search for floating string (incrementally)
> > �search for stclass via find_byclass() incrementally up to the
> floating string
> >
> > however the reason I am suspicious is that I wonder if there is a
> case
> > where the floating string matches the STCLASS as well, such that
> > starting the stclass search at the point we found the floating
> string
> > is unsafe. Which means that really find_byclass should manage this
> > pointer as it is really the only one that knows where it safe to
> > restart the string at.
> 
> Actually, find_byclass() failing on s..e means you cant match anywhere
> in s..e so restarting at e seems fine.
> 
> Unless someone can see a counter argument i will apply this sometime.

Like now? :-)


-- 

Father Chrysostomos


---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=76546



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About