develooper Front page | perl.perl5.porters | Postings from August 2013

[perl #107816] Performance regression since 0abd0d78

From: Karl Williamson via RT
Date: August 15, 2013 04:39
Subject: [perl #107816] Performance regression since 0abd0d78
Message ID: rt-3.6.HEAD-2552-1376541576-264.107816-15-0@perl.org

On Thu Aug 08 20:52:13 2013, public@khwilliamson.com wrote:
> On 08/08/2013 12:32 PM, demerphq wrote:
> > On 8 August 2013 10:53, Nicholas Clark <nick@ccl4.org> wrote:
> >> On Sun, Jan 27, 2013 at 05:22:14PM -0800, James E Keenan via RT wrote:
> >>> On Tue Jan 17 09:08:29 2012, demerphq wrote:
> >>
> >>>> More possibly later.
> >>>> Yves
> >>>>
> >>>
> >>> Yves, list:  Are there still issues we need to discuss in this ticket?
> >>> If not, can we close it?
> >>
> >> If I read it correctly it's summed up as "the TRIE optimisation is not
> >> used on non-unicode patterns under /i" and this is considered to be a
> >> bug.  And it remains unfixed. (And it's not been decided that it is too
> >> hard to fix)
> >
> > I agree. It's a matter of tuits. As Karl said the real issue is one or
> > two awkward edge cases.
> >
> > Yves
> >
> >
> 
> I am of the opinion that we should not expend many tuits on performance
> for /d matching, which we should hope gets less and less use.
>
> That said, I estimate that it is just a couple hours of work for me to
> get a 76% solution to this.  That is because the folds of all but 61 of
> 256 Latin1 characters are the same if the target string is UTF-8 or not.
> If a pattern contains only those 195 characters, then it can be
> treated as Unicode, and the trie optimization used.  I'm very familiar
> with the affected areas of the code. The trie code, which I'm not very
> familiar with, is not affected.  Doing this has the added bonus of
> removing a run-time test (that test being "Am I supposed to use Unicode,
> or not here?") at each such node entry in the pattern matching.
>
> I do believe that /a should have TRIES generated; they currently don't;
> a fact I had forgotten.  It should be easy for someone familiar with
> that code to do this enhancement.  It is just Unicode, but folds that
> otherwise would cross the 127/128 boundary are disallowed.  Should I
> write a ticket for this?
> 

Commit 88b3a463c4e11c60eea2075693434b32f43e57fe now implements what I
discussed above.  Thus if a node contains none of those 61
characters, nor the sequence 'ss' (upper or lowercase), it will be
automatically trieable.  I think that, except for the 'ss' problem, it
would be fairly easy, but kludgy, to fix this for the other characters,
but I don't believe it is worth it.  My opinion is that we should close
this ticket for the remainder as a won't-fix.  Every ASCII-range
character now works.
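
For anyone following along, here is a minimal sketch of why some Latin1
characters cannot simply be treated as Unicode under /d.  It assumes Perl
5.14 or later (for the explicit /d modifier), and it assumes that
\xC0/\xE0 (A and a with grave accent) are among the 61 problem
characters; the full list is not spelled out in this message.  The helper
name `matches` is made up for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if $target matches $re, else 0.
sub matches {
    my ($target, $re) = @_;
    return $target =~ $re ? 1 : 0;
}

my $re_latin1 = qr/\xE0/di;    # a-grave, case-insensitive, /d rules
my $re_ascii  = qr/a/di;       # plain ASCII letter, same modifiers

my $bytes = "\xC0";            # A-grave, stored as a non-UTF-8 string
my $utf8  = "\xC0";
utf8::upgrade($utf8);          # same character, UTF-8 encoded internally

my $ascii_bytes = "A";
my $ascii_utf8  = "A";
utf8::upgrade($ascii_utf8);

# Above ASCII, the /d fold depends on how the target is encoded.
printf "A-grave, non-UTF-8 target: %d\n", matches($bytes, $re_latin1);
printf "A-grave, UTF-8 target:     %d\n", matches($utf8,  $re_latin1);

# In the ASCII range, the fold is the same either way.
printf "A, non-UTF-8 target:       %d\n", matches($ascii_bytes, $re_ascii);
printf "A, UTF-8 target:           %d\n", matches($ascii_utf8,  $re_ascii);
```

Only the first case fails to match.  A pattern made up solely of
characters that behave like the last two cases gives the same answer
whichever rules are applied, which is what makes it safe to hand to the
trie code as if it were Unicode.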

I started playing with what it would take to get tries for /iaa
matching, and it is quite easy to get it working for all but 2 edge
cases.  I haven't yet looked into how hard those two would be to solve.
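
As a small illustration of the 127/128 boundary rule mentioned in the
quoted text, the sketch below compares plain /i with /iaa for KELVIN SIGN
(U+212A), whose Unicode fold is the ASCII letter k.  The choice of this
character is an assumption made for the example (it is not necessarily
one of the two edge cases), and the code needs Perl 5.14 or later for
/aa.

```perl
use strict;
use warnings;

my $kelvin = "\x{212A}";    # KELVIN SIGN, folds to ASCII "k"

# Plain /i: the target is UTF-8, so Unicode folding applies and the
# fold from U+212A down to "k" is allowed.
my $plain_i = $kelvin =~ /k/i ? 1 : 0;

# /iaa: folds that cross the ASCII/non-ASCII boundary are disallowed,
# so the same match fails.
my $iaa = $kelvin =~ /k/iaa ? 1 : 0;

# ASCII-to-ASCII folds are unaffected by /aa.
my $ascii_iaa = "K" =~ /k/iaa ? 1 : 0;

printf "KELVIN SIGN vs /k/i:   %d\n", $plain_i;
printf "KELVIN SIGN vs /k/iaa: %d\n", $iaa;
printf "K vs /k/iaa:           %d\n", $ascii_iaa;
```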

In looking to re-enable the tests that 0abd0d78 commented out, I noticed
a difference in the generated tries, apparently for the worse.  The
test was using
/(\.COM|\.EXE|\.BAT|\.CMD|\.VBS|\.VBE|\.JS|\.JSE|\.WSF|\.WSH|\.pyo|\.pyc|\.pyw|\.py)$/i

The generated code used to notice that the dot is common to all the
branches, and factor it out, making the trie on just the remainder.  Now
the dot is in each individual branch.  Is this to be expected?

I also noticed that often, bracketed classes don't generate tries, but
the explicit alternation does.  Thus,
  qr/A|B|C/
generates a trie, but
  qr/[ABC]/
doesn't.  Is this also expected?
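
One way to check observations like these is to compile the pattern in a
child perl with `use re 'debug'` active and look for trie nodes in the
dump it writes to STDERR.  The sketch below does that.  The helper names
`regex_dump` and `uses_trie` are invented for the example, the third
pattern is a shortened stand-in for the test regex above, and the code
assumes a POSIX-like shell, patterns without single quotes, and that trie
node names in the dump begin with "TRIE".  The result depends on the perl
version doing the compiling.

```perl
use strict;
use warnings;

# Compile qr/$pattern/$flags in a child perl with "use re 'debug'"
# in effect and return everything the regex engine printed.
sub regex_dump {
    my ($pattern, $flags) = @_;
    $flags = '' unless defined $flags;
    my $perl = $^X;                       # the perl running this script
    my $code = "qr/$pattern/$flags";
    my $out  = `$perl -Mre=debug -e '$code' 2>&1`;
    return defined $out ? $out : '';
}

# 1 if the dump mentions a trie node, else 0.
sub uses_trie {
    my ($pattern, $flags) = @_;
    return regex_dump($pattern, $flags) =~ /\bTRIE/ ? 1 : 0;
}

my @cases = (
    [ 'A|B|C',              ''  ],
    [ '[ABC]',              ''  ],
    [ '\.COM|\.EXE|\.BAT',  'i' ],
);

for my $case (@cases) {
    my ($pattern, $flags) = @$case;
    printf "%-20s /%-1s trie: %s\n", $pattern, $flags,
        uses_trie($pattern, $flags) ? 'yes' : 'no';
}
```

Printing the full return value of `regex_dump` also shows whether a
common prefix such as the dot was factored out ahead of the trie or
repeated in each branch.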


-- 
Karl Williamson

---
via perlbug:  queue: perl5 status: open
https://rt.perl.org:443/rt3/Ticket/Display.html?id=107816
