develooper Front page | perl.perl5.porters | Postings from March 2007

The performance problem of 30678

Thread Next
From:
demerphq
Date:
March 23, 2007 06:23
Subject:
The performance problem of 30678
Message ID:
9b18b3110703230623j192945ebnfc42d58e778454d2@mail.gmail.com
Ok, heres the issue that 30678 solves in a nutshell in code:

#!perl -l
my $qr=qr/fo+oba+r/;
'foobar'=~/$qr/;
print "A:",$1,$2;
{
  'fooooobaaaaar'=~/$qr/;
  print "B:",$1,$2;
}
print "C:",$1,$2;

We should get
A:foobar
B:fooooobaaaaar
C:foobar

But before 30678 we get:
A:foobar
B:fooooobaaaaar
C:fooooobaaaaar

The solution was to make a temporary copy of the regexp struct and a
few of its fields and then use it each time. However this leads to a
performance problem in code like

my $qr=qr/(\d)\1/;
/$qr/ and print for 1..100;

Where we essentially make a copy, use it to match, throw it away a
hundred times.

The solution to this I guess is to somehow make the original regexp
store a list of its copies, and when a copy is freed only /actually/
free it when there is another copy in the list that is "available" to
be filled in. There is a bit of trickery required to fill it in, but
beyond that we could use any number of strategies to store the copies.
Note that the internal structure (which contains the compiled program)
is not copied. Its only the rexec structure itself and the fields
which may be modified during match which are copied.

The other solution is to split the match results out and use similar
logic on the result struct (which would be smaller than the current
regexp struct by more than half).

Either way this problem has a lot of similarity to the work that was
done with arenas and sv types and stuff, so I expect that the
expertise gained there could be profitable put to use with regexp
structures or match structures.

Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About