Front page | perl.perl5.porters |
Postings from March 2007
Re: The performance problem of 30678
From:
Nicholas Clark
Date:
March 23, 2007 14:06
Subject:
Re: The performance problem of 30678
Message ID:
20070323210646.GZ5748@plum.flirble.org
On Fri, Mar 23, 2007 at 02:23:35PM +0100, demerphq wrote:
> The solution was to make a temporary copy of the regexp struct and a
> few of its fields and then use it each time. However this leads to a
> performance problem in code like
>
> my $qr=qr/(\d)\1/;
> /$qr/ and print for 1..100;
>
> Where we essentially make a copy, use it to match, throw it away a
> hundred times.
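For illustration only, the copy/match/discard pattern described above can be sketched in C roughly as follows. The `toy_regexp` struct, its fields, and the `toy_*` functions are hypothetical stand-ins, not perl's real regexp struct or API; the sketch only counts how many allocations the loop makes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the regexp struct; NOT perl's real layout. */
typedef struct {
    size_t nparens;  /* number of capture groups (assumed > 0 here) */
    int *startp;     /* per-group offsets, written during a match */
} toy_regexp;

static unsigned long toy_allocs = 0;  /* malloc() calls made by the sketch */

/* Copy the struct plus the one field a match would scribble on. */
static toy_regexp *toy_copy(const toy_regexp *src) {
    toy_regexp *dst = malloc(sizeof *dst);
    if (!dst) return NULL;
    toy_allocs++;
    *dst = *src;
    dst->startp = malloc(src->nparens * sizeof *dst->startp);
    if (!dst->startp) { free(dst); return NULL; }
    toy_allocs++;
    memcpy(dst->startp, src->startp, src->nparens * sizeof *dst->startp);
    return dst;
}

static void toy_free(toy_regexp *re) {
    free(re->startp);
    free(re);
}

/* Copy, "match", discard, once per iteration.
 * Returns the number of allocations made, or 0 if malloc() failed. */
unsigned long toy_match_loop(const toy_regexp *shared, int iterations) {
    assert(shared->nparens > 0);
    toy_allocs = 0;
    for (int i = 0; i < iterations; i++) {
        toy_regexp *tmp = toy_copy(shared);
        if (!tmp) return 0;
        tmp->startp[0] = i;  /* stands in for the match recording a capture */
        toy_free(tmp);       /* the shared original is never modified */
    }
    return toy_allocs;
}
```

With a two-allocation copy like this, 100 iterations mean 200 malloc() calls and 200 matching free() calls, which is the kind of churn the timings below are measuring.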
Performance is pretty grim on some platforms.
x86 Linux is fairly sane, although valgrind's malloc reacts more badly to
all the free()ing implied by PERL_DESTRUCT_LEVEL=2 than the regular glibc
malloc does:
30677 run normally
22.10user 0.27system 0:22.90elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+26074minor)pagefaults 0swaps
30677 run with PERL_DESTRUCT_LEVEL=2
22.03user 0.25system 0:22.89elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+26076minor)pagefaults 0swaps
30677 run under valgrind
1188.06user 4.12system 20:13.82elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (15major+29142minor)pagefaults 0swaps
30677 under valgrind with PERL_DESTRUCT_LEVEL=2
1188.40user 3.89system 20:13.49elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+29365minor)pagefaults 0swaps
30678
22.68user 0.26system 0:23.48elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+26079minor)pagefaults 0swaps
30678 under valgrind
1423.25user 4.97system 24:14.31elapsed 98%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+28904minor)pagefaults 0swaps
x86 FreeBSD hurts:
30677 run normally
real 0m14.330s
user 0m14.002s
sys 0m0.108s
30677 and with PERL_DESTRUCT_LEVEL=2
14.25 real 13.86 user 0.11 sys
30678 run normally
real 0m15.393s
user 0m14.854s
sys 0m0.094s
30678 and with PERL_DESTRUCT_LEVEL=2
560.45 real 520.39 user 0.09 sys
and Sparc Solaris turns to super-cooled treacle:
30677 run normally
real 2m28.951s
user 2m28.359s
sys 0m0.540s
30677 and with PERL_DESTRUCT_LEVEL=2
real 2:29.1
user 2:28.5
sys 0.5
30678 run normally
real 2m38.266s
user 2m37.629s
sys 0m0.584s
30678 and with PERL_DESTRUCT_LEVEL=2
real 1:59:50.1
user 1:59:48.9
sys 0.8
[formatting differences are due to whether bash chose to use built in time,
or /usr/bin/time, depending on how it parsed and ran my command lines]
Yes, with PERL_DESTRUCT_LEVEL=2 Solaris now takes 2 hours to run t/op/pat.t.
We seem to have hit pathological malloc behaviour. I'm not quite sure how,
or why, given that Linux copes.
But, digression, I do remember at my first job that they ran HP-UX. They'd
tried various architectures and HP-UX was best for the sort of code that
they ran. Then we had to get the code working on an NEC SX-4. [Grrr
"Super-UX". You keep using that word [super]. I do not think that it means
what you think that it means]
Anyway, it ran over 2**32 bytes of memory rather quickly, which upset parts
of their code. I tried it on the work Solaris box - it was also unhappy.
I tried it with Doug Lea's malloc - it was not unhappy.
So it's something to do with malloc. Turns out that the code was calling
realloc() to n, then realloc() to n+1, and so on, one item at a time.
It happened to be using the buffer for a grid of triangles, and I worked out
that it would asymptotically approach 2m items, for an infinite grid.
When I re-coded it to call malloc() once for that size, it was much happier.
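A minimal C sketch of the two allocation patterns, for comparison. This is not the original code from that job; the function names and the use of `int` elements are assumptions made for illustration. Each function returns how many allocator calls it made.

```c
#include <assert.h>
#include <stdlib.h>

/* Grow the buffer one element at a time: realloc() to 1, 2, ..., n.
 * Returns the number of realloc() calls, or 0 if an allocation failed. */
unsigned long fill_incremental(size_t n, int **out) {
    assert(n > 0);
    int *buf = NULL;
    unsigned long calls = 0;
    for (size_t i = 0; i < n; i++) {
        int *grown = realloc(buf, (i + 1) * sizeof *buf);
        if (!grown) { free(buf); *out = NULL; return 0; }
        buf = grown;
        calls++;
        buf[i] = (int)i;
    }
    *out = buf;
    return calls;
}

/* Final size known up front: a single malloc().
 * Returns 1 on success, or 0 if the allocation failed. */
unsigned long fill_presized(size_t n, int **out) {
    assert(n > 0);
    int *buf = malloc(n * sizeof *buf);
    if (!buf) { *out = NULL; return 0; }
    for (size_t i = 0; i < n; i++) {
        buf[i] = (int)i;
    }
    *out = buf;
    return 1;
}
```

Both produce the same buffer contents. How expensive the first form is depends on the allocator: one that can usually extend a block in place hides the cost, while one that has to move and copy the block on most calls does work that grows roughly with the square of n.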
So I think that the reason they used HP-UX at all was because HP's malloc()
had the best performance with poorly written code. I wonder how much hardware
business that pessimisation from HP won them.
So, anyway, time for someone to identify why we're hurting malloc. I won't
have time until 5.8.9 is out.
Nicholas Clark