
RE: Patch to make string-append on win32 100 times faster

From: Jan Dubois
Date: August 16, 2010 12:17
Subject: RE: Patch to make string-append on win32 100 times faster
Message ID: 014401cb3d77$b2d49570$187dc050$@activestate.com

On Mon, 16 Aug 2010, Reini Urban wrote:
> 2010/8/16 Jan Dubois <jand@activestate.com>:
> > Could you provide some evidence for this claim?  The only way a
> > "better malloc" can prevent this slowdown is by doing some kind
> > of overallocation itself.  Since the algorithm in this patch
> > is not necessarily cumulative with the overallocation by malloc()
> > it is very well possible that the change has no effect at all
> > on systems with an overallocating malloc().
> 
> This is just theory. I KNOW that plain gnumalloc and freebsd realloc
> do work fine.
> But now that we have the mess someone can test it. I don't have such systems
> so I cannot test it.

I'm getting a bit tired of you discarding Ben's actual test based on
some "knowledge" you imply to have that you can't even verify because
you don't have access to those systems.  This is a technical issue, not
a religious belief!

[...]

> > Because then it wouldn't be applied to other platforms, like FreeBSD.
>
> Uuh, nobody ever complained about freebsd realloc performance.
> It was always the fastest on the planet and still is.

I pointed out *twice* to you that Ben did actually measure it and came up
with pretty bad numbers for the freebsd reallocator.  You will need to come
up with some evidence why his benchmarks should be discarded.

To provide similar numbers for Linux and GNU malloc, I've compiled bleadperl
just before and after the patch in question, both with usemymalloc=y
and usemymalloc=n.  The results are just for the "1E7 chars + 1E5 x 1E1 chars"
benchmark, as that is the slowest of the bunch.  I've run the benchmark
script 100 times for each Perl build and list the min/max/average runtimes,
since there is quite a bit of noise:

Before, GNU malloc:  Min=38.6 Max=45.4 Avg=41.20
Before, Perl malloc: Min= 7.9 Max=14.2 Avg=11.14

After, GNU malloc:   Min= 9.7 Max=12.7 Avg=11.45
After, Perl malloc:  Min= 9.4 Max=13.2 Avg=11.16

It shows that GNU malloc on its own takes roughly 3.6 times as long on
average (41.20 vs. 11.45) as GNU malloc with the patch.  GNU malloc with
this patch matches the time used by the Perl malloc (usemymalloc=y), which
doesn't seem to be affected by the patch.
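
To make the mechanism concrete, here is a minimal C sketch of the shape of
that benchmark: build an initial string, then append a small chunk many
times, growing the buffer either to exactly the needed size or with some
extra headroom.  Everything in it is hypothetical: strbuf, strbuf_reserve(),
strbuf_append() and count_reallocs() are illustration names, not Perl
internals, and the "25% plus 10 bytes" headroom is an assumed policy rather
than a statement of what the patch does.  It only counts how often the
buffer has to be enlarged; how expensive each enlargement is depends on the
realloc() in use, which is exactly what the timings above measure.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable byte buffer; this is not Perl's SV layout. */
typedef struct {
    char  *data;
    size_t len;       /* bytes in use */
    size_t cap;       /* bytes allocated */
    size_t reallocs;  /* realloc() calls that enlarged the buffer */
} strbuf;

/* Make room for `extra` more bytes.  With `overallocate` set, request
 * 25% headroom plus 10 bytes on top of what is needed (assumed policy,
 * for illustration only).  Returns 0 on success, -1 on failure. */
static int strbuf_reserve(strbuf *b, size_t extra, int overallocate)
{
    assert(b != NULL);
    size_t need = b->len + extra;
    if (need <= b->cap)
        return 0;                      /* still fits, no realloc needed */
    size_t newcap = need;
    if (overallocate)
        newcap += (need >> 2) + 10;
    char *p = realloc(b->data, newcap);
    if (p == NULL)
        return -1;
    b->data = p;
    b->cap = newcap;
    b->reallocs++;
    return 0;
}

/* Append `n` bytes from `src`.  Returns 0 on success, -1 on failure. */
static int strbuf_append(strbuf *b, const char *src, size_t n,
                         int overallocate)
{
    if (strbuf_reserve(b, n, overallocate) != 0)
        return -1;
    if (n > 0)
        memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}

/* Mimic "initial chars + count x chunk chars" and return how many
 * times the buffer was enlarged during the appends, or (size_t)-1
 * if an allocation failed. */
size_t count_reallocs(size_t initial, size_t count, size_t chunk,
                      int overallocate)
{
    strbuf b = { NULL, 0, 0, 0 };
    size_t result = (size_t)-1;
    size_t i;
    char *piece = malloc(chunk ? chunk : 1);
    if (piece == NULL)
        return result;
    memset(piece, 'x', chunk);

    /* Build the initial string with an exact-size allocation. */
    if (strbuf_reserve(&b, initial, 0) != 0)
        goto done;
    if (initial > 0)
        memset(b.data, 'x', initial);
    b.len = initial;
    b.reallocs = 0;                    /* count only the append phase */

    for (i = 0; i < count; i++) {
        if (strbuf_append(&b, piece, chunk, overallocate) != 0)
            goto done;
    }
    result = b.reallocs;

done:
    free(piece);
    free(b.data);
    return result;
}
```

Calling count_reallocs(10000000, 100000, 10, 0) corresponds to the
"1E7 chars + 1E5 x 1E1 chars" case with exact-size growth: every append
enlarges the buffer, so the count equals the number of appends.  With the
last argument set to 1 the buffer is enlarged only a handful of times,
because each enlargement leaves room for many later appends.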

> > Note though that I added this remark for discussion about disabling it
> > under -Dusemymalloc:
> >
> > | b) Should the new over-allocation also be used under -Dusemymalloc.  It
> > |    will provide less benefit there, but also come at a lower cost because
> > |    the newlen can still fall inside the already allocated bucket size
> > |    anyways.  I don't have any strong opinion if it should be disabled
> > |    in this situation or not.
> >
> > But as I stated above, I don't think it will make much of a difference either
> > way for the -Dusemymalloc case, but would love to see some comprehensive
> > benchmarks.
> 
> At http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/b7c9133ff20009f2?pli=1
> were a lot of benchmarks and profiling data.
> He is testing string sizes of 1e5-1e7 byte not just pagesizes, piping
> a typical pdf to perl.

That is exactly the script I have been running in my tests above.
 
> Plain freebsd had for years the best realloc performance (first with
> phkmalloc, now
> they switched to jemalloc for better multi-core performance), which were always
> faster than simple gnumalloc (ptmalloc 2 or 3 based on Doug Lea's
> public domain malloc),
> but "gnumalloc" was fast enough. 12ms against 16sec.
> 
> Without this patch.

See above for a comparison between GNU malloc and Perl malloc.  Note that
the Google thread you reference *also* shows that Perl malloc performs
3 times better than GNU malloc at this particular benchmark.
 
> So I don't see any reason to "fix" gnumalloc or freebsd realloc
> default behaviour, unless
> someone reports problems there. Tuning realloc is black art ( I did it
> for Tie::CArray once )
> and don't want someone to touch this without any tests.

Nobody is tuning realloc(), this is all about improving the performance of
sv_grow().
 
> Only msvcrt is affected and so only platforms which use msvcrt realloc should
> be patched, esp. without any tests on the other platforms.

This is really getting old now...
 
> FYI: cygwin (newlib) uses normally freebsd derived libc
> implementations, but in this case
> cygwin uses not phkmalloc, just ptmalloc2.6.4 i.e. plain gnumalloc.
> - which has no public independent_comalloc() which is a shame btw.
> 
> -Dusemymalloc is overallocating a lot, but we know that. This is not a
> realloc problem per se, plain malloc does the same.

You seem to have Cygwin available to you.  Why don't you just test the
patch and report on any actual problems instead of spreading bad attitude?

Cheers,
-Jan

