perl.perl5.porters | Postings from August 2010

Re: Patch to make string-append on win32 100 times faster

David Golden
August 16, 2010 06:37
I think I'm with Reini on this one. I'd rather keep it Windows-only until a
performance problem is documented elsewhere.

It's easy to enable for other platforms when needed.

I'm open to people doing the research before 5.14, of course. :-)


On Aug 16, 2010 7:19 AM, "Reini Urban" wrote:
> 2010/8/16 Jan Dubois:
>> On Sun, 15 Aug 2010, Reini Urban wrote:
>>> Jan Dubois schrieb:
>>> > On Fri, 30 Jul 2010, Wolfram Humann wrote:
>>> > The discussion of this change seemed to have stalled, but I see
>>> > +1 votes from David Golden and Marvin Humphrey, with additional
>>> > information from Ben Morrow that the patch also helps on FreeBSD
>>> > (when building without -Dusemymalloc), with nobody voicing
>>> > any disagreement about applying the patch.
>>> This particular slowdown was only recognized for WIN32 native malloc,
>>> but not for other platforms with better malloc libs.
>> Did you read the paragraph you quoted above?  It explicitly claims that
>> the slowdown happens on other platforms when using the platform native
>> malloc.
>>> Those platforms are now hurt by using overallocation,
>>> i.e. need more memory, e.g. with piping.
>> Could you provide some evidence for this claim?  The only way a
>> "better malloc" can prevent this slowdown is by doing some kind
>> of overallocation itself.  Since the algorithm in this patch
>> is not necessarily cumulative with the overallocation by malloc()
>> it is very well possible that the change has no effect at all
>> on systems with an overallocating malloc().
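To make the overallocation argument concrete, here is a minimal C sketch that counts how often a string buffer has to ask for more memory during repeated appends. It is not Perl's sv_grow() code: the function name count_grow_requests and its parameters are hypothetical, and the growth rule (old size plus a quarter plus 10 bytes) is taken from the example figures quoted later in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from the Perl source): count how many times the
 * requested capacity must grow while appending `chunks` pieces of
 * `chunk_len` bytes to an initially empty buffer.
 * With `overallocate` set, every growth requests at least
 * old + old/4 + 10 bytes; otherwise it requests exactly what is needed. */
static size_t count_grow_requests(size_t chunks, size_t chunk_len, int overallocate)
{
    size_t cap = 0;    /* capacity currently requested from the allocator */
    size_t len = 0;    /* bytes actually in use */
    size_t grows = 0;  /* number of growth requests issued so far */

    assert(chunk_len > 0);

    for (size_t i = 0; i < chunks; i++) {
        size_t needed = len + chunk_len;
        if (needed > cap) {
            size_t minimum = cap + (cap >> 2) + 10;
            cap = (overallocate && minimum > needed) ? minimum : needed;
            grows++;
        }
        len = needed;
    }
    return grows;
}
```

With exact-fit requests every append triggers a new request, so the count equals the number of appends. With the quarter-size minimum the count grows only logarithmically with the final length. Whether each request actually costs a copy depends on the underlying realloc(), which is the point under dispute here.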
> This is just theory. I KNOW that plain gnumalloc and freebsd realloc
> do work fine.
> But now that we have the mess someone can test it. I don't have such
> so I cannot test it.
>> For example, assume Perl is appending 100 bytes to a 1000 bytes string.
>> The patch under discussion will make sure that sv_grow() will request
>> 1260 bytes (1000 + (1000>>2) + 10) instead of just 1100 bytes from
>> Assume the original 1000 bytes were allocated in a 1024 bytes slab in
>> the allocator, so the 1100 bytes wouldn't fit in anymore, and realloc()
>> will now move this to e.g. a 1536 byte slab.  In that case it doesn't
>> make any difference that we now asked for 1260 bytes instead of 1100.
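The slab argument above can be sketched in C as follows. The size-class table is purely an assumption chosen to match the 1024 and 1536 byte figures in the example; real allocators pick their own classes, and slab_for is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical size classes, chosen only to match the 1024/1536 figures
 * in the example; real allocators define their own. */
static const size_t slab_classes[] = { 512, 768, 1024, 1536, 2048, 3072, 4096 };

/* Return the smallest size class that can hold `request` bytes,
 * or 0 if the request is larger than every class in the table. */
static size_t slab_for(size_t request)
{
    size_t n = sizeof slab_classes / sizeof slab_classes[0];

    assert(request > 0);

    for (size_t i = 0; i < n; i++) {
        if (slab_classes[i] >= request)
            return slab_classes[i];
    }
    return 0;
}
```

Under these assumed classes, a request for 1100 bytes and a request for 1260 bytes both land in the 1536 byte class, so the larger request costs no additional memory.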
>> Also note that the minimum growth is based on the old buffer size,
>> so appending 300 bytes to the 1000 byte string will only request
>> 1300 bytes, because 1300 is already larger than the 1260 minimum
>> realloc growth.
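The minimum-growth rule described above can be written as a small C function. This is a sketch under the assumptions stated in the example (a minimum of old size plus a quarter plus 10 bytes), not the actual sv_grow() code, and grow_request is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not the actual sv_grow() implementation):
 * given the old buffer size and the size needed after the append,
 * return the size to request from the allocator. The minimum is based
 * on the old size only: old + old/4 + 10 bytes. */
static size_t grow_request(size_t old_len, size_t needed)
{
    size_t minimum = old_len + (old_len >> 2) + 10;

    assert(needed >= old_len);

    return needed > minimum ? needed : minimum;
}
```

Appending 100 bytes to a 1000 byte string gives grow_request(1000, 1100), which yields 1260. Appending 300 bytes gives grow_request(1000, 1300), which yields 1300 because the needed size already exceeds the minimum.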
>> So until proven otherwise I doubt that this patch has any noticeable
>> effect on a "good malloc()".  On a "medium malloc()" I would expect
>> it to improve performance somewhat, at a moderate additional memory
>> requirement.
>>> Why was this patch not applied with the appropriate
>>> #if defined(_WIN32) or what is used for MSVC and mingw?
>> Because then it wouldn't be applied to other platforms, like FreeBSD.
> Uuh, nobody ever complained about freebsd realloc performance.
> It was always the fastest on the planet and still is.
>> Note though that I added this remark for discussion about disabling it
>> under -Dusemymalloc:
>> | b) Should the new over-allocation also be used under -Dusemymalloc?  It
>> |    will provide less benefit there, but also come at a lower cost as
>> |    the newlen can still fall inside the already allocated bucket size
>> |    anyways.  I don't have any strong opinion if it should be disabled
>> |    in this situation or not.
>> But as I stated above, I don't think it will make much of a difference
>> either way for the -Dusemymalloc case, but would love to see some
>> comprehensive benchmarks.
> At
> were a lot of benchmarks and profiling data.
> He is testing string sizes of 1e5-1e7 byte not just pagesizes, piping
> a typical pdf to perl.
> Plain freebsd had for years the best realloc performance (first with
> phkmalloc, now they switched to jemalloc for better multi-core
> performance), which was faster than simple gnumalloc (ptmalloc 2 or 3,
> based on Doug Lea's public domain malloc), but "gnumalloc" was fast
> enough. 12ms against 16sec.
> Without this patch.
> So I don't see any reason to "fix" gnumalloc or freebsd realloc
> default behaviour, unless someone reports problems there. Tuning
> realloc is black art (I did it for Tie::CArray once) and I don't want
> someone to touch this without any tests.
> Only msvcrt is affected and so only platforms which use msvcrt realloc
> should be patched, esp. without any tests on the other platforms.
> FYI: cygwin (newlib) normally uses freebsd derived libc
> implementations, but in this case cygwin uses not phkmalloc, just
> ptmalloc2.6.4 i.e. plain gnumalloc.
> - which has no public independent_comalloc() which is a shame btw.
> -Dusemymalloc is overallocating a lot, but we know that. This is not a
> realloc problem per se, plain malloc does the same.
> --
> Reini Urban
