perl.perl5.porters | Postings from August 2010

Re: Patch to make string-append on win32 100 times faster

August 20, 2010 07:53
On 16 August 2010 08:19, Jan Dubois wrote:
> On Sun, 15 Aug 2010, Reini Urban wrote:
>> Jan Dubois wrote:
>> > On Fri, 30 Jul 2010, Wolfram Humann wrote:
>> > The discussion of this change seemed to have stalled, but I see
>> > +1 votes from David Golden and Marvin Humphrey, with additional
>> > information from Ben Morrow that the patch also helps on FreeBSD
>> > (when building without -Dusemymalloc), with nobody voicing
>> > any disagreement about applying the patch.
>> This particular slowdown was only recognized for WIN32 native malloc,
>> but not for other platforms with better malloc libs.
> Did you read the paragraph you quoted above?  It explicitly claims that
> the slowdown happens on other platforms when using the platform native
> malloc.
>> Those platforms are now hurt by using overallocation,
>> i.e. need more memory, e.g. with piping.
> Could you provide some evidence for this claim?  The only way a
> "better malloc" can prevent this slowdown is by doing some kind
> of overallocation itself.

This is not correct. Mallocs/reallocs that can merge blocks do not
have the performance penalty that this algorithm seeks to work around.
The problem here is that the Win32 realloc always copies, and thus
extending a block a character at a time becomes quadratic. A realloc
that merges blocks and copies only where there is insufficient
contiguous free space does not have this problem.
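
As a rough illustration of the difference (a toy model only, not a
measurement of any real allocator), the sketch below counts the bytes
moved while a buffer grows one byte at a time. The function name
bytes_copied and the two idealised realloc behaviours are assumptions
made for this example: "always copy" moves the whole old block on every
call, while "in place" is the best case where free space is always
available directly after the block.

```python
def bytes_copied(final_len, always_copy):
    """Count bytes moved while growing a buffer to final_len, one byte per step.

    always_copy=True  models a realloc that moves the block on every call.
    always_copy=False models the best case of a merging realloc, where the
    block can always be extended in place (an idealisation).
    """
    copied = 0
    for old_len in range(1, final_len):
        if always_copy:
            # The old contents (old_len bytes) are copied to the new block.
            copied += old_len
        # Extending in place moves nothing.
    return copied


if __name__ == "__main__":
    for n in (1_000, 10_000):
        print(n, bytes_copied(n, True), bytes_copied(n, False))
    # 1000 499500 0
    # 10000 49995000 0
```

In this model a string ten times longer costs about a hundred times
more copying with the always-copy realloc, i.e. the total grows with
the square of the final length.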

> Since the algorithm in this patch
> is not necessarily cumulative with the overallocation by malloc()
> it is very well possible that the change has no effect at all
> on systems with an overallocating malloc().

> For example, assume Perl is appending 100 bytes to a 1000 bytes string.
> The patch under discussion will make sure that sv_grow() will request
> 1260 bytes (1000 + 1000>>2 + 10) instead of just 1100 bytes from realloc().
> Assume the original 1000 bytes were allocated in a 1024 bytes slab in
> the allocator, so the 1100 bytes wouldn't fit in anymore, and realloc()
> will now move this to e.g. a 1536 byte slab.  In that case it doesn't
> make any difference that we now asked for 1260 bytes instead of 1100.
> Also note that the minimum growth is based on the old buffer size,
> so appending 300 bytes to the 1000 byte string will only request
> 1300 bytes, because 1300 is already larger than the 1260 minimum
> realloc growth.
> So until proven otherwise I doubt that this patch has any noticeable
> effect on a "good malloc()".  On a "medium malloc()" I would expect
> it to improve performance somewhat, at a moderate additional memory
> requirement.
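
The arithmetic in the quoted example can be checked with a small
sketch. The names requested_size and slab_size are hypothetical, and
the slab list is an assumption: 1024 and 1536 come from the example
above, 2048 is added only so that the lookup has a larger bucket. This
is not the actual sv_grow() code, just the minimum-growth rule as
described in the quote.

```python
def requested_size(old_len, needed):
    """Size asked of realloc() under the minimum-growth rule quoted above."""
    # Parentheses matter: '+' binds tighter than '>>', so the shift
    # must be grouped explicitly to get old_len + old_len/4 + 10.
    minimum = old_len + (old_len >> 2) + 10
    return max(needed, minimum)


def slab_size(request, slabs=(1024, 1536, 2048)):
    """Smallest slab that fits the request (slab sizes are hypothetical)."""
    return next(s for s in slabs if s >= request)


if __name__ == "__main__":
    print(requested_size(1000, 1100))             # 1260
    print(requested_size(1000, 1300))             # 1300
    print(slab_size(1100), slab_size(1260))       # 1536 1536
```

With these assumed slab sizes, both the plain 1100-byte request and the
over-allocated 1260-byte request land in the same 1536-byte slab, which
is the case described in the quoted paragraph.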
>> Why was this patch not applied with the appropriate
>> #if defined(_WIN32) or what is used for MSVC and mingw?
> Because then it wouldn't be applied to other platforms, like FreeBSD.
> Note though that I added this remark for discussion about disabling it
> under -Dusemymalloc:
> | b) Should the new over-allocation also be used under -Dusemymalloc?  It
> |    will provide less benefit there, but also come at a lower cost because
> |    the newlen can still fall inside the already allocated bucket size
> |    anyways.  I don't have any strong opinion if it should be disabled
> |    in this situation or not.
> But as I stated above, I don't think it will make much of a difference either
> way for the -Dusemymalloc case, but would love to see some comprehensive
> benchmarks.

Nicholas had some impressive benchmarks from this change for AIX or
HP-UX (I forget which), whose realloc has the same problems as Win32's.


perl -Mre=debug -e "/just|another|perl|hacker/"
