perl.perl5.porters
Postings from August 2010
Re: Patch to make string-append on win32 100 times faster
From: Reini Urban
Date: August 16, 2010 04:19
Subject: Re: Patch to make string-append on win32 100 times faster
Message ID: AANLkTimy0NojwtH1xYY3OHWgx-qZ9JLnFftNC8XWtmGM@mail.gmail.com
2010/8/16 Jan Dubois <jand@activestate.com>:
> On Sun, 15 Aug 2010, Reini Urban wrote:
>> Jan Dubois schrieb:
>> > On Fri, 30 Jul 2010, Wolfram Humann wrote:
>> > The discussion of this change seemed to have stalled, but I see
>> > +1 votes from David Golden and Marvin Humphrey, with additional
>> > information from Ben Morrow that the patch also helps on FreeBSD
>> > (when building without -Dusemymalloc), with nobody voicing
>> > any disagreement about applying the patch.
>>
>> This particular slowdown was only recognized for WIN32 native malloc,
>> but not for other platforms with better malloc libs.
>
> Did you read the paragraph you quoted above? It explicitly claims that
> the slowdown happens on other platforms when using the platform native
> malloc.
>
>> Those platforms are now hurt by the overallocation,
>> i.e. they need more memory, e.g. when piping.
>
> Could you provide some evidence for this claim? The only way a
> "better malloc" can prevent this slowdown is by doing some kind
> of overallocation itself. Since the algorithm in this patch
> is not necessarily cumulative with the overallocation by malloc(),
> it is very well possible that the change has no effect at all
> on systems with an overallocating malloc().
This is just theory. I KNOW that plain gnumalloc and FreeBSD realloc
work fine.
But now that we have the mess, someone can test it. I don't have such
systems, so I cannot test it myself.
> For example, assume Perl is appending 100 bytes to a 1000-byte string.
> The patch under discussion will make sure that sv_grow() will request
> 1260 bytes (1000 + 1000>>2 + 10) instead of just 1100 bytes from realloc().
>
> Assume the original 1000 bytes were allocated in a 1024-byte slab in
> the allocator, so the 1100 bytes wouldn't fit in anymore, and realloc()
> will now move this to e.g. a 1536 byte slab. In that case it doesn't
> make any difference that we now asked for 1260 bytes instead of 1100.
>
> Also note that the minimum growth is based on the old buffer size,
> so appending 300 bytes to the 1000 byte string will only request
> 1300 bytes, because 1300 is already larger than the 1260 minimum
> realloc growth.
>
> So until proven otherwise I doubt that this patch has any noticeable
> effect on a "good malloc()". On a "medium malloc()" I would expect
> it to improve performance somewhat, at a moderate additional memory
> requirement.
>
>> Why was this patch not applied with the appropriate
>> #if defined(_WIN32) or what is used for MSVC and mingw?
>
> Because then it wouldn't be applied to other platforms, like FreeBSD.
Uuh, nobody ever complained about FreeBSD realloc performance.
It has always been the fastest on the planet and still is.
> Note though that I added this remark for discussion about disabling it
> under -Dusemymalloc:
>
> | b) Should the new over-allocation also be used under -Dusemymalloc? It
> | will provide less benefit there, but also come at a lower cost because
> | the newlen can still fall inside the already allocated bucket size
> | anyway. I don't have any strong opinion if it should be disabled
> | in this situation or not.
>
> But as I stated above, I don't think it will make much of a difference either
> way for the -Dusemymalloc case, but would love to see some comprehensive
> benchmarks.
The thread at http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/b7c9133ff20009f2?pli=1
has a lot of benchmarks and profiling data.
He tests string sizes of 1e5-1e7 bytes, not just page sizes, piping
a typical PDF into perl.
Plain FreeBSD has had the best realloc performance for years (first with
phkmalloc; they have since switched to jemalloc for better multi-core
performance), always faster than simple gnumalloc (ptmalloc 2 or 3,
based on Doug Lea's public domain malloc).
But "gnumalloc" was fast enough: 12ms against 16sec.
Without this patch.
So I don't see any reason to "fix" the default realloc behaviour of
gnumalloc or FreeBSD unless someone reports problems there. Tuning realloc
is a black art (I did it once for Tie::CArray), and I don't want anyone
to touch this without tests.
Only msvcrt is affected, so only platforms that use the msvcrt realloc
should be patched, especially since there are no tests on the other
platforms.
FYI: Cygwin (newlib) normally uses FreeBSD-derived libc implementations,
but in this case Cygwin does not use phkmalloc, just ptmalloc2.6.4,
i.e. plain gnumalloc, which has no public independent_comalloc(). A shame, btw.
-Dusemymalloc overallocates a lot, but we know that. This is not a
realloc problem per se; plain malloc does the same.
--
Reini Urban