Re: Patch to make string-append on win32 100 times faster
From: David Golden
Date: August 16, 2010 06:37
Subject: Re: Patch to make string-append on win32 100 times faster
Message ID: AANLkTi=6FoO4jW6xErAQd2w4k1ZyiwJEcWwvg+H+S8bV@mail.gmail.com

I think I'm with Reini on this one. I'd rather keep it Windows-only until a
performance problem is documented elsewhere.
It's easy to enable for other platforms when needed.
I'm open to people doing the research before 5.14, of course. :-)
David
On Aug 16, 2010 7:19 AM, "Reini Urban" <rurban@x-ray.at> wrote:
> 2010/8/16 Jan Dubois <jand@activestate.com>:
>> On Sun, 15 Aug 2010, Reini Urban wrote:
>>> Jan Dubois schrieb:
>>> > On Fri, 30 Jul 2010, Wolfram Humann wrote:
>>> > The discussion of this change seemed to have stalled, but I see
>>> > +1 votes from David Golden and Marvin Humphrey, with additional
>>> > information from Ben Morrow that the patch also helps on FreeBSD
>>> > (when building without -Dusemymalloc), with nobody voicing
>>> > any disagreement about applying the patch.
>>>
>>> This particular slowdown was only recognized for WIN32 native malloc,
>>> but not for other platforms with better malloc libs.
>>
>> Did you read the paragraph you quoted above? It explicitly claims that
>> the slowdown happens on other platforms when using the platform native
>> malloc.
>>
>>> Those platforms are now hurt by using overallocation,
>>> i.e. need more memory, e.g. with piping.
>>
>> Could you provide some evidence for this claim? The only way a
>> "better malloc" can prevent this slowdown is by doing some kind
>> of overallocation itself. Since the algorithm in this patch
>> is not necessarily cumulative with the overallocation by malloc()
>> it is very well possible that the change has no effect at all
>> on systems with an overallocating malloc().
>
> This is just theory. I KNOW that plain gnumalloc and freebsd realloc
> do work fine.
> But now that we have the mess someone can test it. I don't have such
> systems so I cannot test it.
>
>> For example, assume Perl is appending 100 bytes to a 1000 bytes string.
>> The patch under discussion will make sure that sv_grow() will request
>> 1260 bytes (1000 + 1000>>2 + 10) instead of just 1100 bytes from
>> realloc().
>>
>> Assume the original 1000 bytes were allocated in a 1024 bytes slab in
>> the allocator, so the 1100 bytes wouldn't fit in anymore, and realloc()
>> will now move this to e.g. a 1536 byte slab. In that case it doesn't
>> make any difference that we now asked for 1260 bytes instead of 1100.
>>
>> Also note that the minimum growth is based on the old buffer size,
>> so appending 300 bytes to the 1000 byte string will only request
>> 1300 bytes, because 1300 is already larger than the 1260 minimum
>> realloc growth.
>>
>> So until proven otherwise I doubt that this patch has any noticeable
>> effect on a "good malloc()". On a "medium malloc()" I would expect
>> it to improve performance somewhat, at a moderate additional memory
>> requirement.
>>
>>> Why was this patch not applied with the appropriate
>>> #if defined(_WIN32) or what is used for MSVC and mingw?
>>
>> Because then it wouldn't be applied to other platforms, like FreeBSD.
>
> Uuh, nobody ever complained about freebsd realloc performance.
> It was always the fastest on the planet and still is.
>
>> Note though that I added this remark for discussion about disabling it
>> under -Dusemymalloc:
>>
>> | b) Should the new over-allocation also be used under -Dusemymalloc? It
>> | will provide less benefit there, but also come at a lower cost because
>> | the newlen can still fall inside the already allocated bucket size
>> | anyways. I don't have any strong opinion if it should be disabled
>> | in this situation or not.
>>
>> But as I stated above, I don't think it will make much of a difference
>> either way for the -Dusemymalloc case, but would love to see some
>> comprehensive benchmarks.
>
> At http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/b7c9133ff20009f2?pli=1
> there are a lot of benchmarks and profiling data.
> He is testing string sizes of 1e5-1e7 bytes, not just page sizes, piping
> a typical pdf to perl.
>
> Plain freebsd had for years the best realloc performance (first with
> phkmalloc, now they switched to jemalloc for better multi-core
> performance), which was always faster than simple gnumalloc (ptmalloc 2
> or 3, based on Doug Lea's public domain malloc), but "gnumalloc" was
> fast enough. 12ms against 16sec.
>
> Without this patch.
>
> So I don't see any reason to "fix" gnumalloc or freebsd realloc
> default behaviour, unless someone reports problems there. Tuning realloc
> is a black art (I did it for Tie::CArray once) and I don't want someone
> to touch this without any tests.
>
> Only msvcrt is affected and so only platforms which use msvcrt realloc
> should be patched, esp. without any tests on the other platforms.
>
> FYI: cygwin (newlib) normally uses freebsd-derived libc
> implementations, but in this case cygwin uses not phkmalloc, just
> ptmalloc2.6.4, i.e. plain gnumalloc, which has no public
> independent_comalloc(), which is a shame btw.
>
> -Dusemymalloc is overallocating a lot, but we know that. This is not a
> realloc problem per se; plain malloc does the same.
> --
> Reini Urban
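
The arithmetic in Jan's quoted example (appending to a 1000-byte string)
can be checked with a small C sketch. This is not the code from the patch
or from sv_grow(); grow_request() and slab_round() are hypothetical helper
names, and the 512-byte size-class step is only an assumption chosen to
match the "1024 byte slab" and "1536 byte slab" figures in the example.
Note that the minimum growth is meant as old + (old >> 2) + 10; in C the
shift needs its own parentheses because + binds tighter than >>.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size that would be requested from realloc() when
 * a buffer of old_size bytes must hold `needed` bytes, under the rule
 * described in the thread (grow by at least a quarter plus 10 bytes). */
size_t grow_request(size_t old_size, size_t needed)
{
    size_t min_grow = old_size + (old_size >> 2) + 10;

    assert(needed >= old_size);   /* we only model growth, not shrinking */
    return needed > min_grow ? needed : min_grow;
}

/* Hypothetical allocator model: round a request up to the next multiple
 * of 512 bytes. Real allocators use their own size classes; this step is
 * assumed purely for illustration. */
size_t slab_round(size_t request)
{
    const size_t step = 512;

    return ((request + step - 1) / step) * step;
}
```

Under these assumptions, grow_request(1000, 1100) gives 1260 and
grow_request(1000, 1300) gives 1300, matching the figures in the example.
slab_round() maps both 1100 and 1260 to 1536, which is the point being
made: with such an allocator, the larger request occupies the same slab
as the smaller one would have.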