perl.perl5.porters | Postings from August 2010

Re: Patch to make string-append on win32 100 times faster

Wolfram Humann
August 21, 2010 03:52
-------- Original Message --------
Subject: Re: Patch to make string-append on win32 100 times faster
From: demerphq <>
To: Ben Morrow <>, Wolfram Humann <>
Date: 21.08.2010 11:18
> On 20 August 2010 17:05, demerphq <> wrote:
>> On 20 August 2010 16:53, demerphq <> wrote:
>>> On 16 August 2010 08:19, Jan Dubois <> wrote:
>>>> On Sun, 15 Aug 2010, Reini Urban wrote:
>>>>> Jan Dubois schrieb:
>>>>>> On Fri, 30 Jul 2010, Wolfram Humann wrote:
>>>>>> The discussion of this change seemed to have stalled, but I see
>>>>>> +1 votes from David Golden and Marvin Humphrey, with additional
>>>>>> information from Ben Morrow that the patch also helps on FreeBSD
>>>>>> (when building without -Dusemymalloc), with nobody voicing
>>>>>> any disagreement about applying the patch.
>>>>> This particular slowdown was only recognized for WIN32 native malloc,
>>>>> but not for other platforms with better malloc libs.
>>>> Did you read the paragraph you quoted above?  It explicitly claims that
>>>> the slowdown happens on other platforms when using the platform native
>>>> malloc.
>>>>> Those platforms are now hurt by using overallocation,
>>>>> i.e. need more memory, e.g. with piping.
>>>> Could you provide some evidence for this claim?  The only way a
>>>> "better malloc" can prevent this slowdown is by doing some kind
>>>> of overallocation itself.
>>> This is not correct. Mallocs/reallocs that can merge blocks do not
>>> have the performance penalty that this algorithm seeks to work around.
>>> The problem here is that the Win32 realloc always copies, and thus
>>> extending a block a character at a time becomes exponential. A
>>> realloc that merges blocks and only copies where there are insufficient
>>> contiguous blocks does not have this problem.
>> I'll just note that I'm not arguing against this patch. Just that
>> overallocation is not the only reason that a malloc might not be
>> penalized by this change.
>> One real-world benchmark that people might want to try would be to use
>> a routine like this:
>> sub make_tree {
>>  my ($depth) = shift;
>>  return int rand 100 unless $depth>0;
>>  return [ make_tree($depth-1), make_tree($depth-1) ]
>> }
>> and then use the XS implementation of Data::Dumper to dump the results
>> of make_tree(N) for various depths N.
>> On win32 even modest N will result in the machine essentially hanging.
>> On no other OS that I've tried it on is the slowdown as noticeable.
>> This was traced to the use of realloc in SV_GROW(). This was the
>> analysis that led to Nicholas' original patch.
> Ben, Wolfram, any chance you can try benchmarking this with and
> without the new patch?
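
The copying cost discussed in the quoted exchange can be illustrated with a small model. This is only a sketch under stated assumptions: it assumes every reallocation copies the whole existing buffer (the behaviour attributed to the Win32 realloc above), and the 25%-plus-16 headroom rule is a hypothetical growth policy chosen for illustration, not the rule used by the patch or by SV_GROW(). In this model, growing to exactly the needed size copies n(n-1)/2 bytes for n one-byte appends, which is quadratic growth; with headroom the total stays roughly proportional to n.

```perl
use strict;
use warnings;

# Bytes copied while appending $n single bytes, under the assumption
# that every reallocation copies the whole existing buffer.
# $grow is a code ref mapping the needed size to the new capacity.
sub bytes_copied {
    my ($n, $grow) = @_;
    my ($len, $cap, $copied) = (0, 0, 0);
    for (1 .. $n) {
        if ($len + 1 > $cap) {
            $copied += $len;            # the old contents are copied
            $cap = $grow->($len + 1);   # new capacity chosen by the policy
        }
        $len++;
    }
    return $copied;
}

# Policy 1: allocate exactly what is needed (no overallocation).
my $exact = sub { $_[0] };

# Policy 2: hypothetical headroom of 25% plus 16 bytes (an assumption).
my $extra = sub { $_[0] + int($_[0] / 4) + 16 };

for my $n (1_000, 10_000, 100_000) {
    printf "%7d appends: exact=%d bytes copied, overallocated=%d bytes copied\n",
        $n, bytes_copied($n, $exact), bytes_copied($n, $extra);
}
```

A realloc that can extend a block in place would avoid most of these copies without any headroom, which is the point made in the quoted text; the model only covers the always-copy case.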
For a patched win32 perl I'll have to wait till Monday. What I can 
compare now is Strawberry (usemymalloc=n) to Cygwin (usemymalloc=y). 
This slightly golfed version

perl -MData::Dumper -E"sub mt{my $d=shift;$d>0?[mt($d-1),mt($d-1)]:int rand 100}; say length Dumper mt 18"
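
For anyone who wants to repeat the measurement, here is an ungolfed sketch of the same test that times only the Dumper call. The helper name time_dump and the default depth of 12 are my own choices for illustration (the one-liner above uses depth 18). Data::Dumper uses its XS implementation when that is available, unless $Data::Dumper::Useperl is set.

```perl
use strict;
use warnings;
use Data::Dumper;
use Time::HiRes qw(time);

# Build a complete binary tree of array refs with random integer leaves.
sub make_tree {
    my ($depth) = @_;
    return int rand 100 unless $depth > 0;
    return [ make_tree($depth - 1), make_tree($depth - 1) ];
}

# Returns (length of the dump, seconds spent inside Dumper) for a tree
# of the given depth. Tree construction is not included in the timing.
sub time_dump {
    my ($depth) = @_;
    my $tree  = make_tree($depth);
    my $start = time;
    my $len   = length(Dumper($tree));
    return ($len, time - $start);
}

# Depth comes from the command line; 12 is an arbitrary small default.
my $depth = @ARGV ? shift @ARGV : 12;
my ($len, $secs) = time_dump($depth);
printf "depth %d: %d characters dumped in %.3f s\n", $depth, $len, $secs;
```

Running the script with 18 as its argument corresponds to the one-liner above.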

needs approx. 6 seconds to print and 8 seconds to finish on both of 
these. No noticeable difference. With a touch of horror I have to report 
that the printed string lengths are slightly non-deterministic from run 
to run...???
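
One likely explanation for the varying lengths: the leaves come from int rand 100, so each leaf prints as either one digit (0..9) or two (10..99), and perl picks a different random seed on each run unless srand is called explicitly. The sketch below checks this by fixing the seed; dump_stats and one_digit_leaves are hypothetical helper names, and depth 10 is an arbitrary choice to keep the run short.

```perl
use strict;
use warnings;
use Data::Dumper;

# Same tree builder as in the one-liner, ungolfed.
sub make_tree {
    my ($depth) = @_;
    return int rand 100 unless $depth > 0;
    return [ make_tree($depth - 1), make_tree($depth - 1) ];
}

# Count the leaves that print as a single digit (values 0..9).
sub one_digit_leaves {
    my ($node) = @_;
    unless (ref $node) {
        return $node < 10 ? 1 : 0;
    }
    my $count = 0;
    $count += one_digit_leaves($_) for @$node;
    return $count;
}

# Returns (dump length, number of one-digit leaves) for a fixed seed.
sub dump_stats {
    my ($seed, $depth) = @_;
    srand($seed);                      # fixed seed: same tree every time
    my $tree = make_tree($depth);
    return (length(Dumper($tree)), one_digit_leaves($tree));
}

for my $seed (1, 2, 1) {
    my ($len, $short) = dump_stats($seed, 10);
    printf "seed %d: length %d, one-digit leaves %d\n", $seed, $len, $short;
}
```

With a fixed seed the length repeats exactly, and two different seeds should differ in length by the difference in their one-digit leaf counts, since the brackets, commas and indentation depend only on the shape of the tree.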

