develooper Front page | perl.perl5.porters | Postings from August 2010

Re: Patch to make string-append on win32 100 times faster

From:
Wolfram Humann
Date:
August 21, 2010 03:52
Subject:
Re: Patch to make string-append on win32 100 times faster
Message ID:
4C6FAFF7.8050906@arcor.de
-------- Original Message --------
Subject: Re: Patch to make string-append on win32 100 times faster
From: demerphq <demerphq@gmail.com>
To: Ben Morrow <ben@morrow.me.uk>, Wolfram Humann <w.c.humann@arcor.de>
Date: 21.08.2010 11:18
> On 20 August 2010 17:05, demerphq <demerphq@gmail.com> wrote:
>   
>> On 20 August 2010 16:53, demerphq <demerphq@gmail.com> wrote:
>>     
>>> On 16 August 2010 08:19, Jan Dubois <jand@activestate.com> wrote:
>>>       
>>>> On Sun, 15 Aug 2010, Reini Urban wrote:
>>>>         
>>>>> Jan Dubois schrieb:
>>>>>           
>>>>>> On Fri, 30 Jul 2010, Wolfram Humann wrote:
>>>>>> The discussion of this change seemed to have stalled, but I see
>>>>>> +1 votes from David Golden and Marvin Humphrey, with additional
>>>>>> information from Ben Morrow that the patch also helps on FreeBSD
>>>>>> (when building without -Dusemymalloc), with nobody voicing
>>>>>> any disagreement about applying the patch.
>>>>>>             
>>>>> This particular slowdown was only recognized for WIN32 native malloc,
>>>>> but not for other platforms with better malloc libs.
>>>>>           
>>>> Did you read the paragraph you quoted above?  It explicitly claims that
>>>> the slowdown happens on other platforms when using the platform native
>>>> malloc.
>>>>
>>>>         
>>>>> Those platforms are now hurt by using overallocation,
>>>>> i.e. need more memory, e.g. with piping.
>>>>>           
>>>> Could you provide some evidence for this claim?  The only way a
>>>> "better malloc" can prevent this slowdown is by doing some kind
>>>> of overallocation itself.
>>>>         
>>> This is not correct. Mallocs/reallocs that can merge blocks do not
>>> have the performance penalty that this algorithm seeks to work around.
>>> The problem here is that the Win32 realloc always copies, and thus
>>> extending a block a character at a time becomes quadratic. A
>>> realloc that merges blocks and only copies where there are insufficient
>>> contiguous blocks does not have this problem.
>>>       
>> I'll just note that I'm not arguing against this patch. Just that
>> overallocation is not the only reason that a malloc might not be
>> penalized by this change.
>>
>> One real-world benchmark that people might want to try would be to use
>> a routine like this:
>>
>> sub make_tree {
>>  my ($depth) = shift;
>>  return int rand 100 unless $depth>0;
>>  return [ make_tree($depth-1), make_tree($depth-1) ]
>> }
>>
>> and then use the XS implementation of Data::Dumper to dump the results
>>  of make_tree(N) for various depths N.
>>
>> On win32 even modest N will result in the machine essentially hanging.
>> On no other OS that I've tried it on is the slowdown as noticeable.
>> This was traced to the use of realloc in SV_GROW(). This was the
>> analysis that led to Nicholas' original patch.
>>     
>
> Ben, Wolfram, any chance you can try benchmarking this with and
> without the new patch?
>   
For a patched win32 perl I'll have to wait till Monday. What I can 
compare now is Strawberry (usemymalloc=n) to Cygwin (usemymalloc=y). 
This slightly golfed version

perl -MData::Dumper -E"sub mt{my $d=shift;$d>0?[mt($d-1),mt($d-1)]:int rand 100}; say length Dumper mt 18"
34576966

needs approx. 6 seconds to print and 8 seconds to finish on both of
these. No noticeable difference. With a touch of horror I have to report
that the printed string lengths are slightly non-deterministic from run
to run...???
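
A longer form of the one-liner may make runs easier to compare. This is a minimal sketch, not the benchmark used by anyone in this thread: the helper name timed_dump_length, the seed value 42 and the chosen depths are all illustrative assumptions. It also fixes the random seed, on the assumption (not confirmed in this thread) that the varying lengths simply come from "int rand 100" producing a different mix of one-digit and two-digit leaves on each run.

```perl
use strict;
use warnings;
use Data::Dumper;
use Time::HiRes qw(time);

# Build a complete binary tree of array refs; leaves are random integers 0..99.
sub make_tree {
    my ($depth) = @_;
    return int rand 100 unless $depth > 0;
    return [ make_tree($depth - 1), make_tree($depth - 1) ];
}

# Hypothetical helper: seed the RNG, build a tree of the given depth,
# and return (length of the dumped string, seconds spent in Dumper).
sub timed_dump_length {
    my ($depth, $seed) = @_;
    srand($seed);                       # same seed => same leaves => same length
    my $tree = make_tree($depth);
    my $t0   = time;
    my $len  = length Dumper($tree);    # the dump is built by repeated string appends
    return ($len, time - $t0);
}

# Small depths keep the run short; raise them to stress string growth.
for my $depth (8, 10, 12) {
    my ($len, $secs) = timed_dump_length($depth, 42);
    printf "depth %2d: %8d bytes in %.3f s\n", $depth, $len, $secs;
}
```

With a fixed seed the reported length should be identical from run to run, so any remaining difference between two perl builds is down to timing alone.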

Wolfram
