
Re: Patch to make string-append on win32 100 times faster

From: demerphq
Date: August 20, 2010 08:05
Subject: Re: Patch to make string-append on win32 100 times faster
Message ID: AANLkTikcK=yR5djoD2XMQ72cy+mBv3C36g+5j2iK7k7R@mail.gmail.com

On 20 August 2010 16:53, demerphq <demerphq@gmail.com> wrote:
> On 16 August 2010 08:19, Jan Dubois <jand@activestate.com> wrote:
>> On Sun, 15 Aug 2010, Reini Urban wrote:
>>> Jan Dubois schrieb:
>>> > On Fri, 30 Jul 2010, Wolfram Humann wrote:
>>> > The discussion of this change seemed to have stalled, but I see
>>> > +1 votes from David Golden and Marvin Humphrey, with additional
>>> > information from Ben Morrow that the patch also helps on FreeBSD
>>> > (when building without -Dusemymalloc), with nobody voicing
>>> > any disagreement about applying the patch.
>>>
>>> This particular slowdown was only recognized for WIN32 native malloc,
>>> but not for other platforms with better malloc libs.
>>
>> Did you read the paragraph you quoted above?  It explicitly claims that
>> the slowdown happens on other platforms when using the platform native
>> malloc.
>>
>>> Those platforms are now hurt by using overallocation,
>>> i.e. need more memory, e.g. with piping.
>>
>> Could you provide some evidence for this claim?  The only way a
>> "better malloc" can prevent this slowdown is by doing some kind
>> of overallocation itself.
>
> This is not correct. Mallocs/reallocs that can merge blocks do not
> have the performance penalty that this algorithm seeks to work around.
> The problem here is that the Win32 realloc always copies, and thus
> extending a block a character at a time becomes quadratic. A
> realloc that merges blocks and only copies where there are insufficient
> contiguous blocks does not have this problem.

I'll just note that I'm not arguing against this patch, just that
overallocation is not the only reason that a malloc might not be
penalized by this change.
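
To put rough numbers on the copying argument, here is a minimal sketch
of a cost model. It is an illustration only, not Perl's or any libc's
real allocator, and the function names are made up for this example.
It counts how many bytes would be copied while a buffer grows one byte
at a time, first with a realloc that moves the block on every call,
then with capacity doubling as a simple stand-in for overallocation. A
realloc that can always extend the block in place would copy nothing
at all, which is the third case being discussed.

```perl
use strict;
use warnings;

# Hypothetical cost model: count the bytes a realloc would copy while
# a buffer grows one byte at a time up to $n bytes.

# Worst case: every realloc moves the block and copies its contents.
sub bytes_copied_always_copy {
    my ($n) = @_;
    my $copied = 0;
    $copied += $_ - 1 for 1 .. $n;    # growing to size $_ copies $_ - 1 bytes
    return $copied;
}

# Overallocation: capacity doubles when full, so moves are rare.
sub bytes_copied_doubling {
    my ($n) = @_;
    my ($copied, $cap) = (0, 1);
    for my $len (2 .. $n) {
        next if $len <= $cap;         # still fits, nothing to copy
        $copied += $len - 1;          # move the existing bytes once
        $cap *= 2;
    }
    return $copied;
}

for my $n (1_000, 10_000, 100_000) {
    printf "%7d bytes: always-copy %12d, doubling %8d\n",
        $n, bytes_copied_always_copy($n), bytes_copied_doubling($n);
}
```

In this model the always-copy case grows as n*(n-1)/2, while doubling
stays below 2*n.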

One real-world benchmark that people might want to try would be to use
a routine like this:

sub make_tree {
  my $depth = shift;
  # leaf: a random integer in 0..99
  return int rand 100 unless $depth > 0;
  # interior node: two subtrees, so 2**$depth leaves in total
  return [ make_tree($depth-1), make_tree($depth-1) ];
}

and then use the XS implementation of Data::Dumper to dump the result
of make_tree($N) for various depths N.
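
A minimal sketch of such a benchmark follows. The depths and the use
of Time::HiRes for timing are assumptions made for illustration, not
part of the original test, and no timings are claimed here. Setting
$Data::Dumper::Useperl to 0 asks for the XS dumper where it is
available.

```perl
use strict;
use warnings;
use Data::Dumper;
use Time::HiRes ();

sub make_tree {
    my $depth = shift;
    return int rand 100 unless $depth > 0;
    return [ make_tree($depth - 1), make_tree($depth - 1) ];
}

# Prefer the XS implementation (this is also the default).
$Data::Dumper::Useperl = 0;

# Depths chosen arbitrarily; raise them to make the effect visible.
for my $depth (8, 10, 12) {
    my $tree    = make_tree($depth);
    my $start   = Time::HiRes::time();
    my $dump    = Dumper($tree);
    my $elapsed = Time::HiRes::time() - $start;
    printf "depth %2d: %9d bytes dumped in %.3fs\n",
        $depth, length($dump), $elapsed;
}
```

Comparing how the elapsed time grows with depth on different
platforms, or on the same platform with and without -Dusemymalloc,
should show whether realloc is the bottleneck.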

On Win32, even a modest N results in the machine essentially hanging.
On no other OS that I've tried is the slowdown as noticeable.
This was traced to the use of realloc in SV_GROW(), and that was the
analysis that led to Nicholas' original patch.

cheers,
Yves



-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
