develooper Front page | perl.perl5.porters | Postings from August 2010

RE: Patch to make string-append on win32 100 times faster

Thread Previous | Thread Next
Jan Dubois
August 20, 2010 10:43
RE: Patch to make string-append on win32 100 times faster
Message ID:
On Fri, 20 Aug 2010, demerphq wrote:
> On 20 August 2010 16:53, demerphq <> wrote:
>> On 16 August 2010 08:19, Jan Dubois <> wrote:
>>> Could you provide some evidence for this claim?  The only way a
>>> "better malloc" can prevent this slowdown is by doing some kind of
>>> overallocation itself.
>> This is not correct. Mallocs/reallocs that can merge blocks do not
>> have the performance penalty that this algorithm seeks to work
>> around. The problem here is that the Win32 realloc always copies, and
>> thus extending a block a character at a time becomes exponential.
>> With a realloc that merges blocks and only copies where there is
>> insufficient contiguous blocks does not have this problem.
> Ill just note that im not arguing against this patch. Just that
> overallocation is not the only reason that a malloc might not be
> penalized by this change.

Indeed, "the only way" was a rather poor choice of words.

Note though that all the block merge algorithms still depends heavily
on the allocation pattern.  If there are no free blocks to merge, then
they still provide poor performance, and that case *will* be helped
by the overallocation during the growth phase.  This has been shown
for both FreeBSD and the Linux allocators.

> One real-world benchmark that people might want to try would be to use
> a routine like this:
> sub make_tree {
>   my ($depth) = shift;
>   return int rand 100 unless $depth>0;
>   return [ make_tree($depth-1), make_tree($depth-1) ]
> }
> and then use the XS implementation of Data::Dumper to dump the results
> of make_tree() for various N.
> On win32 even modest N will result in the machine essentially hanging.
> On no other OS that I've tried it on is the slowdown as noticeable.
> This was traced to the use of realloc in SV_GROW(). This was the
> analysis that lead to Nicholas' original patch.

I don't see how sv_grow() comes into this example at all.  Are you talking
about av_expand()?

That is indeed an area that should probably be scrutinized for potential
Windows performance improvements too.  I also have some vague memory that
hash expansion beyond something like 100,000 keys gets extremely slow on


Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About