develooper Front page | perl.perl5.porters | Postings from August 2010

RE: Patch to make string-append on win32 100 times faster

Thread Previous | Thread Next
Jan Dubois
August 20, 2010 12:43
RE: Patch to make string-append on win32 100 times faster
Message ID:
On Fri, 20 Aug 2010, demerphq wrote:
> On 14 August 2010 02:39, Jan Dubois <> wrote:
> > Things that could benefit from further discussion are:
> >
> > a) the algorithm in sv_grow() for expanding SVs with SvOOK
> >
> >        if (newlen > SvLEN(sv))
> >            newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
> >
> >   This doesn't make much sense, and is catastrophic when
> >   you append a very long string to a short one.  I have not looked
> >   into the heritage of this algorithm yet.
> I suspect this was Nicholas'es hack to reduce the number of times
> realloc is called when appending single characters to a buffer over
> and over.

This code is only executed when the SvOOK bit is set (which means that
some bytes at the beginning of the allocated block have been chopped
off).  Are you sure that applies to the Data::Dumper::XS case (yes, I'm
too lazy^H^H^H^Hbusy to hunt through Dumper.xs right now).
> One thing I have a bit of hesitancy about is it seems the benchmarks
> here, (emphasis on seems i havent looked very closely unfortunately)
> is that they /look/ like they involve concatenating large strings
> together. These types of benchmarks will "blow" the merge block
> algorithm used in some reallocs (depending on memory pressures, etc on
> the system at the time), the original thought was to address the
> performance of code like this:
> my $str="";
> $str.="x" for 1...100000;

No the benchmarks in this thread are all about concatenating small strings
to large buffers.

Just to reiterate it once more: the "new" patch will not kick in *at all*
if the required new size is more than (25% + 10 bytes) larger than the
old size.  It only helps repeated appends of smallish increments.

> If we dont overallocate AND we are using a malloc/realloc that doesnt
> do block merges (like Windows worked when I last used Windows) you end
> up getting quadratic performance as each concatenation ends up being a
> realloc call, which in turn ends up doing a memcopy(). Which means one
> ends up doing a ridiculously large amount of data copying.

Yes, that is exactly the problem this change is supposed to "solve".
> If i recall correctly Nicholas made a comment along the lines of "best
> performance gain for least effort" of any patch he had ever written.

Could you hunt down the original discussion (and commit id) of Nicholas
patch?  I don't quite understand how we can still get such huge improvements
if this is supposedly already fixed.

The one patch from Nicholas that I know about is that under -Dusemymalloc
he queries the size of the new block and assigns that number to SvLEN
instead of the requested size.  That prevents additional calls to sv_grow()
when the block wouldn't have to move because realloc() already overallocated.

But that patch does nothing for other allocators.


Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About