develooper Front page | perl.perl5.porters | Postings from August 2010

Re: Patch to make string-append on win32 100 times faster

Thread Previous | Thread Next
August 20, 2010 12:14
Re: Patch to make string-append on win32 100 times faster
Message ID:
On 14 August 2010 02:39, Jan Dubois <> wrote:
> On Fri, 30 Jul 2010, Wolfram Humann wrote:
> [...]
>> I would like to propose that perl requests an additional amount of
>> memory from realloc when strings grow by a small amount. After
>> discussion on c.l.p.m. and with the help of ideas from Ilya Zakharevich
>> and Peter J. Holzer my proposal reads like this:
>> (I'm not a regular user of diff and using it wrong might do more harm
>> than good)
>> In sv.c the following section in Perl_sv_grow():
>>      if (newlen > SvLEN(sv)) {        /* need more room? */
>> #ifndef Perl_safesysmalloc_size
>>      newlen = PERL_STRLEN_ROUNDUP(newlen);
>> #endif
>> should be expanded to:
>> #endif
>>      if (newlen > SvLEN(sv)) {        /* need more room? */
>>      size_t minlen  = SvCUR(sv);
>>      minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10;
>>          if (newlen < minlen) newlen = minlen;
>> /*        newlen += (newlen >> 1) + 10; */
>> #ifndef Perl_safesysmalloc_size
>>      newlen = PERL_STRLEN_ROUNDUP(newlen);
>> #endif
> Thank you for your patch!  I've now committed a slightly cleaned up version
> to the Perl repository, and added your email address to the AUTHORS file:
> Beyond formatting changes I did:
> * change size_t to STRLEN for internal consistency
> * removed the commented out code
> * moved the PERL_STRLEN_EXPAND_SHIFT definition to perl.h
> The discussion of this change seemed to have stalled, but I see
> +1 votes from David Golden and Marvin Humphrey, with additional
> information from Ben Morrow that the patch also helps on FreeBSD
> (when building without -Dusemymalloc), with nobody voicing
> any disagreement about applying the patch.
> I've kept the original constants (PERL_STRLEN_EXPAND_SHIFT being
> 2, plus the minimum addition of 10).  While the shift overallocates
> by 25%, this is based on the original length of the buffer, not
> the requested new length. The reported 12.5% overallocation used
> by Python (for arrays, not strings) is based on the new size already,
> not the old.
> The advantage of this algorithm is also that we don't overallocate
> at all if the buffer already grows by more than 25%, so we won't
> "waste" any space if 2 long strings are concatenated, only small
> strings are repeatedly concatenated to an already long one.
> I don't care much either way about the constant 10, but the price
> for it is small: on a 32-bit system a null-string SV already
> occupies 28 bytes (24 + 1*4), so once it grows to 32 bytes,
> the "10" will actually expand it to 36 (24 + 3*4) bytes instead,
> which doesn't matter either way.
> Things that could benefit from further discussion are:
> a) the algorithm in sv_grow() for expanding SVs with SvOOK
>        if (newlen > SvLEN(sv))
>            newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
>   This doesn't make much sense, and is catastrophic when
>   you append a very long string to a short one.  I have not looked
>   into the heritage of this algorithm yet.

I suspect this was Nicholas'es hack to reduce the number of times
realloc is called when appending single characters to a buffer over
and over.

One thing I have a bit of hesitancy about is it seems the benchmarks
here, (emphasis on seems i havent looked very closely unfortunately)
is that they /look/ like they involve concatenating large strings
together. These types of benchmarks will "blow" the merge block
algorithm used in some reallocs (depending on memory pressures, etc on
the system at the time), the original thought was to address the
performance of code like this:

my $str="";
$str.="x" for 1...100000;

If we dont overallocate AND we are using a malloc/realloc that doesnt
do block merges (like Windows worked when I last used Windows) you end
up getting quadratic performance as each concatenation ends up being a
realloc call, which in turn ends up doing a memcopy(). Which means one
ends up doing a ridiculously large amount of data copying.

If i recall correctly Nicholas made a comment along the lines of "best
performance gain for least effort" of any patch he had ever written.

perl -Mre=debug -e "/just|another|perl|hacker/"

Thread Previous | Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About