On Fri, 30 Jul 2010, Wolfram Humann wrote: [...] > I would like to propose that perl requests an additional amount of > memory from realloc when strings grow by a small amount. After > discussion on c.l.p.m. and with the help of ideas from Ilya Zakharevich > and Peter J. Holzer my proposal reads like this: > (I'm not a regular user of diff and using it wrong might do more harm > than good) > > In sv.c the following section in Perl_sv_grow(): > > if (newlen > SvLEN(sv)) { /* need more room? */ > #ifndef Perl_safesysmalloc_size > newlen = PERL_STRLEN_ROUNDUP(newlen); > #endif > > should be expanded to: > > #ifndef PERL_STRLEN_EXPAND_SHIFT > # define PERL_STRLEN_EXPAND_SHIFT 2 > #endif > > if (newlen > SvLEN(sv)) { /* need more room? */ > size_t minlen = SvCUR(sv); > minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10; > if (newlen < minlen) newlen = minlen; > /* newlen += (newlen >> 1) + 10; */ > #ifndef Perl_safesysmalloc_size > newlen = PERL_STRLEN_ROUNDUP(newlen); > #endif Thank you for your patch! I've now committed a slightly cleaned up version to the Perl repository, and added your email address to the AUTHORS file: http://perl5.git.perl.org/perl.git/commitdiff/f12005599f648e675af22dfef1047191e260bc48 http://perl5.git.perl.org/perl.git/commitdiff/1fd8fa480e535be8885c2c1b2e7e95c3aabbdf7f Beyond formatting changes I did: * change size_t to STRLEN for internal consistency * removed the commented out code * moved the PERL_STRLEN_EXPAND_SHIFT definition to perl.h The discussion of this change seemed to have stalled, but I see +1 votes from David Golden and Marvin Humphrey, with additional information from Ben Morrow that the patch also helps on FreeBSD (when building without -Dusemymalloc), with nobody voicing any disagreement about applying the patch. I've kept the original constants (PERL_STRLEN_EXPAND_SHIFT being 2, plus the minimum addition of 10). While the shift overallocates by 25%, this is based on the original length of the buffer, not the requested new length. The reported 12.5% overallocation used by Python (for arrays, not strings) is based on the new size already, not the old. The advantage of this algorithm is also that we don't overallocate at all if the buffer already grows by more than 25%, so we won't "waste" any space if 2 long strings are concatenated, only small strings are repeatedly concatenated to an already long one. I don't care much either way about the constant 10, but the price for it is small: on a 32-bit system a null-string SV already occupies 28 bytes (24 + 1*4), so once it grows to 32 bytes, the "10" will actually expand it to 36 (24 + 3*4) bytes instead, which doesn't matter either way. Things that could benefit from further discussion are: a) the algorithm in sv_grow() for expanding SVs with SvOOK if (newlen > SvLEN(sv)) newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */ This doesn't make much sense, and is catastrophic when you append a very long string to a short one. I have not looked into the heritage of this algorithm yet. b) Should the new over-allocation also be used under -Dusemyalloc. It will provide less benefit there, but also come at a lower cost because the newlen can still fall inside the already allocated bucket size anyways. I don't have any strong opinion if it should be disabled in this situation or not. Cheers, -JanThread Previous | Thread Next