Front page | perl.perl5.porters |
Postings from August 2010
Re: Patch to make string-append on win32 100 times faster
August 21, 2010 05:18
Re: Patch to make string-append on win32 100 times faster
Message ID: AANLkTinbMPwL0eFROhKKXAM7Us-UVFZXP9fHnABD2OTa@mail.gmail.com
On 21 August 2010 14:13, demerphq <firstname.lastname@example.org> wrote:
> On 20 August 2010 21:43, Jan Dubois <email@example.com> wrote:
>> On Fri, 20 Aug 2010, demerphq wrote:
>>> On 14 August 2010 02:39, Jan Dubois <firstname.lastname@example.org> wrote:
>>> > Things that could benefit from further discussion are:
>>> > a) the algorithm in sv_grow() for expanding SVs with SvOOK
>>> > if (newlen > SvLEN(sv))
>>> > newlen += 10 * (newlen - SvCUR(sv)); /* avoid copy each time */
>>> > This doesn't make much sense, and is catastrophic when
>>> > you append a very long string to a short one. I have not looked
>>> > into the heritage of this algorithm yet.
>>> I suspect this was Nicholas'es hack to reduce the number of times
>>> realloc is called when appending single characters to a buffer over
>>> and over.
>> This code is only executed when the SvOOK bit is set (which means that
>> some bytes at the beginning of the allocated block have been chopped
>> off). Are you sure that applies to the Data::Dumper::XS case (yes, I'm
>> too lazy^H^H^H^Hbusy to hunt through Dumper.xs right now).
>>> One thing I have a bit of hesitancy about is it seems the benchmarks
>>> here, (emphasis on seems i havent looked very closely unfortunately)
>>> is that they /look/ like they involve concatenating large strings
>>> together. These types of benchmarks will "blow" the merge block
>>> algorithm used in some reallocs (depending on memory pressures, etc on
>>> the system at the time), the original thought was to address the
>>> performance of code like this:
>>> my $str="";
>>> $str.="x" for 1...100000;
>> No the benchmarks in this thread are all about concatenating small strings
>> to large buffers.
>> Just to reiterate it once more: the "new" patch will not kick in *at all*
>> if the required new size is more than (25% + 10 bytes) larger than the
>> old size. It only helps repeated appends of smallish increments.
>>> If we dont overallocate AND we are using a malloc/realloc that doesnt
>>> do block merges (like Windows worked when I last used Windows) you end
>>> up getting quadratic performance as each concatenation ends up being a
>>> realloc call, which in turn ends up doing a memcopy(). Which means one
>>> ends up doing a ridiculously large amount of data copying.
>> Yes, that is exactly the problem this change is supposed to "solve".
>>> If i recall correctly Nicholas made a comment along the lines of "best
>>> performance gain for least effort" of any patch he had ever written.
>> Could you hunt down the original discussion (and commit id) of Nicholas
>> patch? I don't quite understand how we can still get such huge improvements
>> if this is supposedly already fixed.
>> The one patch from Nicholas that I know about is that under -Dusemymalloc
>> he queries the size of the new block and assigns that number to SvLEN
>> instead of the requested size. That prevents additional calls to sv_grow()
>> when the block wouldn't have to move because realloc() already overallocated.
>> But that patch does nothing for other allocators.
> Im pretty sure its this one:
> $ git log -1 1936d2a74fbc35dd0d686
> commit 1936d2a74fbc35dd0d6862386c9988d92bb8e6e0
> Author: Nicholas Clark <email@example.com>
> Date: Wed Jun 1 20:46:02 2005 +0000
> Round up all string length requests to malloc()/realloc() to the next
> multiple of 4/8 bytes [sizeof(size_t)] on the assumption that malloc()
> internally will quantise, and so we're going to use space that
> otherwise would be wasted. Hopefully this will save realloc()ing.
> p4raw-id: //depot/perl@24665
> But i cant find the mail about it. I do recall some of the discussion
> was on IRC, but I also remember a mail. Unfortunately (for some not
> very unfortunate definition of unfortunate :-) Nicholas improves
> performance all the time, so the keywords I keep looking for in my
> mailbox give a rather large number of hits and ive yet to find the
> right one. Perhaps he remembers?
Right after this I tried a different search criteria and while it
didnt find the mail I vaguely remember it does seem somewhat relevent:
Nicholas Clark had a look at way perl allocates memory for scalars. Up
until now the exact size has always been requested, and the allocated
size is recorded. This helps avoid unnecessary trips to the heap when
a string shrinks and then grows yet stays within the size of the
On the assumption that "malloc" implementations silently round up
anyway, because it simplifies the internal bookkeeping, it makes sense
to round up the requested size silently on perl's side anyway, because
then scalars that creep a little bit past the initial requested size
might still remain within the initial requested size, thereby avoiding
a costly "realloc".
Nicholas then set about trying to determine the exact number of bytes
that are in fact allocated for all small allocation values (where
small means between 1 and 256). It turns out that a brute force "round
up to nearest multiple of 4 on a 32-bit CPU" rule wasn't particularly
effective. "malloc" sometimes allocates much more, especially on
FreeBSD. Nicholas wanted to know whether it would be worth the effort
to probe for these sizes at "Configure" time, build a lookup table,
and tweak the already monstrous "PERL_STRLEN_ROUNDUP" macro to use it
for short allocation sizes.
Nicholas simply wanted to minimise the number of calls made to
"sv_grow", without requiring any extra code or tests elsewhere in the
interpreter. Gisle was unconvinced, reasoning that it takes memory
away from "malloc" that "malloc" may be able to deploy more
effectively elsewhere at some further point in time.
Use a lookup table to save memory
perl -Mre=debug -e "/just|another|perl|hacker/"