-------- Original Message --------
Subject: Re: Patch to make string-append on win32 100 times faster
From: demerphq <demerphq@gmail.com>
To: Ben Morrow <ben@morrow.me.uk>, Wolfram Humann <w.c.humann@arcor.de>
Date: 21.08.2010 11:18

> On 20 August 2010 17:05, demerphq <demerphq@gmail.com> wrote:
>
>> On 20 August 2010 16:53, demerphq <demerphq@gmail.com> wrote:
>>
>>> On 16 August 2010 08:19, Jan Dubois <jand@activestate.com> wrote:
>>>
>>>> On Sun, 15 Aug 2010, Reini Urban wrote:
>>>>
>>>>> Jan Dubois wrote:
>>>>>
>>>>>> On Fri, 30 Jul 2010, Wolfram Humann wrote:
>>>>>> The discussion of this change seemed to have stalled, but I see
>>>>>> +1 votes from David Golden and Marvin Humphrey, with additional
>>>>>> information from Ben Morrow that the patch also helps on FreeBSD
>>>>>> (when building without -Dusemymalloc), with nobody voicing
>>>>>> any disagreement about applying the patch.
>>>>>>
>>>>> This particular slowdown was only recognized for WIN32 native malloc,
>>>>> but not for other platforms with better malloc libs.
>>>>>
>>>> Did you read the paragraph you quoted above? It explicitly claims that
>>>> the slowdown happens on other platforms when using the platform native
>>>> malloc.
>>>>
>>>>> Those platforms are now hurt by using overallocation,
>>>>> i.e. need more memory, e.g. with piping.
>>>>>
>>>> Could you provide some evidence for this claim? The only way a
>>>> "better malloc" can prevent this slowdown is by doing some kind
>>>> of overallocation itself.
>>>>
>>> This is not correct. Mallocs/reallocs that can merge blocks do not
>>> have the performance penalty that this algorithm seeks to work around.
>>> The problem here is that the Win32 realloc always copies, and thus
>>> extending a block a character at a time becomes quadratic. A realloc
>>> that merges blocks and only copies where there is insufficient
>>> contiguous space does not have this problem.
>>>
>> I'll just note that I'm not arguing against this patch. Just that
>> overallocation is not the only reason that a malloc might not be
>> penalized by this change.
>>
>> One real-world benchmark that people might want to try would be to use
>> a routine like this:
>>
>> sub make_tree {
>>     my ($depth) = shift;
>>     return int rand 100 unless $depth>0;
>>     return [ make_tree($depth-1), make_tree($depth-1) ]
>> }
>>
>> and then use the XS implementation of Data::Dumper to dump the results
>> of make_tree(N) for various depths N.
>>
>> On win32 even modest N will result in the machine essentially hanging.
>> On no other OS that I've tried it on is the slowdown as noticeable.
>> This was traced to the use of realloc in SV_GROW(). This was the
>> analysis that led to Nicholas' original patch.
>>
>
> Ben, Wolfram, any chance you can try benchmarking this with and
> without the new patch?
>

For a patched win32 perl I'll have to wait till Monday. What I can
compare now is Strawberry (usemymalloc=n) to Cygwin (usemymalloc=y).
This slightly golfed version

perl -MData::Dumper -E"sub mt{my $d=shift;$d>0?[mt($d-1),mt($d-1)]:int rand 100}; say length Dumper mt 18"
34576966

needs approx. 6 seconds to print and 8 seconds to finish on both of
these. No noticeable difference.

With a touch of horror I have to report that the printed string lengths
are slightly non-deterministic from run to run...???

Wolfram
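
A possible explanation for the varying lengths, offered as a sketch rather than a confirmed diagnosis: the leaves come from int rand 100, so each one is either a one-digit or a two-digit number, and the share of one-digit leaves changes from run to run. Seeding the generator should make the dumped length repeatable. The helper name dump_length, the seed value 42 and the reduced depth of 10 are assumptions chosen for illustration and are not part of the original benchmark.

```perl
use strict;
use warnings;
use Data::Dumper;

# Build a complete binary tree of the given depth; leaves are random
# integers in 0..99, as in the benchmark quoted above.
sub make_tree {
    my ($depth) = @_;
    return int rand 100 unless $depth > 0;
    return [ make_tree($depth - 1), make_tree($depth - 1) ];
}

# Hypothetical helper: seed the generator explicitly, build the tree,
# and return the length of its Data::Dumper output.
sub dump_length {
    my ($depth, $seed) = @_;
    srand($seed);
    return length Dumper(make_tree($depth));
}

# Same seed, same tree, same length. Depth 10 keeps the run short.
print dump_length(10, 42) == dump_length(10, 42) ? "stable\n" : "varies\n";
```

With a fixed seed the timing comparison between builds also works on identical input, which the unseeded one-liner does not guarantee.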