On 20 August 2010 19:43, Jan Dubois <jand@activestate.com> wrote: > On Fri, 20 Aug 2010, demerphq wrote: >> On 20 August 2010 16:53, demerphq <demerphq@gmail.com> wrote: >>> On 16 August 2010 08:19, Jan Dubois <jand@activestate.com> wrote: >>>> Could you provide some evidence for this claim? The only way a >>>> "better malloc" can prevent this slowdown is by doing some kind of >>>> overallocation itself. >>> >>> This is not correct. Mallocs/reallocs that can merge blocks do not >>> have the performance penalty that this algorithm seeks to work >>> around. The problem here is that the Win32 realloc always copies, and >>> thus extending a block a character at a time becomes exponential. >>> With a realloc that merges blocks and only copies where there is >>> insufficient contiguous blocks does not have this problem. >> >> Ill just note that im not arguing against this patch. Just that >> overallocation is not the only reason that a malloc might not be >> penalized by this change. > > Indeed, "the only way" was a rather poor choice of words. > > Note though that all the block merge algorithms still depends heavily > on the allocation pattern. If there are no free blocks to merge, then > they still provide poor performance, and that case *will* be helped > by the overallocation during the growth phase. This has been shown > for both FreeBSD and the Linux allocators. IMO this is "just" common sense. If we call realloc/malloc less often then the underlying code will be faster. Given that the perl tradition is to trade memory for speed in most situations this patch makes sense -- and it /also/ provides a vector for us to provide a use less ''memory'; which would control how much we preallocate. >> One real-world benchmark that people might want to try would be to use >> a routine like this: >> >> sub make_tree { >> my ($depth) = shift; >> return int rand 100 unless $depth>0; >> return [ make_tree($depth-1), make_tree($depth-1) ] >> } >> >> and then use the XS implementation of Data::Dumper to dump the results >> of make_tree() for various N. >> >> On win32 even modest N will result in the machine essentially hanging. >> On no other OS that I've tried it on is the slowdown as noticeable. >> This was traced to the use of realloc in SV_GROW(). This was the >> analysis that lead to Nicholas' original patch. > > I don't see how sv_grow() comes into this example at all. Are you talking > about av_expand()? Actually no, im talking about sv_grow(). Dont forget I said to pass the output of make_tree() into the XS version of Data::Dumper, which will stringify the resulting binary tree by extending a buffer essentially character by character (actually token by token, but this example is pretty close to char by char) to add open brackets, integers, and close brackets. It uses sv_grow to keep growing the buffer as it does so. Interestingly on Windows you can often make Data::Dumper go faster by NOT using the XS version by forcing it to use the pure perl version via $Data::Dumper::Useqq or more recent equivalents - im not sure exactly why this makes a difference, I never dug, however Nicholas was able to demonstrate a huge speedup on some platforms with his original hack along these lines.. Basically it turned out that DD was calling realloc() pretty much once per character emitted while dumping. By preallocating an extra 10% the number of realloc calls was reduced dramatically. Also, note that av_expand() is never called in my example as all arrays are preallocated with 8 elements (er, i think - certainly more than 2) and my code uses only 2 per array. The point here is that the code will produce *many* arrays, and will require DD to do a lot of concatenation to do the dump. > That is indeed an area that should probably be scrutinized for potential > Windows performance improvements too. I also have some vague memory that > hash expansion beyond something like 100,000 keys gets extremely slow on > Windows. Sounds interesting, but its not the case I meant, i was strictly referring to the way the XS version of Data::Dumper builds the stringified version of the object being dumped. Cheers, yves -- perl -Mre=debug -e "/just|another|perl|hacker/"Thread Previous | Thread Next