[sent first without being subscribed; didn't see message appearing in any of the archvies; subscribed; resend now; hope I don't double-post...] My first post to p5p -- let me know if I should have gone about this differently. This discussion started on c.l.p.m., subject "Speed of reading some MB of data using qx(...)", see http://groups.google.com/group/comp.lang.perl.misc/browse_thread/thread/b7c9133ff20009f2. I'll try to summarize my current understanding of matters: When a string grows (e.g. gets appended to), perl calls sv_grow. When sv_grow decides that the memory currently allocated to the string is insufficient, it calls saferealloc. Depending on whether or not perl was compiled with 'usemymalloc' this calls realloc in either perls internal version or on the operating system. Perl requests from realloc just the amount of memory required for the current operation. With 'usemymalloc' this is not a problem because it rounds up memory allocation to a certain geometric progression anyway. When the operating system's realloc is called, this may or may not lead to desaster, depending on how it's implemented. On win32 it does lead to desaster: when I loop 1000 times and each time append 1000 chars to an initial string size of 10 million, the memory grows from 10.000e6 to 10.001e6 to 10.002e6 and so on 1000 times till it ends at 11.000e6. I would like to propose that perl requests an additional amount of memory from realloc when strings grow by a small amount. After discussion on c.l.p.m. and with the help of ideas from Ilya Zakharevich and Peter J. Holzer my proposal reads like this: (I'm not a regular user of diff and using it wrong might do more harm than good) In sv.c the following section in Perl_sv_grow(): if (newlen > SvLEN(sv)) { /* need more room? */ #ifndef Perl_safesysmalloc_size newlen = PERL_STRLEN_ROUNDUP(newlen); #endif should be expanded to: #ifndef PERL_STRLEN_EXPAND_SHIFT # define PERL_STRLEN_EXPAND_SHIFT 2 #endif if (newlen > SvLEN(sv)) { /* need more room? */ size_t minlen = SvCUR(sv); minlen += (minlen >> PERL_STRLEN_EXPAND_SHIFT) + 10; if (newlen < minlen) newlen = minlen; /* newlen += (newlen >> 1) + 10; */ #ifndef Perl_safesysmalloc_size newlen = PERL_STRLEN_ROUNDUP(newlen); #endif I've posted a benchmark with results on c.l.p.m.. Therefore, two examples may suffice here: a) Starting from an initial string of length 1e7 chars, append 1e6 chars in 10000 chunks of 100 chars each. b) Read long, multilined output from another process, simulated here by reading a 12 MB postscript file using $ps = qx(cat postscriptfile.ps) On current Strawberry Perl (5.10 or 5.12): a) 2.6 seconds b) 18 seconds Perl 5.12 compiled on win32 with modifications as above: a) 0.014 seconds b) 0.2 seconds The value of PERL_STRLEN_EXPAND_SHIFT determines the balance between extra memory use and speed (i.e. frequency of calls to realloc). "2" seems to me like a decent compromise. WolframThread Previous | Thread Next