On 30.07.2010 16:30, David Golden wrote: > On Fri, Jul 30, 2010 at 9:37 AM, Wolfram Humann<w.c.humann@arcor.de> wrote: > >> I would like to propose that perl requests an additional amount of memory >> from realloc when strings grow by a small amount. >> > +1 -- though I'm not clear whether this win32 specific or whether all > operating systems should work this way. > > >> if (newlen> SvLEN(sv)) { /* need more room? */ >> size_t minlen = SvCUR(sv); >> minlen += (minlen>> PERL_STRLEN_EXPAND_SHIFT) + 10; >> if (newlen< minlen) newlen = minlen; >> > [snip] > > I have a few questions about the specific implementation. Please > don't take them as suggestions that your approach is necessarily wrong > -- but I'd like to understand the logic behind certain choices. > > * Why are you suggesting making the extra allocation proportional to > the length of the string? Why not a multiple of the length of the > increment? Why not a fixed amount? > Take a) the postscript file of 12 MB. Say it has 120k lines of 100 chars each. Take b) a string of 100 chars and you append another 100 chars. Multiply line-length by what? Say 1000 -> max. memory waste 100kB. a) 120 reallocs, memory waste < 1% -- o.k. b) 1 realloc, memory waste 99.9 % -- not acceptable Say 10: case b) still wastes 80% of it's memory but a) has 12000 reallocs -- not acceptable Fixed amounts are even worse. Try any value and see! Geometric progression covers small and big things equally well. Also, the time required for a realloc increases linearly with the amount of memory that needs to be moved to a new location. So it does make sense to realloc less often for big strings. > * Why the "+10"? Why have a extra minimum at all? Why "10" instead > of another number? Given your examples, for a large string the 10 is > a trivial fraction of the total and for a small string, the efficiency > either doesn't matter or the string quickly grows into a large string > anyway. > It's debatable of it's required but if you start with an empty string and add to that char-by-char it does reduce the amount of system calls in the beginning substantially. I don't know how much it matters if you have many very small strings (how much is the perl overhead for strings already)? > * Why do you think PERL_STRLEN_EXPAND_SHIFT should be 2? That's a 25% > increase in the length of the string. (In many programs, I suspect > many strings are appended to only a few times and that seems like > large memory price to pay.) > I compared to Cygwin Perl on the same machine. It's compiled to use the perl internal malloc/realloc. It grows memory even more aggressive than my proposal with a shift of 2. The total memory required for the same program in Cygwin Perl is often 30% to 50% bigger than in my modified perl. Also, if you go from 2 to 4 you reduce *max.* overhead from 25% to 12.5% but you also quadruple the amount of reallocs required. This can add a lot to the runtime of your program. Also note, that the overhead only occurs if you add small amounts each time. No overhead occurs if you increase the size of your original string by 25 % or more in a single append. >> On current Strawberry Perl (5.10 or 5.12): >> a) 2.6 seconds >> b) 18 seconds >> >> Perl 5.12 compiled on win32 with modifications as above: >> a) 0.014 seconds >> b) 0.2 seconds >> > Awesome! What does it look like with a shift of 3 or 4? > > As I mentioned, I put a good amount of benchmark data in my post on c.l.p.m. This also contains memory usage info. If you can, have a look there. If that doesn't answer your question or you want me to crosspost it here, let me know. WolframThread Previous | Thread Next