From:

Date:

January 9, 2004 11:35Subject:

Math::BigFloat/Math::BigRat - memory and speedupMessage ID:

200401092021.38384@bloodgate.com-----BEGIN PGP SIGNED MESSAGE----- Moin, if you are not interested in reading about my work, please stop now. Thank you. Contents: 0. Introduction to Problem 1. Possible Solutions 0. Introduction =============== After finishing[0] BigInt so far, I am slowly walking myself through the task of updating my other modules. However, fairly recently I noticed the Perl Advent Calendar[4] for the first time, and lo and behold, Mark Fowler has BigInt on it! Wow! :) Most interesting to mention is that he showed Devel::Size output for Math::BigInt - something I always wondered: How to get the size of an object in bytes? Devel::Size certainly slipped under my radar.. Now, it tells us that BigInts use much more memory than a simple scalar or integer, and that BigFloats use even _more_ memory. BigRats eat less if they are integers, and more if they are rational numbers: # perl -MDevel::Size=total_size -le 'print total_size(1)' 16 te@null:~> perl -Mbigrat -MDevel::Size=total_size -le 'print total_size(Math::BigInt->new("13"))' 259 # perl -Ilib -Mbigrat -MDevel::Size=total_size -le 'print total_size(Math::BigRat->new("13"))' 589 # perl -Ilib -Mbigrat -MDevel::Size=total_size -le 'print total_size(Math::BigRat->new("3/4"))' 887 # perl -Mbigrat -MDevel::Size=total_size -le 'print total_size(Math::BigFloat->new("3.4"))' 767 Interestingly, a BigFloat as integer eats slightly more: # perl -Mbigrat -MDevel::Size=total_size -le 'print total_size(Math::BigFloat->new("34"))' 782 While I expected BigFloats to use more memory, I didn't expect them to use so much more. The same for BigRats. And Devel::Size doesn't tell me why they eat so much memory. So I wrote Devel::Size::Report (to be found on CPAN or http://bloodgate.com/perl/packages/). It produces nifty reports like this[3]: Size report for '1' (Math::BigInt): Hash 259 bytes (161 bytes overhead) Key 'value' => Array 72 bytes (56 bytes overhead) Scalar 16 bytes Key 'sign' => Scalar 26 bytes Total: 259 bytes The distribution contains a small perl script called psize, which you can use like this: # psize "Math::BigInt->new(12)" Size report for 'Math::BigInt->new(12)' => '12' (Math::BigInt): Hash 259 bytes (161 bytes overhead) Key 'value' => Array 72 bytes (56 bytes overhead) Scalar 16 bytes Key 'sign' => Scalar 26 bytes Total: 259 bytes Using this on BigFloat and BigRat shows us the reason for the memory wastage: # psize "Math::BigFloat->new(12.3)" Size report for 'Math::BigFloat->new(12.3)' => '12.3' (Math::BigFloat): Hash 767 bytes (143 bytes overhead) Key '_m' => Hash 299 bytes (185 bytes overhead) Key 'value' => Array 72 bytes (56 bytes overhead) Scalar 16 bytes Key '_f' => Scalar 16 bytes Key 'sign' => Scalar 26 bytes Key '_e' => Hash 299 bytes (185 bytes overhead) Key 'value' => Array 72 bytes (56 bytes overhead) Scalar 16 bytes Key '_f' => Scalar 16 bytes Key 'sign' => Scalar 26 bytes Key 'sign' => Scalar 26 bytes Total: 767 bytes # psize "Math::BigRat->new(3/4)" Size report for 'Math::BigRat->new(3/4)' => '3/4' (Math::BigRat): Hash 539 bytes (unknown bytes overhead) Key '_d' => Hash 333 bytes (185 bytes overhead) Key 'value' => Array 106 bytes (70 bytes overhead) Scalar 36 bytes Key '_f' => Scalar 16 bytes Key 'sign' => Scalar 26 bytes Key '_n' => Hash 319 bytes (185 bytes overhead) Key 'value' => Array 92 bytes (56 bytes overhead) Scalar 36 bytes Key '_f' => Scalar 16 bytes Key 'sign' => Scalar 26 bytes Key 'sign' => Scalar 28 bytes Total: 539 bytes (Please ignore the "unknown..." bug in Devel::Size::Report for now) Woa! But now I know where the memory goes! In all the array/hash overhead! On 64 Bit machines the counts would vary - I would like to see a report since I don't have easy access to such a machine. Question: Why is the overhead per hash/per array _so_ big? 1. Possible Solutions ===================== After a ot of thinking and toiling I think I came up with a solution. Both BigRats and BigFloats use BigInts for their "private parts" because that was easiest. However, BigInts are complicated beasts that have a sign and a all the sign-handling etc, whereas in praxis we need unsigned big integers for the parts[1]. One solution could be to create a special package that can only handle unsigned integers, and therefore doesn't need a sign. However, an idea particle struck me - we already have such a package. It's called: Calc! :) There are some problems like: * BigFloat/BigRat need to know what $CALC is, aka the library BigInt uses today for low-level math. And that shouldn't change later on (but right now you couldn't change the library at run-time anyway, because you would end up with objects that were created with let's say Matter.pm and later on you get objects created with Antimatter.pm and as soon as you bring them together in one operation it all goes kaboOM! with a bright flash...) * It needs a total overhaul of a lot of code - lots of work. * It changes the internal structure. Well, except the "sign", the internal parts shouldnt be used anyway (don't look under the hood, I always warned about this). However, the sign-bit would stay, only _m, _e, _d and _n would change and no longer be BigInts but $CALC objects. I am also not so worried about the change, because, afterall, the original BigInt used a completely different inner structure altogether. And most subclasses that might poke in the innards (by accident or bug) are written by me, anyway, so it is my task to fix them. However, there are a lot of benefits: * memory requirements per object drop significantly Here is a simulation of a would-be Math::BigFloat object with the value 123e-2 using Calc (which basically is [ 123 ] to represent 123): # psize "{ sign => '+', _m => [ 123 ], _e => [2], _es => '-' }" Size report for 'HASH(0x815e604)' (HASH): Hash 419 bytes (223 bytes overhead) Key '_es' => Scalar 26 bytes Key '_m' => Array 72 bytes (56 bytes overhead) Scalar 16 bytes Key '_e' => Array 72 bytes (56 bytes overhead) Scalar 16 bytes Key 'sign' => Scalar 26 bytes Total: 419 bytes That sounds good, right? Thats about 54% of the memory than it takes now. One could save further memory by doing away with the hash and using an array, something Ilya proposed quite a while ago: # psize "[ '+', [ 123 ], [2], '-' ]" Size report for '[ '+', [ 123 ], [2], '-' ]' => 'ARRAY(0x815d694)': Array 296 bytes (100 bytes overhead) Scalar 26 bytes Array 72 bytes (56 bytes overhead) Scalar 16 bytes Array 72 bytes (56 bytes overhead) Scalar 16 bytes Scalar 26 bytes Total: 296 bytes However, saving further 30% is (IMHO) not worth the unmaintainable code-mess that ensures from this. (Mind you, the savings are only that big for small numbers. Bigger numbers need themselves more storage and thus the overhead is smaller) * Since we do no longer use BigInt for the math, we no longer need the flag to mark the private parts as "don't fondle them". This means that we can simplify BigInt a bit. We also save the memory for two hash keys named "_f" altogether (saving above already included). * Most interesting is that the rewrite would also make it faster. Currently each math operation incurs the penalty of going to BigInt, which does sign fiddling etc, just to hand then the actual work to $CALC. With storing the object directly in BigFloat, we can call return ... if $CALC->is_zero($x->{_m}); instead of: return ... if $x->{_m}->is_zero(); That saves us a _a lot_ of overhead, especially if $CALC is Math::BigInt::GMP or something similiar. This in turn would speedup BigFloat and Bigrat quite a bit. In other words, BigFloat and BigRat would no longer be so much slower than BigInt and gain a bit back on their speed. For practical reasons[2] I will try to rewrite BigRat first and then see where this leads me. Any comments, suggestions, hints, etc are of course welcome. Thanx for giving me room to present my ideas, Tels [0] Well, it is never finished. I already received a few bug/coredump reports and v1.69 will be required to mop up... [1] Ignoring the sign for _e (1e-3) in BigFloat for now. But at least the sign there is only '+' or '-', while BigInt also has inf, -inf, and NaN. [2] The codebase of BigRat is much smaller than the one from BigFloat :) [3] The calculation for hashes might be wrong due to not taking into account the key storage, but I am not sure about this. The testsuite of the package also shows some interesting "side-effects" of Devel::Size and I am still not sure if this is a feature, bug in Devel::Size or in Perl) [4] http://perladvent.org/2003/16th - -- Signed on Fri Jan 9 19:39:55 2004 with key 0x93B84C15. Visit my photo gallery at http://bloodgate.com/photos/ PGP key on http://bloodgate.com/tels.asc or per email. "We have problems like this all of the time," Kirk said, trying to reassure me. "Sometimes its really hard to get things burning." -- http://tinyurl.com/qmg5 -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux) Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl. iQEVAwUBP/8CyncLPEOTuEwVAQEQvwf9EmAoK2ZYBoSpg4Qyafa4QIo9xvwiOi4i 9aNL0iHNG8Aa11H9uGPG9mJor+rfbgkkAF1sEQsr+SNQS7ujNJZ/RHK86FG0QMhG qaoPch53FXGEULwzcFJmtFIBisVj/j0cm6uo/THzhMzYP5mHlzWAmoY851/QYoli rH0UooUMzZbK9L3+Lrcx9UCq0dXHRFv04gcR9UKSY4wDLMEAg5FWAnNF4uGezcxR W6O175LdjI2+pjTJm2jU5A75nJM/I0p7eWhxJWUq95fOprgbYjgOcV3k5gvbJOZ+ x8axQwtfnjdfI5WuDSpju6z00OW7EHSVMB2L8X2c5L/fq6bgz29aEg== =/+wF -----END PGP SIGNATURE-----Thread Next

**Math::BigFloat/Math::BigRat - memory and speedup**by Tels- Re: Math::BigFloat/Math::BigRat - memory and speedup by Nicholas Clark
- Re: Math::BigFloat/Math::BigRat - memory and speedup by Tels
- Re: Math::BigFloat/Math::BigRat - memory and speedup by Nicholas Clark
- Re: Math::BigFloat/Math::BigRat - memory and speedup by SADAHIRO Tomoyuki

nntp.perl.org: Perl Programming lists via nntp and http.

Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About