develooper Front page | perl.perl5.porters | Postings from January 2004

Math::BigFloat/Math::BigRat - memory and speedup

Thread Next
From:
Tels
Date:
January 9, 2004 11:35
Subject:
Math::BigFloat/Math::BigRat - memory and speedup
Message ID:
200401092021.38384@bloodgate.com
-----BEGIN PGP SIGNED MESSAGE-----

Moin,

if you are not interested in reading about my work, please stop now. Thank
you.

Contents:

0. Introduction to Problem
1. Possible Solutions


0. Introduction
===============

After finishing[0] BigInt so far, I am slowly walking myself through the task
of updating my other modules. However, fairly recently I noticed the Perl
Advent Calendar[4] for the first time, and lo and behold, Mark Fowler has
BigInt on it! Wow! :)

Most interesting to mention is that he showed Devel::Size output for
Math::BigInt - something I always wondered: How to get the size of an object
in bytes? Devel::Size certainly slipped under my radar..

Now, it tells us that BigInts use much more memory than a simple scalar or
integer, and that BigFloats use even _more_ memory. BigRats eat less if they
are integers, and more if they are rational numbers:

	# perl -MDevel::Size=total_size -le 'print total_size(1)'
	16
	te@null:~> perl -Mbigrat -MDevel::Size=total_size -le 'print
	total_size(Math::BigInt->new("13"))'
	259
	# perl -Ilib -Mbigrat -MDevel::Size=total_size -le 'print
	total_size(Math::BigRat->new("13"))'
	589
	# perl -Ilib -Mbigrat -MDevel::Size=total_size -le 'print
	total_size(Math::BigRat->new("3/4"))'
	887
	# perl -Mbigrat -MDevel::Size=total_size -le 'print
	total_size(Math::BigFloat->new("3.4"))'
	767

Interestingly, a BigFloat as integer eats slightly more:

	# perl -Mbigrat -MDevel::Size=total_size -le 'print
	total_size(Math::BigFloat->new("34"))'
	782
	
While I expected BigFloats to use more memory, I didn't expect them to use so
much more. The same for BigRats. And Devel::Size doesn't tell me why they eat
so much memory.

So I wrote Devel::Size::Report (to be found on CPAN or
http://bloodgate.com/perl/packages/). It produces nifty reports like this[3]:

	Size report for '1' (Math::BigInt):
	  Hash 259 bytes (161 bytes overhead)
	    Key 'value' => Array 72 bytes (56 bytes overhead)
	      Scalar 16 bytes
	    Key 'sign' => Scalar 26 bytes
	Total: 259 bytes

The distribution contains a small perl script called psize, which you can use
like this:

	# psize "Math::BigInt->new(12)"
	Size report for 'Math::BigInt->new(12)' => '12' (Math::BigInt):
	  Hash 259 bytes (161 bytes overhead)
	    Key 'value' => Array 72 bytes (56 bytes overhead)
	      Scalar 16 bytes
	    Key 'sign' => Scalar 26 bytes
	Total: 259 bytes

Using this on BigFloat and BigRat shows us the reason for the memory wastage:

	# psize "Math::BigFloat->new(12.3)"
	Size report for 'Math::BigFloat->new(12.3)' => '12.3' (Math::BigFloat):
	  Hash 767 bytes (143 bytes overhead)
	    Key '_m' => Hash 299 bytes (185 bytes overhead)
	      Key 'value' => Array 72 bytes (56 bytes overhead)
	        Scalar 16 bytes
	      Key '_f' => Scalar 16 bytes
	      Key 'sign' => Scalar 26 bytes
	    Key '_e' => Hash 299 bytes (185 bytes overhead)
	      Key 'value' => Array 72 bytes (56 bytes overhead)
	        Scalar 16 bytes
	      Key '_f' => Scalar 16 bytes
	      Key 'sign' => Scalar 26 bytes
	    Key 'sign' => Scalar 26 bytes
	Total: 767 bytes

	# psize "Math::BigRat->new(3/4)"
	Size report for 'Math::BigRat->new(3/4)' =>  '3/4' (Math::BigRat):
	  Hash 539 bytes (unknown bytes overhead)
	    Key '_d' => Hash 333 bytes (185 bytes overhead)
	      Key 'value' => Array 106 bytes (70 bytes overhead)
	        Scalar 36 bytes
	      Key '_f' => Scalar 16 bytes
	      Key 'sign' => Scalar 26 bytes
	    Key '_n' => Hash 319 bytes (185 bytes overhead)
	      Key 'value' => Array 92 bytes (56 bytes overhead)
	        Scalar 36 bytes
	      Key '_f' => Scalar 16 bytes
	      Key 'sign' => Scalar 26 bytes
	    Key 'sign' => Scalar 28 bytes
	Total: 539 bytes

(Please ignore the "unknown..." bug in Devel::Size::Report for now)

Woa! But now I know where the memory goes! In all the array/hash overhead!

On 64 Bit machines the counts would vary - I would like to see a report since
I don't have easy access to such a machine.

Question: Why is the overhead per hash/per array _so_ big?

1. Possible Solutions
=====================

After a ot of thinking and toiling I think I came up with a solution. Both
BigRats and BigFloats use BigInts for their "private parts" because that was
easiest. However, BigInts are complicated beasts that have a sign and a all
the sign-handling etc, whereas in praxis we need unsigned big integers for
the parts[1].

One solution could be to create a special package that can only handle
unsigned integers, and therefore doesn't need a sign. However, an idea
particle struck me - we already have such a package. It's called: Calc! :)

There are some problems like:

* BigFloat/BigRat need to know what $CALC is, aka the library BigInt uses
today for low-level math. And that shouldn't change later on (but right now
you couldn't change the library at run-time anyway, because you would end up
with objects that were created with let's say Matter.pm and later on you get
objects created with Antimatter.pm and as soon as you bring them together in
one operation it all goes kaboOM! with a bright flash...)
* It needs a total overhaul of a lot of code - lots of work.
* It changes the internal structure. Well, except the "sign", the internal
parts shouldnt be used anyway (don't look under the hood, I always warned
about this). However, the sign-bit would stay, only _m, _e, _d and _n would
change and no longer be BigInts but $CALC objects. I am also not so worried
about the change, because, afterall, the original BigInt used a completely
different inner structure altogether. And most subclasses that might poke in
the innards (by accident or bug) are written by me, anyway, so it is my task
to fix them.

However, there are a lot of benefits:

* memory requirements per object drop significantly

Here is a simulation of a would-be Math::BigFloat object with the value 123e-2
using Calc (which basically is [ 123 ] to represent 123):

	# psize "{ sign => '+', _m => [ 123 ], _e => [2], _es => '-' }"
	Size report for 'HASH(0x815e604)' (HASH):
	  Hash 419 bytes (223 bytes overhead)
	    Key '_es' => Scalar 26 bytes
	    Key '_m' => Array 72 bytes (56 bytes overhead)
	      Scalar 16 bytes
	    Key '_e' => Array 72 bytes (56 bytes overhead)
	      Scalar 16 bytes
	    Key 'sign' => Scalar 26 bytes
	Total: 419 bytes

That sounds good, right? Thats about 54% of the memory than it takes now.

One could save further memory by doing away with the hash and using an array,
something Ilya proposed quite a while ago:

	# psize "[ '+', [ 123 ], [2], '-' ]"
	Size report for '[ '+', [ 123 ], [2], '-' ]' =>  'ARRAY(0x815d694)':
	  Array 296 bytes (100 bytes overhead)
	    Scalar 26 bytes
	    Array 72 bytes (56 bytes overhead)
	      Scalar 16 bytes
	    Array 72 bytes (56 bytes overhead)
	      Scalar 16 bytes
	    Scalar 26 bytes
	Total: 296 bytes

However, saving further 30% is (IMHO) not worth the unmaintainable code-mess
that ensures from this. (Mind you, the savings are only that big for small
numbers. Bigger numbers need themselves more storage and thus the overhead is
smaller)

* Since we do no longer use BigInt for the math, we no longer need the flag to 
mark the private parts as "don't fondle them". This means that we can 
simplify BigInt a bit. We also save the memory for two hash keys named "_f" 
altogether (saving above already included). 

* Most interesting is that the rewrite would also make it faster. Currently 
each math operation incurs the penalty of going to BigInt, which does sign
fiddling etc, just to hand then the actual work to $CALC. With storing the
object directly in BigFloat, we can call

	return ... if $CALC->is_zero($x->{_m});

instead of:

	return ... if $x->{_m}->is_zero();

That saves us a _a lot_ of overhead, especially if $CALC is Math::BigInt::GMP
or something similiar. This in turn would speedup BigFloat and Bigrat quite a
bit.

In other words, BigFloat and BigRat would no longer be so much slower than
BigInt and gain a bit back on their speed.

For practical reasons[2] I will try to rewrite BigRat first and then see where
this leads me.

Any comments, suggestions, hints, etc are of course welcome.

Thanx for giving me room to present my ideas,

Tels

[0] Well, it is never finished. I already received a few bug/coredump reports
and v1.69 will be required to mop up...
[1] Ignoring the sign for _e (1e-3) in BigFloat for now. But at least the sign
there is only '+' or '-', while BigInt also has inf, -inf, and NaN.
[2] The codebase of BigRat is much smaller than the one from BigFloat :)
[3] The calculation for hashes might be wrong due to not taking into account
the key storage, but I am not sure about this. The testsuite of the package
also shows some interesting "side-effects" of Devel::Size and I am still not
sure if this is a feature, bug in Devel::Size or in Perl)
[4] http://perladvent.org/2003/16th

- --
 Signed on Fri Jan  9 19:39:55 2004 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 "We have problems like this all of the time," Kirk said, trying to
 reassure me.  "Sometimes its really hard to get things burning." --
 http://tinyurl.com/qmg5

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2-rc1-SuSE (GNU/Linux)
Comment: When cryptography is outlawed, bayl bhgynjf jvyy unir cevinpl.

iQEVAwUBP/8CyncLPEOTuEwVAQEQvwf9EmAoK2ZYBoSpg4Qyafa4QIo9xvwiOi4i
9aNL0iHNG8Aa11H9uGPG9mJor+rfbgkkAF1sEQsr+SNQS7ujNJZ/RHK86FG0QMhG
qaoPch53FXGEULwzcFJmtFIBisVj/j0cm6uo/THzhMzYP5mHlzWAmoY851/QYoli
rH0UooUMzZbK9L3+Lrcx9UCq0dXHRFv04gcR9UKSY4wDLMEAg5FWAnNF4uGezcxR
W6O175LdjI2+pjTJm2jU5A75nJM/I0p7eWhxJWUq95fOprgbYjgOcV3k5gvbJOZ+
x8axQwtfnjdfI5WuDSpju6z00OW7EHSVMB2L8X2c5L/fq6bgz29aEg==
=/+wF
-----END PGP SIGNATURE-----


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About