
Re: A new attempt/idea to lower perl memory requirements (significantly?)

From:
Joshua ben Jore
Date:
April 14, 2010 10:09
Subject:
Re: A new attempt/idea to lower perl memory requirements (significantly?)
Message ID:
l2tdc5c751d1004141009if3a310bfpd7c681dc2929d39d@mail.gmail.com
On Tue, Apr 13, 2010 at 8:32 AM, Risanecek <risanecek@gmail.com> wrote:
> Hi everybody,
>
> perl memory requirements are extraordinary. I use that euphemism,
> because using words like "insane" could stop important people from
> reading any further. ;-)

I'd echo what Steffen Mueller just said. Recently I had a nice time
reducing two in-memory databases down from ordinary perl to C
structures, then back again to packed things visible to my Perl. I
sent a talk proposal in to YAPC::NA to take a little of people's time
up telling them what I did. Instead of that, here's the gist.

There was more than one such struct but this is the simpler one so
it's what I'll sketch:

%geocoding = (
    $zip_code => {
        longitude => $float,
        latitude => $float,
        zip => ...,
        geo_prec => $integer,
    },
    "$city|$state" => {
        ... # same as above
    },
);

Turns out I only needed the 'geo_prec' value during building the
database because there were multiples of varying precision and I only
used this to be sure I chose the most precise. Same thing for the
'zip' key. Turned out not to be used.
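
The build-time use of 'geo_prec' can be sketched roughly like this. The rows are made up, and the sketch assumes a larger geo_prec means a more precise point; the scale's direction isn't stated here, so flip the comparison if it runs the other way.

```perl
use strict;
use warnings;

# Hypothetical input rows: [ key, longitude, latitude, geo_prec ].
# Assumption: a larger geo_prec means a more precise point.
my @rows = (
    [ '97201',       -122.69,   45.51,   1 ],
    [ '97201',       -122.6903, 45.5079, 3 ],
    [ 'Portland|OR', -122.68,   45.52,   2 ],
);

# Build-time only: remember the most precise row seen for each key.
my %best;
for my $row (@rows) {
    my ( $key, $lon, $lat, $prec ) = @$row;
    next if $best{$key} && $best{$key}[2] >= $prec;
    $best{$key} = [ $lon, $lat, $prec ];
}

# The runtime structure keeps only what lookups need; geo_prec is dropped.
my %geocoding;
for my $key ( keys %best ) {
    $geocoding{$key} = {
        longitude => $best{$key}[0],
        latitude  => $best{$key}[1],
    };
}
```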

    $record => {
        longitude => $float,
        latitude => $float
    }

Compressed the record hash down to an 8 or 16 byte record by just
storing the pair as packed floats (8 bytes) or doubles (16 bytes).

    $packed_longitude = pack 'd', $longitude;
    $packed_latitude = pack 'd', $latitude;
    $record = "$packed_longitude$packed_latitude";
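
As a quick check on the sizes, here is a small round trip through pack and unpack. The coordinates are a made-up sample point, and the two values are packed in one call rather than two.

```perl
use strict;
use warnings;

my ( $longitude, $latitude ) = ( -122.6903, 45.5079 );    # made-up sample point

# 'd' is a native double (8 bytes), so the pair makes a 16-byte record.
my $record      = pack 'dd', $longitude, $latitude;
my $record_size = length $record;

# unpack reverses it; doubles come back exactly as they went in.
my ( $lon, $lat ) = unpack 'dd', $record;

# 'f' is a native single float: half the size, but it loses precision.
my $small      = pack 'ff', $longitude, $latitude;
my $small_size = length $small;
```

Whether the 8-byte form is good enough depends on how much coordinate precision the application needs.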

Stored all my records in one long string and indexed them by offset:

    $points = "$record$record$record$record";

Accessed them by substr():

    $record = substr $points, $record_offset, 16;
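
A self-contained sketch of the one-long-string layout, with made-up points. The @offsets array and point_at() are names invented for this illustration only.

```perl
use strict;
use warnings;

my $RECORD_SIZE = length( pack 'dd', 0, 0 );    # 16 bytes: two doubles

# Made-up sample points as [ longitude, latitude ].
my @sample = ( [ -122.6903, 45.5079 ], [ -73.9857, 40.7484 ], [ 2.2945, 48.8584 ] );

my $points = '';
my @offsets;    # kept only so this example can find its records again
for my $p (@sample) {
    push @offsets, length $points;    # a record's offset is where it starts
    $points .= pack 'dd', @$p;
}

# Fetch one record by offset and turn it back into two numbers.
sub point_at {
    my ($record_offset) = @_;
    return unpack 'dd', substr( $points, $record_offset, $RECORD_SIZE );
}

my ( $lon, $lat ) = point_at( $offsets[1] );
```

Because every record has the same width, the offset is the only thing that has to be stored per key.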

Did the mapping between keys and offsets into $points with JudyHS:

    Judy::HS::Set( $geodata, $zipcode, $record_offset );
    $record_offset = Judy::HS::Get( $geodata, $zipcode );
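
The whole lookup path, key to offset to record, can be sketched end to end. To stay runnable on a stock perl this uses an ordinary hash (%offset_for, a stand-in) where the real thing uses JudyHS, so it gives up part of the memory saving; add_point() and lookup() are likewise hypothetical names.

```perl
use strict;
use warnings;

my $RECORD_SIZE = 16;    # two packed doubles
my $points      = '';    # all records, back to back
my %offset_for;          # stand-in for the JudyHS key => offset index

sub add_point {
    my ( $key, $longitude, $latitude ) = @_;
    $offset_for{$key} = length $points;    # next record starts at the current end
    $points .= pack 'dd', $longitude, $latitude;
    return;
}

sub lookup {
    my ($key) = @_;
    my $record_offset = $offset_for{$key};
    return unless defined $record_offset;    # offset 0 is valid, so test defined-ness
    return unpack 'dd', substr( $points, $record_offset, $RECORD_SIZE );
}

add_point( '97201',       -122.6903, 45.5079 );
add_point( 'Portland|OR', -122.68,   45.52 );

my @hit  = lookup('Portland|OR');
my @miss = lookup('00000');
```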

Gave this same sort of treatment to another structure:

%words = (
    $category => {
        type => 'category',
        ids => [ "1234253423532", ... ],
    },
    $sic_code => {
        type => 'sic_code',
        ...
    },
);

Whittled my 400MB processes (with potentially much CoW churn) down to
100MB. The bulk of the space for my database is now just string
bodies, which I'm sure aren't getting de-shared by CoW: the data is
contiguous, and I've only got half a dozen perl strings now, each
one rather large.

Josh
