develooper Front page | perl.perl5.porters | Postings from November 2004

sharing hash values was: Re: (no subject)

From:
Tels
Date:
November 25, 2004 10:04
Subject:
sharing hash values was: Re: (no subject)
Message ID:
200411251904.01931@bloodgate.com
-----BEGIN PGP SIGNED MESSAGE-----

Helo Luca,

On Thursday 25 November 2004 17:04, lmangane@tiscali.it wrote:
> Hi Tels,
>
> I think that's less expensive use a string to store
> all the "seen values", and after use a regex to
> look for the occurence of a specific value inside
> that string.
>
> $seen = '#';	# Initialize.
>
> # To set the "seen values":
> for ('a'..'c') { $seen .= $_ . '#'; }
>
> # To look for some values:
> for ('b'..'d') { printf "Seen %s\n", $_ if ($seen =~ /#$_#/); }
>
> # To print all the "seen values":
> $PrevOFS = $,;
> $, = "\n";
> print "-" x 20, "\nAll seen values:", split ('#', $seen), "\n";
> $, = $PrevOFS;
> ...
>
> or better:
>
> # To print all the "seen values":
> $seen =~ s/#/\n/g;
> print "-" x 20, "\nAll seen values:\n", $seen;

There are a few problems with that approach:

* you need to pick a separator (you used '#') which does not occur in any of 
the strings. That can be really tricky for general purpose applications :) 
(Hm, one could use 0x00 :)
* it doesn't scale on lookup, since it uses O(N) to find the string if it 
exists, and O(M) to find out that a string does not exist.  (where N is the 
position of the string, for a sorted list that would be about M * 0.5, and M 
the amount of strings already stored). Hash lookups are much better for 
lage/huge quanitities of strings (basically, that is the point behind the 
hash idea :)

Pros:

* fast "insert", basically a string append. 
* very dense storage - only one byte overhead per item.

It certainly would cut down the storage, but it would likely to be quite 
slower for massive amounts of strings.

However, storing strings in "buckets" (e.g. all 0..3 byte strings together, 
4..7 byte strings together etc) would achive similiar storage densitiy,  
while still allowing fast lookup via a hash.

Thanx for your input,

Tels

- -- 
 Signed on Thu Nov 25 19:03:49 2004 with key 0x93B84C15.
 Visit my photo gallery at http://bloodgate.com/photos/
 PGP key on http://bloodgate.com/tels.asc or per email.

 This email violates EU patent EP0394160:
 
   [ ########## 66% ####	   ]

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iQEVAwUBQaYekHcLPEOTuEwVAQGswAf/UA9pQ+ACiC4aQIrDE6rT/R08ugy1FZga
7Hbhjkma/KdFczSFrRG4ZcswirZjNRod1m8MmNvuqPOnNfxWDqpg09GwZ7JJDbQ1
g8LNklagn+XGUsMZeo70DOBBKz9+VBWZ+7FaogJefc0H4c0o3HPkcy9YtQ5RNOHl
R3Z7cYy7bH6ORd2ynOIh1Lm2xcueoyg0Z40ZI2GXqGN5C8izC+FIPrTifw/ZNCmp
siHyTXJCVkeEV54mr4xEDNnFwUYzZ02bB00lQ0YcfIaMj6ad9sJmEepszTCbu7F5
v2+lgAAnEOUispuqbn2n58MPxEVhl8rNxJVjhNmDqr81zEiDjqwQMQ==
=JgBC
-----END PGP SIGNATURE-----



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About