develooper Front page | perl.perl6.internals | Postings from July 2002

Re: RFC - Hashing PMC's <>

July 23, 2002 02:07
[Crossposted to perl6-language]
Alberto Manuel Brandão Simões <> writes:
> Resuming:
> - Dan proposed an 'id' method returning an INTVAL;
> - Brent Dax proposed to return a string;
> - Nicholas Clark asked to return an unsigned value;
> - Piers didn't like the 'id' name unless globally guaranteed to be
> unique;
> So, I propose another name (hash). Returning a string can be done
> another way. Parrot may supply a string2hash function which should be
> called before returning for PMC which would like to return a string. And
> yes, maybe returning an unsigned value be more correct.
> Now, I ask for PMC programmers to take care implementing this! Notice
> that, for example in arrays, arrays with the same length but different
> elements should return different hash codes (or try). But for the same
> elements MUST return the same hash code.

Um... not necessarily. Bordering on the 'not at all'. Perl 6 will
apparently allow one to have things other than strings as keys to
hashes. If I have:

   $a = [ 1, 2, 3 ];
   $b = [ 1, 2, 3 ];

   %foo{$a} = 'A';
   %foo{$b} = 'B';

Then I want C< (%foo{$a} eq 'A') && (%foo{$b} eq 'B') > to be true.

Hmm... actually, that doesn't require a hashing algorithm which
returns distinct values for $a and $b, does it...
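
A rough sketch of that point, in Python rather than Perl 6 or Parrot
(the class name ConstHashKey is made up for illustration): even a hash
function that returns the same value for every key keeps the two
arrays apart, provided key equality is decided by identity.

```python
class ConstHashKey:
    """Hypothetical key wrapper: equal only to itself, constant hash."""

    def __init__(self, items):
        self.items = items

    def __hash__(self):
        # Deliberately the worst possible hash: every key collides.
        return 0

    def __eq__(self, other):
        # Equality is identity, not content.
        return self is other


a = ConstHashKey([1, 2, 3])
b = ConstHashKey([1, 2, 3])

foo = {}
foo[a] = 'A'
foo[b] = 'B'
```

Both entries survive and each key finds its own value. Collisions only
cost lookup time; it is the equality test that decides whether two
keys are "the same".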

Actually there's a whole can of worms here. Assuming I've just run
the above code, I want to be able to do C< $b[0] = 3 > and still be
able to look up %foo{$b}, which implies that C<hash> should be based
on some invariant of the PMC that's independent of its content.
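
To make that concrete, here is a minimal sketch in Python (not Parrot;
the Array class is a hypothetical stand-in for an array PMC). It
assumes the invariant is the object's identity, which is only one
possible choice.

```python
class Array:
    """Hypothetical stand-in for an array PMC, hashed by identity."""

    def __init__(self, *items):
        self.items = list(items)

    def __hash__(self):
        # id() stays fixed for the lifetime of the object, whatever
        # happens to self.items.
        return id(self)

    def __eq__(self, other):
        return self is other


a = Array(1, 2, 3)
b = Array(1, 2, 3)

foo = {a: 'A', b: 'B'}

b.items[0] = 3      # the C< $b[0] = 3 > case
value_for_b = foo[b]
```

Because neither the hash nor the equality test looks at the contents,
the lookup after the mutation still finds 'B'.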

But then sometimes you'd *want* hashing to be based on the
content. Hmm. Assuming $b hasn't been modified, how about:

   %foo{*$a} = 'A';
   %foo{*$b} = 'B';

   %foo{*$a}; # B
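
The same behaviour can be sketched in Python, using tuple() as a
stand-in for the hypothetical C<*> "hash the contents" marker (this is
an analogy, not a claim about how Perl 6 would implement it):

```python
a = [1, 2, 3]
b = [1, 2, 3]

foo = {}
foo[tuple(a)] = 'A'   # key is a snapshot of the contents of a
foo[tuple(b)] = 'B'   # same contents, so this overwrites 'A'

result = foo[tuple(a)]
```

Here the two arrays are the same key, so there is one entry and
looking up by the contents of a gives 'B'.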

Sorry if this got a bit rambly; I'm not sure working things out as you
type is necessarily a good idea, but I do think the distinction
between when one hashes the 'thing' itself rather than the 'contents
of the thing' is rather important.

Also, how deep should we go if we decide to have the hash algorithm
work on the contents?
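
One possible answer, sketched in Python under the assumption that the
caller picks an explicit depth (the function name content_hash and its
depth parameter are invented for this example): hash by content down
to a fixed number of levels and fall back to identity below that. The
cutoff also keeps self-referential structures from recursing forever.

```python
def content_hash(value, depth=1):
    """Hash nested lists by content for `depth` levels, then by identity."""
    if isinstance(value, list):
        if depth <= 0:
            # Too deep: stop looking inside and use the object's identity.
            return hash(id(value))
        return hash(tuple(content_hash(v, depth - 1) for v in value))
    # Non-list values are assumed to be hashable already.
    return hash(value)


x = [1, [2, 3]]
y = [1, [2, 3]]
same_at_depth_2 = content_hash(x, depth=2) == content_hash(y, depth=2)

# A list that contains itself still terminates because of the cutoff.
loop = []
loop.append(loop)
loop_hash = content_hash(loop, depth=3)
```

With depth=2 the two nested lists hash alike; with a smaller depth the
inner lists are treated as opaque objects, so equal-looking structures
will generally hash differently.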

One starts to understand why Scheme has C<eqv?>, C<eq?> and C<equal?>.


   "It is a truth universally acknowledged that a language in
    possession of a rich syntax must be in need of a rewrite."
         -- Jane Austen?