develooper Front page | perl.beginners | Postings from August 2009

Re: Inverting a hash safely

Thread Previous
From:
Ed Avis
Date:
August 4, 2009 02:33
Subject:
Re: Inverting a hash safely
Message ID:
loom.20090804T092020-39@post.gmane.org
Jenda Krynicky <Jenda <at> Krynicky.cz> writes:
>
[inverting a hash but checking that no data is lost]

>>To give a really useful error message is a bit more code:
>>
>>    my %reverse;
>>    foreach my $k (sort keys %hash) {
>>        my $v = $hash{$k};
>>        if (exists $reverse{$k}) {
>>            die "cannot reverse: $v is mapped to by both $k and \
>>                 $reverse{$k}\n";
>>        }
>>        $reverse{$k} = $v;
>>    }
> 
>And is this message useful?

To me, yes; YMMV.  It's not so much a user error message as an assertion
failure - the assumption made by the programmer, that the hash was one-to-one
and so could be reversed without losing elements, turns out to be false.

Personally I like to make error messages contain as much information as
possible.  Others prefer them to be more concise, and step through with
a debugger to find the exact values involved.  It's a matter of taste.

>I mean, if you as the user of the program 
>receive this message does it tell you anything useful?

I see what you mean.  Perhaps not, but there are many 'internal errors' and
'assertion failures' and 'invalid conditions' and other completely user-
incomprehensible errors in all applications.  Just because the user can't
understand it doesn't mean it is not a useful check to make.  It is probably
better to fail with an error, no matter how cryptic, than to continue and
silently give the wrong answer.

Now, if there is some user-editable data such as a config file where keys and
values must be unique, it's certainly the job of the code reading that file to
check the uniqueness and give a helpful user error message.  I do not mean to
suggest that by putting safe_hash_invert (or any other error checking function)
in your code you can avoid the responsibility to do proper input validation.
Rather, the input validation is what sets up the invariant (in this case
'the hash is one-to-one') and then the assertions and other checks are firstly
a safety measure in case some bug in the code breaks the invariant or a future
programmer doesn't realize that it is meant to be preserved, and secondly a
form of code-documentation to make it clear what your intentions are.
 
>You can't prevent people from shooting themselves into their feet.

Haha, no, but you can go some of the way.  Most guns do have safety catches
after all.  I'm not saying you can't choose to do it the unsafe way, if you're
sure that is what you want.

>If anything, it would be better to provide a function that'd check 
>whether the values of the hash are unique so that you could
>  assert(values_are_unique \%hash);

That would work too.

For me, the pattern of 'check unique values; now reverse it' is common enough
that I would want a single function for it, even though it's only saving one
line of code.

>Because you should only reverse hashes that by 
>definition have unique values. So this is something you should test, 
>but do not necessarily want to test each and every time in the 
>release build.

True.  If you benchmarked your program and found that the checking was really
making a noticeable difference to performance, by all means switch to the
unsafe but slightly faster version.

>And if you feel like it, create a function that reverses
>
>(a => 1, b => 3, c => 1) => (1 => ['a','c'], 2 => ['b'])

Yes, I often end up writing that kind of code too, so this would be handy.

-- 
Ed Avis <eda@waniasset.com>


Thread Previous


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About