Change return signature of scalar(%hash) to match 0+keys(%hash)

June 21, 2016 19:36
This is a continuation of the thread "[perl #114576] check if %hash
500x times slower than if keys %hash".

In that thread, which dealt with the performance implications of
scalar(%hash) it was proposed we change the signature to something

I have now pushed a patch to a smoke-me branch which does this. See
28ecb97 for the patch; the branch name is smoke-me/no_xhv_fill.

For those unaware of the details, historically scalar(%hash) returns
either 0, for an empty hash, or a string ratio of the number of used
buckets and total buckets in the hash.

So for instance a fresh hash with one key should return "1/8".
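A short script makes the two behaviours easy to compare. It is only a sketch: the variable names are illustrative, and what the second line prints depends on whether the perl running it includes this change.

```perl
use strict;
use warnings;

my %empty;
my %one = (foo => 1);

# An empty hash is false in scalar context, before and after the change.
print "empty: ", (scalar(%empty) ? "true" : "false"), "\n";

# Historically this is a bucket ratio such as "1/8";
# with the change proposed here it is the key count, 1.
my $in_scalar = scalar(%one);
print "one key: $in_scalar\n";

# Either way the value is true, so "if (%hash)" keeps working.
print "truthy\n" if $in_scalar;
```

Code that only tests the truthiness of the hash is unaffected. Only code that parses the "used/total" string would notice the difference.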

This return value has never been very useful. It leaks implementation
details of our associative arrays, and as such it is a blocker for
changing the internals. It is generally surprising to new programmers,
and under hash randomisation the value is non-deterministic. It also
has no intrinsic value anymore: we do not use the "used bucket count"
to control the size of the hash at all internally, yet we have to do
bookkeeping to maintain it.

There are a few applications where we might want to get access to the
data that this API provides: learning about the hash internals is one
use case; another is testing our hash implementation in the perl core
tests. Both of these purposes can be served by alternate APIs without
impacting production code.

So this patch changes the return value to be the count of the keys in
the hash. This preserves its truthiness, while at the same time making
the return fast and something we already track and use internally in
the hash algorithm.

It also introduces Hash::Util::bucket_ratio(),
Hash::Util::used_buckets(), and Hash::Util::num_buckets() to provide
for backwards compatibility. To avoid adding an XS dependency to our
core tests, these functions are in universal.c and thus always
available; however, if someone builds Hash::Util on an older perl, the
functions will be available if Hash::Util is loaded. Thus "good" code
would say "use Hash::Util;" but our internal test code does not need
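As a rough illustration of how the three functions relate, the following sketch prints all of them for a small hash. It assumes a perl whose Hash::Util exports bucket_ratio(), used_buckets() and num_buckets(), which means one that includes this patch. The hash contents are arbitrary.

```perl
use strict;
use warnings;
# Assumption: this Hash::Util is new enough to export these three names.
use Hash::Util qw(bucket_ratio used_buckets num_buckets);

my %h = map { $_ => 1 } 1 .. 10;

my $ratio = bucket_ratio(%h);   # string of the form "used/total"
my $used  = used_buckets(%h);   # number of buckets holding at least one key
my $total = num_buckets(%h);    # number of buckets allocated

print "keys:  ", scalar(keys %h), "\n";
print "ratio: $ratio\n";
print "used:  $used\n";
print "total: $total\n";
```

The exact bucket numbers depend on the hash seed and on the implementation, so they should not be relied upon outside of tests and diagnostics.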

There is one very subtle difference between bucket_ratio() and the old
scalar() function, which I believe is arguably a bug in the old code.
At some point someone changed scalar() to use HvTOTALKEYS() and not
HvUSEDKEYS(). This meant that if you created a locked hash which was
empty but had placeholders in it, the count would /include/ the
placeholders, so you could get a true result from scalar() when you
would get a false result from 0+keys
perl -MHash::Util=lock_keys_plus,hidden_keys -le'my %h;
lock_keys_plus(%h,"foo"); print for 0+keys %h, scalar(%h),

whereas with this patch all three of scalar(%hash),
bucket_ratio(%hash), and 0+keys(%hash) will match for such a hash:

./perl -Ilib -MHash::Util=lock_keys_plus,hidden_keys,bucket_ratio
-le'my %h; lock_keys_plus(%h,"foo"); print for 0+keys %h,

I toyed with the idea of having scalar(%h) return HvTOTALKEYS()
instead of HvUSEDKEYS(), but it seemed of limited usefulness, and
likely to cause confusion. We already have hidden_keys() as you can
see above, and if necessary we can expose HvTOTALKEYS() like I have
done with used_buckets() and num_buckets().
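The locked-hash case can also be written out as a small script. This is a sketch using lock_keys_plus() and hidden_keys() from Hash::Util. The last line it prints depends on the perl in use: with this patch scalar(%h) should be false, while an older perl may report true because the placeholder is counted.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys_plus hidden_keys);

my %h;
lock_keys_plus(%h, "foo");      # "foo" is permitted but not present

my $visible = 0 + keys %h;      # counts only keys that really exist
my @hidden  = hidden_keys(%h);  # keys present only as placeholders

print "visible keys: $visible\n";
print "hidden keys:  @hidden\n";
print 'scalar(%h) is ', (scalar(%h) ? "true" : "false"), "\n";
```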

Anyway, there are a couple of other patches in the smoke-me branch
which I will address in another post, for this thread please focus on

I think the only thing I missed out in the patch is possible
references in the pod to the old return signature.


perl -Mre=debug -e "/just|another|perl|hacker/"
