develooper Front page | perl.perl5.porters | Postings from March 2016

Re: [perl #114576] check if %hash 500x times slower than if keys%hash

Thread Next
From:
Dave Mitchell
Date:
March 28, 2016 09:28
Subject:
Re: [perl #114576] check if %hash 500x times slower than if keys%hash
Message ID:
20160328092810.GL29332@iabyn.com
On Fri, Mar 25, 2016 at 07:15:58PM +0100, demerphq wrote:
> On 25 Mar 2016 14:16, "Dave Mitchell" <davem@iabyn.com> wrote:
> >
> > On Mon, Feb 22, 2016 at 02:13:50PM -0800, l.mai@web.de via RT wrote:
> > > On Fri Jun 14 02:42:11 2013, nicholas wrote:
> > > > On Thu, Jun 13, 2013 at 07:19:51PM -0700, James E Keenan via RT wrote:
> > > > > On Wed Apr 10 11:22:02 2013, demerphq wrote
> ....
> > Nicholas's work, which caches the value of hv_fill, is merged.
> > Yves refers to:
> >
> > > And I have a patch somewhere (i forget the branch name right now) which
> > > rips out hv-fill entirely.
> >
> > I'm not sure what this refers to, but I suspect it hasn't been merged.
> 
> No, it hasn't.
> 
> Fwiw, The idea was to stop exposing hv_fill via scalar (%hash), and let
> people use one of the utility subs in Hash::Util to calculate it on demand,
> thus eliminating any need to cache it, or maintain it in process.
> 
> It is only used to construct the left hand value in "3/8" which might be
> returned for %hash=1..10; scalar %hash, which imo is useful *only* in that
> its truthfulness is equivalent to 0+keys %hash, or to someone attacking the
> hash algorithm, and perhaps to someone learning about how our hash
> implementation works.
> 
> If scalar %hash became the same as 0+keys, then the first use case is
> satisfied, and the other two are better handled these days by the  powerful
> introspection functions in Hash::Util.
> 
> Its worth noting that the current behaviour is an abstraction violation, it
> exposes the internal implementation of our associative arrays, which under
> hash randomization is non deterministic anyway; outside of the boolean case
> the lhs could be any value < rhs.
> 
> If we change it to be the count of keys we get rid of these problems. The
> value becomes deterministic, we simplify some core code, and we remove a
> possible barrier to switching to an alternate associative array
> implementation.
> 
> It's probably worth revisiting this subject. Ill see what I can do to find
> my patch.


I like in principle the idea of simplifying %hash in scalar context.
I wonder if instead of making it return 0+scalar %hash, we return
"${number_of_keys}/${number_of_alloced_buckets}" - i.e. we return
something that still looks like a fraction, but has a numerator that's
now easy to calculate. This maintains a certain level of backwards
compatibility, for code that expects to be able to do

    %hash =~ m{^(\d+)/(\d+)/$}

All we're then doing is lying about how efficiently HEs are being
distributed across buckets.

Conversely, just returning an int rather than a "1/2" fraction means that
a temporary string doesn't have to be created and then numified  just to
satisfy code that's probably just using it in boolean context anyway.

-- 
In England there is a special word which means the last sunshine
of the summer. That word is "spring".

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About