perl.perl5.porters | Postings from October 2021

non-shared hash keys and large hashes (was Re: Robin Hood Hashing for the perl core)

From: Nicholas Clark
Date: October 19, 2021 12:27
Subject: non-shared hash keys and large hashes (was Re: Robin Hood Hashing for the perl core)
Message ID: YW65tdfYpaPj69qf@etla.org
On Thu, Sep 23, 2021 at 03:07:14PM +0200, demerphq wrote:
> On Fri, 10 Sept 2021 at 13:48, Nicholas Clark <nick@ccl4.org> wrote:

> > But, this *might* be worth doing for us, if there are lots of hashes with
> > 1 to 4 keys. (version objects seem to be 4 keys, MRO lookup tables are 1
> > key, at worst 2) - "official" size 4 buckets with a load factor of 1.0,
> > meaning actually allocating 7 buckets and  storing up to 4 keys.
> >
> 
> Doesn't sound like a good idea to me. But you know best. :-)

No, I don't know best. In that, for some things I have a good feel for how
far the code can be bent in a new direction (without it breaking), but I
don't have a great feel for which directions it is actually important to
bend in. Or how to measure success.

> > VC12.0 is capable of *targeting* XP and Windows Server 2003
> > (build host requirement is Windows 7/Windows Server 2012)
> >
> 
> We only introduced the flag so we would stop breaking our Windows builds.
> If that isn't a problem anymore then yay.

I hope the explanation in the commit makes sense in
https://github.com/Sereal/Sereal/pull/265


Although *that* commit was the first that I wrote, and maybe the text in
our perldelta.pod is clearer:

=item *

The Perl C source code now uses some C99 features, which we have verified are
supported by all compilers we target. This means that Perl's headers now
contain some code that is legal in C99 but not C89.

This may cause problems for some XS modules that unconditionally add
C<-Werror=declaration-after-statement> to their C compiler flags if compiling
with gcc or clang. Earlier versions of Perl support long-obsolete compilers
that strictly reject certain C99 features, particularly mixed declarations
and code, so it makes sense for XS module authors to verify that their code
avoids these. However, doing so is now only possible on those earlier
versions of Perl, hence such modules need to be changed to add this flag
only for C<< $] < 5.035005 >>.

=cut
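As an aside (my sketch, not part of the perldelta text): a module's
Makefile.PL can gate the flag on the perl version using only core modules.
The variable name @ccflags is purely illustrative.

```perl
use strict;
use warnings;
use Config;

# Sketch of the perldelta advice above: only request the C89-only
# "declaration-after-statement" diagnostic on perls old enough to still
# support C89 compilers. On 5.35.5+ the core headers use C99 features,
# so the -Werror flag would break the build.
my @ccflags;
if ($] < 5.035005 && $Config{gccversion}) {
    push @ccflags, '-Werror=declaration-after-statement';
}
print "extra cc flags: @ccflags\n";
```

In a real Makefile.PL these flags would end up appended to CCFLAGS in the
WriteMakefile() arguments rather than printed.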

> Yes, that is on my todo list to play with. For certain types of hashes IMO
> the PL_strtab is a massive pessimization. We have to do a double store and
> a lot of complex crap and we end up having one hidden hash that scales to
> the size of the set of unique keys in all hashes, which can chew up a lot
> of memory in the background and can play havoc with preloading of
> data.
> 
> Eg, consider I want to build 10 hashes with 10 million keys each: I'm
> actually building 11 hashes, one with 100M keys and ten with 10M. AND if I
> build all of them prefork, and then build a totally different hash with
> similar keys, I start COW'ing the hidden hash.
> 
> I have dreamed that we would/could change how hashes work, and only use
> PL_strtab when they are blessed. So when a hash starts off it would be
> unshared. When it gets blessed it gets converted into a shared hash.  Eg,
> maybe introduce a "no shared_keys;" pragma, or something like that.
> 
> The idea of PL_strtab, as far as I know, was to dedupe keys used in objects
> where there is a high level of repetition of the keys. When you are building
> large hashes of arbitrary strings, or aggregations in memory with lots of
> breakouts that are unlikely to overlap, or are building hashes prefork, the
> value is really questionable.

Sort of like this?

https://github.com/nwc10/perl5/tree/larger-hashes-without-shared-keys

It turns off shared keys once hashes get (roughly) larger than 42 keys
(except for objects and symbol tables).
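(An aside from me, not in the original mail: to check which side of the
cutoff a given hash is on, you can inspect the HV's flag bits with the core
B module. SVphv_SHAREKEYS below is copied from hv.h; it is an internal
constant, not a public API, so treat this purely as a diagnostic sketch.)

```perl
use strict;
use warnings;
use B ();

# SVphv_SHAREKEYS is perl's internal "this HV uses the shared string
# table" flag bit, taken from hv.h. Internal, so it could change.
use constant SVphv_SHAREKEYS => 0x20000000;

sub has_shared_keys {
    my ($hashref) = @_;
    return (B::svref_2object($hashref)->FLAGS & SVphv_SHAREKEYS) ? 1 : 0;
}

my %h = (key => 'value');
print has_shared_keys(\%h), "\n";    # stock perl: hashes start out shared
```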

As I wrote above:

  No, I don't know best. In that, for some things I have a good feel for how
  far the code can be bent in a new direction (without it breaking), but I
  don't have a great feel for which directions it is actually important to
  bend in. Or how to measure success.


in that, it all ought to be faster for the use cases described, but I
haven't been able to benchmark it reliably.


Taking my previous (not great) benchmark that is roughly a "seen" hash:

$ cat ~/test/big-hash-r3-my.pl
#!perl -w
use strict;

my $size = shift;
$size ||= 1e5;

my %hash;

if ($size < 0) {
    # A negative size means "preallocate": assigning to keys %hash
    # presizes the bucket array before any keys are inserted.
    $size = -$size;
    keys %hash = $size;
}

# Avoid a lot of scope entries/exits so that we're mostly timing what we care
# about

++$hash{$_ * 3}
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

$hash{$_ * 5} || 0
    for 1 .. $size;

__END__


For blead, one run:

$ valgrind --tool=cachegrind --branch-sim=yes /home/nick/Sandpit/snap-v5.35.4-150-geb058c0793/bin/perl5.35.5 ~/test/big-hash-r3-my.pl 1e7
==2083422== Cachegrind, a cache and branch-prediction profiler
==2083422== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==2083422== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2083422== Command: /home/nick/Sandpit/snap-v5.35.4-150-geb058c0793/bin/perl5.35.5 /home/nick/test/big-hash-r3-my.pl 1e7
==2083422==
--2083422-- warning: L3 cache found, using its data for the LL simulation.
==2083422== brk segment overflow in thread #1: can't grow to 0x4838000
==2083422== (see section Limitations in user manual)
==2083422== NOTE: further instances of this message will not be shown
==2083422==
==2083422== I   refs:      37,648,207,135
==2083422== I1  misses:            31,488
==2083422== LLi misses:             6,735
==2083422== I1  miss rate:           0.00%
==2083422== LLi miss rate:           0.00%
==2083422==
==2083422== D   refs:      17,888,205,650  (12,036,875,789 rd   + 5,851,329,861 wr)
==2083422== D1  misses:       245,352,635  (   220,149,381 rd   +    25,203,254 wr)
==2083422== LLd misses:       232,768,464  (   209,874,979 rd   +    22,893,485 wr)
==2083422== D1  miss rate:            1.4% (           1.8%     +           0.4%  )
==2083422== LLd miss rate:            1.3% (           1.7%     +           0.4%  )
==2083422==
==2083422== LL refs:          245,384,123  (   220,180,869 rd   +    25,203,254 wr)
==2083422== LL misses:        232,775,199  (   209,881,714 rd   +    22,893,485 wr)
==2083422== LL miss rate:             0.4% (           0.4%     +           0.4%  )
==2083422==
==2083422== Branches:       5,764,880,483  ( 5,252,453,290 cond +   512,427,193 ind)
==2083422== Mispredicts:      426,430,089  (   106,405,461 cond +   320,024,628 ind)
==2083422== Mispred rate:             7.4% (           2.0%     +          62.5%   )


for that branch (almost - before rebasing):

$ valgrind --tool=cachegrind --branch-sim=yes /home/nick/Sandpit/snap-v5.35.4-159-g82b0176e75/bin/perl5.35.5 ~/test/big-hash-r3-my.pl 1e7
==2082396== Cachegrind, a cache and branch-prediction profiler
==2082396== Copyright (C) 2002-2017, and GNU GPL'd, by Nicholas Nethercote et al.
==2082396== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==2082396== Command: /home/nick/Sandpit/snap-v5.35.4-159-g82b0176e75/bin/perl5.35.5 /home/nick/test/big-hash-r3-my.pl 1e7
==2082396==
--2082396-- warning: L3 cache found, using its data for the LL simulation.
==2082396== brk segment overflow in thread #1: can't grow to 0x4832000
==2082396== (see section Limitations in user manual)
==2082396== NOTE: further instances of this message will not be shown
==2082396==
==2082396== I   refs:      36,103,067,458
==2082396== I1  misses:            31,762
==2082396== LLi misses:             6,739
==2082396== I1  miss rate:           0.00%
==2082396== LLi miss rate:           0.00%
==2082396==
==2082396== D   refs:      17,049,471,109  (11,534,685,391 rd   + 5,514,785,718 wr)
==2082396== D1  misses:       205,496,802  (   188,346,327 rd   +    17,150,475 wr)
==2082396== LLd misses:       194,991,458  (   179,112,354 rd   +    15,879,104 wr)
==2082396== D1  miss rate:            1.2% (           1.6%     +           0.3%  )
==2082396== LLd miss rate:            1.1% (           1.6%     +           0.3%  )
==2082396==
==2082396== LL refs:          205,528,564  (   188,378,089 rd   +    17,150,475 wr)
==2082396== LL misses:        194,998,197  (   179,119,093 rd   +    15,879,104 wr)
==2082396== LL miss rate:             0.4% (           0.4%     +           0.3%  )
==2082396==
==2082396== Branches:       5,594,219,347  ( 5,081,890,292 cond +   512,329,055 ind)
==2082396== Mispredicts:      409,023,997  (    89,004,563 cond +   320,019,434 ind)
==2082396== Mispred rate:             7.3% (           1.8%     +          62.5%   )



So it looks sort-of better, but

1) not fantastically
2) I know that something (hash randomisation) moves those numbers about
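(A side note from me: the run-to-run movement from hash randomisation can be
pinned down with the environment variables documented in perlrun, which makes
cachegrind numbers much more comparable across runs:)

```shell
# Fix the hash function's seed and disable traversal-order perturbation
# so bucket layout is identical on every run (see perlrun).
PERL_HASH_SEED=0 PERL_PERTURB_KEYS=0 \
    perl -E 'my %h = map { $_ => 1 } "a" .. "j"; say join ",", keys %h'
```

Two runs of that command print the keys in the same order, where an unpinned
perl would usually differ between runs.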


So, assuming that you have some test cases that more easily benchmark things
like "what memory doesn't stay shared when you fork?" it would be useful if
you could try benchmarking that branch (vs blead) to see if it helps the
use cases that you (or your $work) are interested in.
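For what it's worth, here is the rough shape of probe I'd imagine (my own
sketch, Linux-only since it parses /proc/self/status; the key names and
counts are arbitrary): build a hash before forking, then have the child
build a second hash reusing the same key strings, and watch its resident set
grow as inherited pages get copied.

```perl
use strict;
use warnings;

# Return this process's resident set size in kB (Linux only).
sub rss_kb {
    open my $fh, '<', '/proc/self/status' or return undef;
    while (my $line = <$fh>) {
        return $1 if $line =~ /^VmRSS:\s+(\d+)\s+kB/;
    }
    return undef;
}

my %prefork;
$prefork{"key$_"} = 1 for 1 .. 100_000;

if (defined(my $before = rss_kb())) {
    my $pid = fork // die "fork failed: $!";
    if ($pid == 0) {
        # Same key strings, different hash: with shared keys this touches
        # (and so COWs) the inherited PL_strtab pages in the child.
        my %child;
        $child{"key$_"} = 2 for 1 .. 100_000;
        printf "child RSS grew by %d kB\n", rss_kb() - $before;
        exit 0;
    }
    waitpid $pid, 0;
}
```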


I'm really not doing very well at benchmarking whether changes are useful.

I can make changes, but I seem to be shooting in the dark, which isn't
really working out.


Nicholas Clark


