develooper Front page | perl.perl5.porters | Postings from November 2016

[perl #130084] [PATCH] Increase hash colision ratio and reducehsplit calls

Thread Previous | Thread Next
From:
Atoomic via RT
Date:
November 13, 2016 16:30
Subject:
[perl #130084] [PATCH] Increase hash colision ratio and reducehsplit calls
Message ID:
rt-4.0.24-2229-1479054576-1484.130084-15-0@perl.org
Note that increasing the collision ratio, reduce the amount of memory used for the hash array
but also slightly decrease performance (access, realocation, ...)

As we are now only increasing arrays (via calling hsplit) twice less often than earlier,
we can safely increase (also by 2) the size of the array when reallocating the array hash during
a growth, without introducing a memory bloat.

This is why the 'x4' is used here, to preserve the same dynamic/performance as earlier.

Main idea: reduce memory at a very low performance cost.

Here are some quick&dirty tests with multiple combinations of SHOULD_HSPLIT / HSPLIT,

# -----------------------------------
# set of different hashes
# -----------------------------------
time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1 (1..10000) {  $h{$l1} = {1..$l1}  } print qx{grep RSS /proc/$$/status} '

(blead)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv) ; HSPLIT: 2 x oldsize
VmRSS:   1480072 kB

real  0m19.072s
user  0m18.012s
sys 0m1.060s

(blead+patch)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS:   1409380 kB (-5%)

real  0m16.707s
user  0m15.640s
sys 0m1.068s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# collisions pushed to the extreme, while preserving hsplit/buckets logic
# note the memory reduction, but noticeable run time increase
VmRSS:   1192512 kB

real  0m45.874s
user  0m44.932s
sys 0m0.937s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 64 x HvMAX(hv) ; HSPLIT: 16 x 2 x oldsize
# same memory improvement, but run time improved compared to previous run, still fare from reference (blead)
VmRSS:   1224808 kB

real  0m27.996s
user  0m27.028s
sys 0m0.964s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# seems ok
VmRSS:   1224560 kB

real  0m18.474s
user  0m17.554s
sys 0m0.918s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# seems ok
VmRSS:   1261132 kB

real  0m17.377s
user  0m16.469s
sys 0m0.906s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
# seems ok
VmRSS:   1296116 kB

real  0m16.645s
user  0m15.733s
sys 0m0.911s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
# seems ok
VmRSS:   1275800 kB

real  0m17.514s
user  0m16.621s
sys 0m0.892s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
# seems ok
VmRSS:   1243628 kB

real  0m18.536s
user  0m17.547s
sys 0m0.990s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
# seems ok
VmRSS:   1221084 kB

real  0m21.573s
user  0m20.601s
sys 0m0.972s


# -----------------------------------
# large hash grow & read/write access
# -----------------------------------
> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS /proc/$$/status}'

(blead)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > HvMAX(hv) ; HSPLIT: 2 x oldsize
VmRSS:   1687840 kB

real  0m20.777s
user  0m19.548s
sys 0m1.227s

(blead+patch)# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 2 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS:   1556784 kB

real  0m20.468s
user  0m19.412s
sys 0m1.053s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# not ok: too slow
VmRSS:   1458572 kB

real  0m29.747s
user  0m28.704s
sys 0m1.041s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x oldsize
# not ok: too slow
for 1..10000; print qx{grep RSS /proc/$$/status}'
VmRSS:   1687976 kB

real  0m33.095s
user  0m31.984s
sys 0m1.107s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 4 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS:   1556784 kB

real  0m24.261s
user  0m23.227s
sys 0m1.033s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
VmRSS:   1458608 kB

real  0m24.841s
user  0m23.855s
sys 0m0.979s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 8 x HvMAX(hv) ; HSPLIT: 2 x 2 x oldsize
VmRSS:   1458552 kB

real  0m25.109s
user  0m24.133s

# SHOULD_HSPLIT: HvTOTALKEYS(hv) > 16 x HvMAX(hv) ; HSPLIT: 4 x 2 x oldsize
# no ok: too slow
VmRSS:   1458544 kB

real  0m29.900s
user  0m28.867s
sys 0m1.032s


On Sun, 13 Nov 2016 06:03:38 -0800, atoomic@cpan.org wrote:
> This is a bug report for perl from atoomic@cpan.org,
> generated with the help of perlbug 1.40 running under perl 5.25.7.
> 
> 
> -----------------------------------------------------------------
> [Please describe your issue here]
> 
> Increase hash colision ratio and reduce hsplit calls,
> This is a memory (micro) optimization.
> 
> Previously the hash array was resized each time the hash contains
> the same number of elements of its size.
> 
> With this commit, hash collision is increased to 2*SIZE, which
> has two advantages: resize the hash less often, reduce the size
> of the array for hashes.
> 
> As elements are now going to collide a little more, this is
> going to make hash lookup a bit slower.
> 
> Note that when resizing the hash we are still using
> 2*Number_of_elements
> (which is 4*Size_of_Hash)
> 
> Here are some examples showing the memory optimization resulting from
> this
> change.
> Time is not really relevant here in these samples.
> 
> # -------------------------------
> # set of different hash sizes
> # ———————————————
> 
> (blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
> (1..10000) {  $h{$l1} = {1..$l1}  } print qx{grep RSS /proc/$$/status}
> '
> VmRSS:   1480072 kB
> 
> real  0m19.072s
> user  0m18.012s
> sys 0m1.060s
> 
> (blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my
> $l1
> (1..10000) {  $h{$l1} = {1..$l1}  } print qx{grep RSS /proc/$$/status}
> '
> VmRSS:   1409380 kB (-5%)
> 
> real  0m16.707s
> user  0m15.640s
> sys 0m1.068s
> 
> 
> # -------------------------------
> # large hash grow & read/write access
> # -------------------------------
> 
> (blead)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for
> 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS
> /proc/$$/status}'
> VmRSS:   1687840 kB
> 
> real  0m20.777s
> user  0m19.548s
> sys 0m1.227s
> 
> (blead+patch)> time PERL_HASH_SEED=0 ./perl -e 'my %h; $h{$_} = 1 for
> 1..13107200; $h{$_} = $h{$_ + 1000} for 1..10000; print qx{grep RSS
> /proc/$$/status}'
> VmRSS:   1556784 kB
> 
> real  0m20.468s
> user  0m19.412s
> sys 0m1.053s
> 
> # -------------------------------
> # 100_000 hashes with 2 keys
> # -------------------------------
> 
> (blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
> (1..1000_000) {  $h{$l1} = {1..4}  } print qx{grep RSS
> /proc/$$/status} '
> VmRSS:    340300 kB
> 
> real  0m2.641s
> user  0m2.370s
> sys 0m0.271s
> 
> (blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my
> $l1
> (1..100_0000) {  $h{$l1} = {1..4}  } print qx{grep RSS
> /proc/$$/status} '
> VmRSS:    332072 kB
> 
> real  0m2.647s
> user  0m2.405s
> sys 0m0.242s
> 
> # -------------------------------
> # 100_000 hashes with 32 keys
> # -------------------------------
> 
> (blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
> (1..100_0000) {  $h{$l1} = {1..64}  } print qx{grep RSS
> /proc/$$/status} '
> VmRSS:   2202668 kB
> 
> real  0m8.591s
> user  0m7.062s
> sys 0m1.528s
> 
> (blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my
> $l1
> (1..1000_000) {  $h{$l1} = {1..64}  } print qx{grep RSS
> /proc/$$/status} '
> VmRSS:   1942756 kB
> 
> real  0m7.755s
> user  0m6.379s
> sys 0m1.377s
> 
> # -------------------------------
> # 100_000 hashes with 32 keys
> # -------------------------------
> 
> (blead)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my $l1
> (1..10000) {  $h{$l1} = {1..1024}  } print qx{grep RSS
> /proc/$$/status} '
> VmRSS:    326848 kB
> 
> real  0m1.267s
> user  0m1.006s
> sys 0m0.260s
> 
> (blead+patch)> time PERL_HASH_SEED=12345 ./perl -e 'my %h; foreach my
> $l1
> (1..10000) {  $h{$l1} = {1..1024}  } print qx{grep RSS
> /proc/$$/status} '
> VmRSS:    286512 kB
> 
> real  0m1.149s
> user  0m0.919s
> sys 0m0.229s
> 
> [Please do not change anything below this line]
> -----------------------------------------------------------------
> ---
> Flags:
>     category=core
>     severity=low
>     Type=Patch
>     PatchStatus=HasPatch
> ---
> Site configuration information for perl 5.25.7:
> 
> Configured by root at Sun Nov 13 03:43:35 MST 2016.
> 
> Summary of my perl5 (revision 5 version 25 subversion 7)
> configuration:
>   Commit id: b37d3ac68c2a38a42fd9d7fabe9cf5b8c74d4a83
>   Platform:
>     osname=linux
>     osvers=3.10.0-327.36.3.el7.x86_64
>     archname=x86_64-linux
>     uname='linux nico-c7.dev.cpanel.net 3.10.0-327.36.3.el7.x86_64 #1
> smp
> mon oct 24 16:09:20 utc 2016 x86_64 x86_64 x86_64 gnulinux '
>     config_args='-des -Dusedevel'
>     hint=recommended
>     useposix=true
>     d_sigaction=define
>     useithreads=undef
>     usemultiplicity=undef
>     use64bitint=define
>     use64bitall=define
>     uselongdouble=undef
>     usemymalloc=n
>     bincompat5005=undef
>   Compiler:
>     cc='cc'
>     ccflags ='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-
> strong
> -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64
> -D_FORTIFY_SOURCE=2'
>     optimize='-O2'
>     cppflags='-fwrapv -fno-strict-aliasing -pipe -fstack-protector-
> strong
> -I/usr/local/include'
>     ccversion=''
>     gccversion='4.8.5 20150623 (Red Hat 4.8.5-4)'
>     gccosandvers=''
>     intsize=4
>     longsize=8
>     ptrsize=8
>     doublesize=8
>     byteorder=12345678
>     doublekind=3
>     d_longlong=define
>     longlongsize=8
>     d_longdbl=define
>     longdblsize=16
>     longdblkind=3
>     ivtype='long'
>     ivsize=8
>     nvtype='double'
>     nvsize=8
>     Off_t='off_t'
>     lseeksize=8
>     alignbytes=8
>     prototype=define
>   Linker and Libraries:
>     ld='cc'
>     ldflags =' -fstack-protector-strong -L/usr/local/lib'
>     libpth=/usr/local/lib /usr/lib /lib/../lib64 /usr/lib/../lib64
> /lib
> /lib64 /usr/lib64 /usr/local/lib64
>     libs=-lpthread -lnsl -lgdbm -ldb -ldl -lm -lcrypt -lutil -lc
> -lgdbm_compat
>     perllibs=-lpthread -lnsl -ldl -lm -lcrypt -lutil -lc
>     libc=libc-2.17.so
>     so=so
>     useshrplib=false
>     libperl=libperl.a
>     gnulibc_version='2.17'
>   Dynamic Linking:
>     dlsrc=dl_dlopen.xs
>     dlext=so
>     d_dlsymun=undef
>     ccdlflags='-Wl,-E'
>     cccdlflags='-fPIC'
>     lddlflags='-shared -O2 -L/usr/local/lib -fstack-protector-strong'
> 
> 
> ---
> @INC for perl 5.25.7:
>     lib
>     /root/.dotfiles/perl-must-have/lib
>     /root/perl5/lib/perl5/
>     /usr/local/lib/perl5/site_perl/5.25.7/x86_64-linux
>     /usr/local/lib/perl5/site_perl/5.25.7
>     /usr/local/lib/perl5/5.25.7/x86_64-linux
>     /usr/local/lib/perl5/5.25.7
> 
> ---
> Environment for perl 5.25.7:
>     HOME=/root
>     LANG=en_US.UTF-8
>     LANGUAGE (unset)
>     LD_LIBRARY_PATH (unset)
>     LOGDIR (unset)
> 
> PATH=/usr/local/cpanel/3rdparty/perl/524/bin:/usr/local/cpanel/3rdparty/perl/522/bin:/usr/local/cpanel/3rdparty/perl/514/bin:/usr/local/cpanel/3rdparty/bin:/root/bin/:/opt/local/bin:/opt/local/sbin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/opt/cpanel/composer/bin:/root/.dotfiles/bin:/root/perl5/bin:/root/.rvm/bin:/root/bin
>     PERL5DB=use Devel::NYTProf
>     PERL5LIB=/root/.dotfiles/perl-must-
> have/lib::/root/perl5/lib/perl5/
>     PERL_BADLANG (unset)
>     PERL_CPANM_OPT=--quiet
>     SHELL=/bin/bash




---
via perlbug:  queue: perl5 status: open
https://rt.perl.org/Ticket/Display.html?id=130084

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About