
Re: [External] Re: non-shared hash keys and large hashes (was Re:Robin Hood Hashing for the perl core)

From:
Yves Orton via perl5-porters
Date:
October 20, 2021 20:53
Subject:
Re: [External] Re: non-shared hash keys and large hashes (was Re:Robin Hood Hashing for the perl core)
Message ID:
CAH9MCmgOskSA242wcvUAP_ApGrJTgiBmUvk=_PrYAuRP-c-2CQ@mail.gmail.com
On Tue, Oct 19, 2021 at 9:30 PM Nicholas Clark <nick@ccl4.org> wrote:

> On Tue, Oct 19, 2021 at 03:17:48PM +0200, demerphq wrote:
>
>
> Things I didn't spot until it didn't work :-)
>
> > So I would do something along the following lines:
> >
> > my %hash;
> > my $str= "aaaaaaaaaa";
> > $hash{$str++}++ for 1..(1<<23);
> > for (1..4) {
> >    my $pid= fork // die "couldn't fork";
> >    if (!$pid) {
> >       my %copy;
> >       $copy{$_}++ for keys %hash;
>
> needs an exit here. Or _exit.
>
>
So it does. Sorry mate.


> >   } else {
> >      push @pids, $pid;
> >   }
> > }
> > wait($_) for @pids;
>
> waitpid $_, 0
>

Yeah, my bad. I wrote that quickly in between meetings, from memory, and tbh I
haven't used fork directly in ages. We have a wrapper around it that we use
here at Booking.com that makes it much harder to blow your foot off. I
should release it; it makes writing multi-forking code trivial.
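
For reference, here is a sketch of the script with both corrections folded in (the exit in the child, and waitpid instead of wait). The sub name run_benchmark and its two parameters are just for illustration, and the call at the bottom uses a much smaller key count than the 1<<23 keys used in the thread so that it finishes quickly.

```perl
use strict;
use warnings;

# Build a large hash, then have each forked child copy its keys into a
# fresh hash. Returns the number of children successfully reaped.
sub run_benchmark {
    my ($n_keys, $n_children) = @_;

    my %hash;
    my $str = "aaaaaaaaaa";
    $hash{$str++}++ for 1 .. $n_keys;

    my @pids;
    for (1 .. $n_children) {
        my $pid = fork // die "couldn't fork: $!";
        if (!$pid) {
            # Child: copy every key, then leave.
            my %copy;
            $copy{$_}++ for keys %hash;
            exit 0;    # without this the child carries on round the loop and forks too
        }
        push @pids, $pid;
    }

    # waitpid returns the pid it reaped, so count the matches.
    my $reaped = 0;
    for my $pid (@pids) {
        $reaped++ if waitpid($pid, 0) == $pid;
    }
    return $reaped;
}

# The runs in this thread used 1 << 23 keys and 4 children.
print run_benchmark(1 << 16, 4), " children reaped\n";
```
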


>
>
> Anyway, fairly consistently, blead:
>
> 37.31user 6.49system 0:18.70elapsed 234%CPU (0avgtext+0avgdata
> 2177592maxresident)k
> 0inputs+0outputs (0major+2303622minor)pagefaults 0swaps
>
> 36.72user 6.44system 0:18.44elapsed 234%CPU (0avgtext+0avgdata
> 2177740maxresident)k
> 0inputs+0outputs (0major+2324684minor)pagefaults 0swaps
>
> 36.66user 6.29system 0:18.58elapsed 231%CPU (0avgtext+0avgdata
> 2177668maxresident)k
> 0inputs+0outputs (0major+2327457minor)pagefaults 0swaps
>
>
> the branch:
>
> 26.07user 3.73system 0:13.50elapsed 220%CPU (0avgtext+0avgdata
> 2308668maxresident)k
> 0inputs+0outputs (0major+1650325minor)pagefaults 0swaps
>
> 25.92user 3.99system 0:13.14elapsed 227%CPU (0avgtext+0avgdata
> 2308888maxresident)k
> 0inputs+0outputs (0major+1650798minor)pagefaults 0swaps
>
> 26.77user 3.79system 0:13.83elapsed 220%CPU (0avgtext+0avgdata
> 2308744maxresident)k
> 0inputs+0outputs (0major+1646734minor)pagefaults 0swaps
>

That is a huge win. More than 25% it would seem.
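
To put numbers on that, a quick sketch that averages the three runs of each build quoted above and prints how much lower the branch is. The helper names (mean, reduction_pct) are made up for this example; the figures are copied straight from the time output.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Figures from the quoted runs: seconds for the times, counts for the faults.
my %blead = (
    user    => [37.31, 36.72, 36.66],
    system  => [6.49, 6.44, 6.29],
    elapsed => [18.70, 18.44, 18.58],
    minflt  => [2303622, 2324684, 2327457],
);
my %branch = (
    user    => [26.07, 25.92, 26.77],
    system  => [3.73, 3.99, 3.79],
    elapsed => [13.50, 13.14, 13.83],
    minflt  => [1650325, 1650798, 1646734],
);

sub mean { sum(@_) / @_ }

# Percentage by which the mean of @$after is lower than the mean of @$before.
sub reduction_pct {
    my ($before, $after) = @_;
    my ($old, $new) = (mean(@$before), mean(@$after));
    return 100 * ($old - $new) / $old;
}

for my $field (qw(user system elapsed minflt)) {
    printf "%-8s %.1f%% lower on the branch\n",
        $field, reduction_pct($blead{$field}, $branch{$field});
}
```

By these averages the branch comes out about 28.8% lower on user time, 40.1% on system time, 27.4% on elapsed time, and 28.9% on minor page faults.
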


>
>
>
> I struggled to measure memory consumption, particularly shared memory, even
> if I inserted a sleep before the exit in the children so that I had some
> time to inspect top.
>

Try installing https://metacpan.org/pod/Linux::Smaps::Tiny and use it to
have each process report its stats before it exits.
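
If installing a module is a hassle, a rough core-only equivalent is to sum the fields in /proc/$$/smaps by hand. This is only a sketch, it does not use the module's interface, it is Linux-only, and the sub names smaps_totals and report_smaps are invented for the example.

```perl
use strict;
use warnings;

# Sum the per-mapping fields of smaps-format text read from a filehandle.
# Returns a hashref of field name => total in kB.
sub smaps_totals {
    my ($fh) = @_;
    my %total;
    while (my $line = <$fh>) {
        # Field lines look like "Private_Dirty:        12 kB"; mapping
        # headers and VmFlags lines do not match and are skipped.
        $total{$1} += $2 if $line =~ /^(\w+):\s+(\d+) kB$/;
    }
    return \%total;
}

# Print shared and private totals for the current process.
sub report_smaps {
    my ($label) = @_;
    open my $fh, '<', "/proc/$$/smaps" or do {
        warn "$label: cannot read /proc/$$/smaps: $!\n";
        return;
    };
    my $t = smaps_totals($fh);
    close $fh;
    printf "%s: shared %d kB, private %d kB\n", $label,
        ($t->{Shared_Clean} // 0) + ($t->{Shared_Dirty} // 0),
        ($t->{Private_Clean} // 0) + ($t->{Private_Dirty} // 0);
    return $t;
}
```

Calling report_smaps("child $$") in each child just before its exit gives a shared versus private figure per child without needing to catch the process in top.
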


>
> but, it does seem to be faster,


Yeah, ~25% faster seems pretty conclusive to me. I'd say that measurement
alone justifies all your time.

> which I assume is because it's spending
> less time doing the "write" part of COW memory pages after fork.
> If this is the cause, why is the "user" time also less?
>

No idea. Maybe the user processes have to block while the system does its
thing so it counts against both? (Does that make sense?)
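
One way to look at the split from inside Perl, without perf or root, is the times builtin, which reports user and system CPU seconds separately for the process and for its reaped children. A minimal sketch, with child_cpu_times being a name made up here:

```perl
use strict;
use warnings;

# Run $work in a forked child and return the CPU seconds it consumed,
# split into (user, system) as seen by the parent via times().
sub child_cpu_times {
    my ($work) = @_;
    my (undef, undef, $cuser0, $csys0) = times;

    my $pid = fork // die "couldn't fork: $!";
    if (!$pid) {
        $work->();
        exit 0;
    }
    waitpid $pid, 0;

    # A child's times are only added to the parent's totals once it is reaped.
    my (undef, undef, $cuser1, $csys1) = times;
    return ($cuser1 - $cuser0, $csys1 - $csys0);
}

my ($user, $sys) = child_cpu_times(sub { my %h; $h{$_}++ for 1 .. 100_000 });
printf "child: %.2fs user, %.2fs system\n", $user, $sys;
```

The system figure is CPU time spent in the kernel on the child's behalf, which is where page-fault handling for COW pages is accounted. This will not say why user time drops as well, but it does let the two be compared per child.
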


> "system" time (I assume) is the kernel stuff. Or is it all smushed
> together?
>
> I don't have root access to anything big enough to measure performance
> usefully with perf or similar.
>

I'll see what I can do.

BTW, replying from my work account primarily because the mail never arrived
at my normal account.

cheers,
Yves

-- 
Yves Orton
Principal Developer & Fellow
Booking.com <https://www.booking.com/>
Making it easier for everyone
to experience the world.
