
a new threaded mem allocator for Win32 perl dilemma

From: bulk88
Date: January 7, 2016 20:45
Subject: a new threaded mem allocator for Win32 perl dilemma
Message ID: 20160107204510.3662.qmail@lists-nntp.develooper.com

Threaded Win32 perl has a special memory allocator, while non-threaded 
Win32 perl calls plain libc malloc. This special allocator (the first 
one in the file, not the unused 2nd one in the later half) lives in 
http://perl5.git.perl.org/perl.git/blob/HEAD:/win32/vmem.h . The special 
allocator allows perl threads to be destroyed and their memory cleaned 
up, even if 3rd party XS or Perl core C leaks memory blocks. Leaking 
memory blocks is an intentional optimization in some configurations. The 
"PERL_DESTRUCT_LEVEL" env var controls this, but leaking is supposed to 
be done when NOT using threads or NOT using embedded perl; IDK if this 
is true IRL. There might be some memblocks that slip through 
"PERL_DESTRUCT_LEVEL".

In any case, the Win32 threaded special allocator uses a linked list to 
free all the blocks associated with that perl thread. It is a copy of 
the unix code (or vice versa) in 
http://perl5.git.perl.org/perl.git/blob/HEAD:/util.c#l348 . The linked 
list means there is a 3 pointer big header on each mem alloc. I figured 
out a way to shrink that header from 3 pointers to 1 pointer on Win32 in 
http://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/bulk88/w32_new_thded_mem_alloc 
. BUT a paradoxical effect happened: I increased the memory usage of 
perl.
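
To make the 3 pointer header concrete, here is a minimal sketch of a 
list-tracked allocator in the style described above. It is not the code 
in win32/vmem.h or util.c; every name (Pool, MemHeader, pool_malloc, 
pool_free, pool_destroy) is hypothetical, the layout (prev, next, owner) 
is an assumption, and the lock a threaded build would need around the 
list splice is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct Pool;

/* Hypothetical per-block header: 3 pointers, so 12 bytes on a 32-bit
 * build and 24 bytes on a 64-bit build, paid on every allocation. */
typedef struct MemHeader {
    struct MemHeader *prev;   /* previous block owned by the same pool */
    struct MemHeader *next;   /* next block owned by the same pool */
    struct Pool      *owner;  /* used to detect a free to the wrong pool */
} MemHeader;

typedef struct Pool {
    MemHeader head;           /* sentinel of a circular doubly linked list */
    size_t    live;           /* blocks currently on the list */
} Pool;

static void pool_init(Pool *p) {
    p->head.prev = p->head.next = &p->head;
    p->head.owner = p;
    p->live = 0;
}

static void *pool_malloc(Pool *p, size_t size) {
    MemHeader *h = malloc(sizeof(MemHeader) + size);
    if (!h) return NULL;
    /* A threaded allocator would hold a mutex around this splice. */
    h->owner = p;
    h->next = p->head.next;
    h->prev = &p->head;
    p->head.next->prev = h;
    p->head.next = h;
    p->live++;
    return h + 1;             /* payload starts right after the header */
}

/* Returns 0 on success, -1 if the block belongs to another pool. */
static int pool_free(Pool *p, void *ptr) {
    if (!ptr) return 0;
    MemHeader *h = (MemHeader *)ptr - 1;
    if (h->owner != p) return -1;
    h->prev->next = h->next;  /* unlink; also needs the lock when threaded */
    h->next->prev = h->prev;
    p->live--;
    free(h);
    return 0;
}

/* Frees every block still on the list, including leaked ones.
 * Returns the number of blocks released. */
static size_t pool_destroy(Pool *p) {
    size_t freed = 0;
    while (p->head.next != &p->head) {
        MemHeader *h = p->head.next;
        p->head.next = h->next;
        h->next->prev = &p->head;
        free(h);
        freed++;
    }
    p->live = 0;
    return freed;
}
```

Calling pool_destroy() on a pool that still holds blocks walks the list 
and frees them one by one, which is how leaked blocks get cleaned up 
when a thread goes away. The prev/next pair is what a 1 pointer header 
would drop.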

BEFORE

perl -MTest::More -MTest::Harness -e"system 'pause'"
Private Bytes 5752KB Working Set 8952KB

AFTER

perl -MTest::More -MTest::Harness -e"system 'pause'"
Private Bytes 6048KB Working Set 9068KB

Private Bytes: all memory pages unique to the process.

Working Set: all of the process's pages in physical memory. This number 
depends entirely on whether, and how, the OS decides to page memory out 
to disk.

I took 2 snapshots of the memory layout of the perl process for the test 
script above.

Heap 1 is the "Windows Process Heap", the default memory pool for 
Windows OS DLLs and MS's libc malloc. Heaps 2-4 are private heaps of 
system DLLs; they have no public API, every Win32 process has them, and 
they should be ignored. In the new alloc, Heap 5 is the per-perl-thread 
pool (PerlMem_malloc), Heap 6 is PerlMemShared_malloc, and Heap 7 is 
PerlMemParse_malloc. Notice Heap 7 seems to be unused.

So I don't know what to do now. The improvements are: the header is 
smaller; there is 1 less mutex acquire/release per memory operation 
(splicing the linked list must be done with a lock); and destroying a 
perl thread is faster (not benchmarked), because instead of freeing all 
the little allocs, the Heap Manager just frees whole VM pages without 
looking at what is in them. But if the "free all the little allocs" 
approach rarely releases memory to the OS and instead just makes it 
available to future mallocs in the same process, IDK if that is faster.
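
As a rough illustration of the "free whole pages, don't look inside" 
idea, here is a portable sketch. It is a simple bump arena standing in 
for the Heap Manager's private heap, not the code in the branch: all 
names (Arena, Chunk, BlockHeader, arena_malloc, arena_free, 
arena_destroy) and the 4096 byte chunk size are assumptions, freed 
blocks are not reused (a real heap manager would reuse them), and 
payloads are only pointer-aligned. Each block carries a 1 pointer header 
so the "free to wrong pool" check still works.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_PAYLOAD 4096    /* assumed chunk size, roughly one VM page */

typedef struct Arena Arena;

/* Hypothetical 1 pointer header: only the owner, no list links. */
typedef struct BlockHeader {
    Arena *owner;
} BlockHeader;

typedef struct Chunk {
    struct Chunk *next;       /* chunks form a singly linked list */
    size_t cap;               /* payload bytes available in this chunk */
    size_t used;              /* payload bytes already handed out */
} Chunk;

struct Arena {
    Chunk *chunks;            /* newest chunk first */
    size_t nchunks;
};

/* Round n up to a multiple of the pointer size. */
static size_t round_up(size_t n) {
    size_t a = sizeof(void *);
    return (n + a - 1) / a * a;
}

static void *arena_malloc(Arena *a, size_t size) {
    size_t need = round_up(sizeof(BlockHeader) + size);
    Chunk *c = a->chunks;
    if (!c || c->cap - c->used < need) {
        /* Current chunk is full: grab a new one (oversized if needed). */
        size_t cap = need > CHUNK_PAYLOAD ? need : CHUNK_PAYLOAD;
        c = malloc(sizeof(Chunk) + cap);
        if (!c) return NULL;
        c->cap = cap;
        c->used = 0;
        c->next = a->chunks;
        a->chunks = c;
        a->nchunks++;
    }
    BlockHeader *h = (BlockHeader *)((unsigned char *)(c + 1) + c->used);
    c->used += need;
    h->owner = a;
    return h + 1;
}

/* Only checks ownership; the space is reclaimed at arena_destroy().
 * Returns 0 on success, -1 on a free to the wrong arena. */
static int arena_free(Arena *a, void *ptr) {
    if (!ptr) return 0;
    BlockHeader *h = (BlockHeader *)ptr - 1;
    return h->owner == a ? 0 : -1;
}

/* Releases whole chunks without visiting individual blocks.
 * Returns the number of chunks freed. */
static size_t arena_destroy(Arena *a) {
    size_t freed = 0;
    while (a->chunks) {
        Chunk *c = a->chunks;
        a->chunks = c->next;
        free(c);
        freed++;
    }
    a->nchunks = 0;
    return freed;
}
```

With 1000 small blocks, arena_destroy() makes a handful of free() calls 
(one per chunk) instead of 1000, and no lock-protected list splice is 
needed on each allocation. Whether that wins overall depends on the 
question raised above: how often the per-block approach actually hands 
memory back to the OS.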

One improvement is to remove the unused PerlMemParse pool, but that can 
be done on the old allocator and the new one equally (though the memory 
savings will probably be visible at the OS VM level on the new 
allocator, since one 4KB page will then no longer be used).

Another thing is, maybe "perl -MTest::More -MTest::Harness -e"system 
'pause'"" is the wrong workload. The graph lines of OS memory usage of 
old alloc and new alloc might cross if more work and data is added to 
the perl process. At dozens of MBs of mem rather than a dozen MB, or on 
Win64 (16 bytes saved per alloc instead of 8 bytes), the cross might 
occur sooner. If I find a workload, or someone on this ML can suggest a 
large memory workload, and hypothetically the new alloc beats old alloc 
only at >100 MB of memory in the process, is that an improvement?
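
A back-of-the-envelope estimate of where the lines could cross, using 
only the numbers in this post. The assumptions are loud ones: the extra 
296KB of Private Bytes (6048KB - 5752KB) is treated as a fixed cost that 
does not grow with the number of allocations, 1KB is 1024 bytes, and the 
saving is exactly 8 bytes per live alloc on Win32 and 16 on Win64. None 
of this has been measured; break_even_allocs is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Figures quoted in the BEFORE/AFTER measurements, in KB. */
#define PRIVATE_BYTES_OLD_KB 5752UL
#define PRIVATE_BYTES_NEW_KB 6048UL

/* Number of live allocations at which a per-alloc saving repays a
 * fixed overhead (rounded up). Assumes the overhead is constant. */
static unsigned long break_even_allocs(unsigned long overhead_kb,
                                       unsigned long saved_per_alloc) {
    unsigned long overhead_bytes = overhead_kb * 1024UL;
    return (overhead_bytes + saved_per_alloc - 1) / saved_per_alloc;
}

/* Extra Private Bytes of the new allocator in the test above: 296KB. */
static unsigned long private_bytes_delta_kb(void) {
    return PRIVATE_BYTES_NEW_KB - PRIVATE_BYTES_OLD_KB;
}
```

Under those assumptions, break_even_allocs(private_bytes_delta_kb(), 8) 
gives 37888 live allocations on Win32, and with 16 bytes saved per alloc 
it gives 18944 on Win64. If the new allocator's overhead actually grows 
with the workload, the real crossover is later, or never.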

The smaller perl header on the alloc might also have a runtime perf 
improvement regarding CPU cache, I haven't tested that.

I have a very daring (undocumented MS API, TonyC probably will not 
approve) idea to keep the "free to wrong pool" error message, without 
putting a 1 pointer big header on the start of each perl mem alloc. I 
might try the daring idea and see if I can get the old alloc vs new 
alloc to cross sooner in memory usage.

So, how do I continue this idea/patch? I am stuck trying to decide if 
new alloc is better than old alloc, and I don't have any ideas for 
different workloads to try that might redeem new alloc.


