a new threaded mem allocator for Win32 perl dilemma
From:
bulk88
Date:
January 7, 2016 20:45
Subject:
a new threaded mem allocator for Win32 perl dilemma
Message ID:
20160107204510.3662.qmail@lists-nntp.develooper.com
Threaded Win32 perl has a special memory allocator, while non-threaded Win32
perl calls plain libc malloc. This special allocator (the first one in
the file, not the unused 2nd one in the later half) lives in
http://perl5.git.perl.org/perl.git/blob/HEAD:/win32/vmem.h . The special
allocator allows perl threads to be destroyed and their memory
cleaned up, even if 3rd party XS or Perl core C leaks memory blocks.
Leaking memory blocks is an intentional optimization in some
permutations. The "PERL_DESTRUCT_LEVEL" env var controls this, but
leaking is only supposed to be done when NOT using threads or NOT using
embedded perl; IDK if this is true IRL. There might be some memblocks
that slip through "PERL_DESTRUCT_LEVEL".
In any case, the Win32 threaded special allocator uses a linked list to
free all the blocks associated with that perl thread. It is a copy of
the Unix code (or vice versa) in
http://perl5.git.perl.org/perl.git/blob/HEAD:/util.c#l348 . The linked
list means there is a 3-pointer header on each mem alloc. I figured
out a way to shrink that header from 3 pointers to 1 pointer
on Win32 in
http://perl5.git.perl.org/perl.git/shortlog/refs/heads/smoke-me/bulk88/w32_new_thded_mem_alloc
. BUT a paradox happened: I increased the memory usage of perl.
BEFORE
perl -MTest::More -MTest::Harness -e"system 'pause'"
Private Bytes 5752KB Working Set 8952KB
AFTER
perl -MTest::More -MTest::Harness -e"system 'pause'"
Private Bytes 6048KB Working Set 9068KB
Private Bytes: all memory pages private to the process.
Working Set: all pages currently in physical memory. This number is
completely subject to whether the OS decides to page memory out to disk,
and how it decides what to page out.
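For readers who have not opened vmem.h, the linked-list bookkeeping described above can be sketched roughly as follows in portable C. This is an illustration, not the vmem.h code: the names Pool, BlockHeader, pool_malloc, pool_free and pool_destroy are made up, plain malloc stands in for the real backing allocator, and the lock that list splicing needs is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical names throughout; this models the idea, it is not vmem.h. */
typedef struct Pool Pool;

/* Old-style header: two list links plus the owning pool = 3 pointers. */
typedef struct BlockHeader {
    struct BlockHeader *next;
    struct BlockHeader *prev;
    Pool *owner;
} BlockHeader;

struct Pool {
    BlockHeader head;   /* sentinel of a circular doubly linked list */
    size_t live;        /* blocks currently linked into the list */
};

void pool_init(Pool *p) {
    p->head.next = p->head.prev = &p->head;
    p->head.owner = p;
    p->live = 0;
}

void *pool_malloc(Pool *p, size_t size) {
    BlockHeader *h = malloc(sizeof *h + size);
    if (!h) return NULL;
    h->owner = p;
    /* Splice in right after the sentinel (a real allocator locks here). */
    h->next = p->head.next;
    h->prev = &p->head;
    p->head.next->prev = h;
    p->head.next = h;
    p->live++;
    return h + 1;       /* payload starts right after the header */
}

void pool_free(Pool *p, void *ptr) {
    if (!ptr) return;
    BlockHeader *h = (BlockHeader *)ptr - 1;
    assert(h->owner == p);   /* the "free to wrong pool" check */
    h->prev->next = h->next; /* unlink (again, under a lock in real code) */
    h->next->prev = h->prev;
    p->live--;
    free(h);
}

/* Thread teardown: walk the list and free every block that was leaked.
   Returns how many blocks had to be freed one by one. */
size_t pool_destroy(Pool *p) {
    size_t n = 0;
    while (p->head.next != &p->head) {
        pool_free(p, p->head.next + 1);
        n++;
    }
    return n;
}
```

Dropping next and prev from this header would leave only the owner pointer, which is presumably the 1-pointer case; the list walk in pool_destroy then has to be replaced by some other way of finding the thread's blocks.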
I took 2 snapshots of the memory layout of the perl process for the test
script above.
Heap 1 is the "Windows Process Heap", the default memory pool for
Windows OS DLLs and MS's libc malloc. Heaps 2-4 are private heaps of
system DLLs; they have no public API, every Win32 process has them, and
they should be ignored. In new alloc, Heap 5 is the per-perl-thread pool
(PerlMem_malloc), Heap 6 is PerlMemShared_malloc, and Heap 7 is
PerlMemParse_malloc. Notice Heap 7 seems to be unused.
So I don't know what to do now. The improvements are: the header is
smaller; there is 1 less mutex acquire/release per memory operation
(splicing the linked list must be done with a lock); and destroying a perl
thread is faster (not benchmarked), because instead of freeing all the
little allocs, the Heap Manager just frees whole VM pages, not looking at
what is in them. But if the "free all the little allocs" approach rarely
releases memory to the OS and instead just makes it available to future
mallocs in the same process, IDK if the new way is really faster.
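The "free whole pages without looking inside" teardown can be pictured with a small bump arena in portable C. This is a stand-in chosen for illustration, not the Win32 Heap Manager and not the code on the branch: Arena, Chunk, arena_alloc, arena_destroy and the 4096-byte CHUNK_SIZE are all assumptions, and unlike a real heap this arena cannot free individual blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define CHUNK_SIZE 4096  /* hypothetical stand-in for one VM page */

typedef struct Chunk {
    struct Chunk *next;
    size_t used;                     /* bytes handed out from data[] */
    unsigned char data[CHUNK_SIZE];
} Chunk;

typedef struct Arena {
    Chunk *chunks;                   /* newest chunk first */
} Arena;

/* Hand out memory by bumping an offset; no per-block header at all. */
void *arena_alloc(Arena *a, size_t size) {
    size = (size + 15) & ~(size_t)15;   /* round up to a 16-byte multiple */
    assert(size <= CHUNK_SIZE);         /* sketch: no oversized blocks */
    if (!a->chunks || a->chunks->used + size > CHUNK_SIZE) {
        Chunk *c = malloc(sizeof *c);
        if (!c) return NULL;
        c->next = a->chunks;
        c->used = 0;
        a->chunks = c;
    }
    void *p = a->chunks->data + a->chunks->used;
    a->chunks->used += size;
    return p;
}

/* Teardown touches one chunk per page-sized region, never the
   individual blocks inside it. Returns the number of chunks freed. */
size_t arena_destroy(Arena *a) {
    size_t n = 0;
    while (a->chunks) {
        Chunk *next = a->chunks->next;
        free(a->chunks);
        a->chunks = next;
        n++;
    }
    return n;
}
```

With 1000 live 16-byte blocks, teardown here frees 4 chunks instead of walking 1000 list nodes. Whether that saving matters in a real perl process is exactly the part that has not been benchmarked.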
One improvement is to remove the unused PerlMemParse pool, but that can
be done on the old allocator and the new one equally (though the memory
savings will probably only be visible at the OS VM level on the new
allocator, since then one 4KB page will not be used).
Another thing is, maybe "perl -MTest::More -MTest::Harness -e"system
'pause'"" is the wrong workload. The graph lines of OS memory usage
of old alloc and new alloc might cross if more work and data is added to
the perl process. That might happen at dozens of MBs of mem, not a dozen
MB, and on Win64 (16 bytes saved per alloc instead of 8 bytes) the cross
might occur sooner. If I find a workload, or someone on this ML can
suggest a large memory workload, and hypothetically the new alloc beats
old alloc only at >100 MB of memory in the process, is that an improvement?
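A back-of-the-envelope way to guess where the lines might cross, using only the Private Bytes numbers above (6048KB vs 5752KB) and the 8 or 16 bytes saved per alloc. The big assumptions, which are not verified: the extra 296KB is a fixed cost that does not grow with the workload, and every byte trimmed from the header shows up as a real saving after the heap's own rounding. breakeven_allocs is a made-up helper name.

```c
#include <assert.h>

/* Extra Private Bytes seen with the new allocator in the test above:
   6048 KB - 5752 KB = 296 KB, expressed in bytes. */
#define EXTRA_BYTES ((6048UL - 5752UL) * 1024UL)

/* Number of live allocations needed before the per-alloc header saving
   cancels out EXTRA_BYTES. Rounds up. Assumes the extra is a fixed cost. */
unsigned long breakeven_allocs(unsigned long saved_per_alloc) {
    assert(saved_per_alloc > 0);
    return (EXTRA_BYTES + saved_per_alloc - 1) / saved_per_alloc;
}
```

Under those assumptions breakeven_allocs(8) is 37888 live allocations on Win32 and breakeven_allocs(16) is 18944 on Win64. If the extra memory is not fixed but grows with the process, the lines may never cross.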
The smaller perl header on the alloc might also give a runtime perf
improvement regarding CPU cache; I haven't tested that.
I have a very daring (undocumented MS API, TonyC probably will not
approve) idea to keep the "free to wrong pool" error message, without
putting a 1 pointer big header on the start of each perl mem alloc. I
might try the daring idea and see if I can get the old alloc vs new
alloc to cross sooner in memory usage.
So, how do I continue this idea/patch? I am stuck trying to decide if new
alloc is better than old alloc, and I don't have any ideas for different
workloads to try that might redeem new alloc.