perl core mem enhancement - private arenas

Jim Cromie
August 25, 2009 11:42
=head1 perl core memory improvements

=over 1

=item Name:

Jim Cromie

=item Email:

=item Amount Requested:

How much is your project worth? TBD

=back

=head2 Synopsis

Memory allocation enhancements - Private Arenas

Perl is greedy wrt recycling sv-bodies; freed bodies are returned to
an interpreter-global, sv-type specific freelist, where they hang
until reuse or process termination.  Workloads which rotate through
large populations of each body-type could be badly limited: memory
parked on one type's freelist is unavailable to the others, so
eventually there would be no memory left.  While this workload profile
is arbitrary, so is the limitation.  More to the point, holding unused
memory until process death, without recourse, is clearly wasteful.

=head2 Benefits to the Perl Community

Long-running server processes are most likely to suffer from wasted
memory, particularly those that use Storable::freeze on any large data
sets, and don't do so in a sub-task.

Some workloads/worksets which exit with ENOMEM may now run to
completion (very case specific).

Generic subsystem refactor and cleanup that improves prospects for
implementing "use less mem" features.  Arenas-as-containers is a
significant possibility, and warrants wider discussion IMO.  Other
specifics are noted below.

=head2 Deliverables, Project Details

Simply put, here are the steps (modulo grouping):

  1   add void* reqid to struct arena_desc
  1b  get_arena(void*reqid), to remember requestors of allocations.
  2   refactor more_bodies() to thread free bodies onto any root.
  2b  clarify sv_type -> reqid mapping in (new|del)_body macros
  2c  add (new|del)_body_private(void* reqid) macros
  3   stub in Perl_release_arenas(void* reqid)
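To make steps 1, 1b and 3 concrete, here is a minimal standalone sketch of the bookkeeping they imply.  It is not the patch itself: toy_arena, toy_get_arena(), toy_release_arenas() and toy_count_arenas() are hypothetical stand-ins for struct arena_desc, get_arena() and Perl_release_arenas(), and plain malloc/free stands in for whatever the core would really use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for perl's struct arena_desc. */
typedef struct toy_arena {
    struct toy_arena *next;  /* chain of all live arenas */
    void             *reqid; /* step 1: who asked for this slab */
    size_t            size;  /* bytes in data */
    char             *data;  /* the slab handed to the requestor */
} toy_arena;

/* Stand-in for the interpreter's set of arenas. */
static toy_arena *toy_arenas = NULL;

/* Step 1b: allocate a slab and remember the requestor. */
static void *toy_get_arena(size_t size, void *reqid)
{
    toy_arena *a;
    assert(size > 0);
    a = malloc(sizeof *a);
    if (!a)
        return NULL;
    a->data = malloc(size);
    if (!a->data) {
        free(a);
        return NULL;
    }
    a->reqid = reqid;
    a->size  = size;
    a->next  = toy_arenas;
    toy_arenas = a;
    return a->data;
}

/* Step 3: free every slab owned by reqid; returns how many were released. */
static size_t toy_release_arenas(void *reqid)
{
    size_t released = 0;
    toy_arena **pp = &toy_arenas;
    while (*pp) {
        toy_arena *a = *pp;
        if (a->reqid == reqid) {
            *pp = a->next;      /* unlink, then free slab and descriptor */
            free(a->data);
            free(a);
            released++;
        } else {
            pp = &a->next;
        }
    }
    return released;
}

/* Helper for inspection only: number of arenas still held. */
static size_t toy_count_arenas(void)
{
    size_t n = 0;
    const toy_arena *a;
    for (a = toy_arenas; a; a = a->next)
        n++;
    return n;
}
```

Any address the client owns works as a reqid here; the sketch only compares it for equality and never dereferences it.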

=head3 1st Benefits

S_more_bodies() refactor lets it thread free-bodies anywhere,
decoupling it from the interpreter itself (or more precisely, from
my_perl->Ibody_roots).  We merely adjust the 1st layer of macros which
callers use to reach it. (perl is macro dense;-)

Existing (new|del)_<TYPE> users are unchanged, but their arena
allocations are now transparently tracked and can soon be reclaimed.
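The "thread onto any root" idea can be sketched in isolation as below.  This is an illustration under assumptions, not S_more_bodies() itself: toy_more_bodies(), toy_new_body() and toy_del_body() are hypothetical names, the root is just a void* the caller supplies, and each free body keeps the next-free pointer in its first word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Carve a fresh slab into `count` bodies of `body_size` bytes and thread
 * them onto *root, whichever freelist root the caller passes in.
 * Returns the slab so the caller can free it later, or NULL on failure. */
static void *toy_more_bodies(void **root, size_t body_size, size_t count)
{
    char *slab, *p;
    size_t i;

    /* each body must be able to hold an aligned next-pointer */
    assert(body_size >= sizeof(void *));
    assert(body_size % sizeof(void *) == 0);
    assert(count > 0);

    slab = malloc(body_size * count);
    if (!slab)
        return NULL;

    p = slab;
    for (i = 0; i + 1 < count; i++, p += body_size)
        *(void **)p = p + body_size;   /* link body i to body i+1 */
    *(void **)p = *root;               /* last body points at the old list */
    *root = slab;
    return slab;
}

/* Pop one body from the given root (NULL if the list is empty). */
static void *toy_new_body(void **root)
{
    void *body = *root;
    if (body)
        *root = *(void **)body;
    return body;
}

/* Push a body back onto the given root. */
static void toy_del_body(void **root, void *body)
{
    *(void **)body = *root;
    *root = body;
}
```

Because the root is a parameter, the same three routines serve a global per-type freelist or a private one; only the macro layer that picks the root would differ.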

get-arena/release-arenas gives a balanced api for clients to manage
slabs of memory themselves.  These slabs are at a low-level, and
should be efficient enough for use by XS libraries.  There's also no
intrinsic limitation to the current 4k blocks, and arenas can be
individually sized.

An XS-lib to extend this into an object substrate seems practical, if
warranted for other reasons.  Could libs use a means to self-allocate
a set of memblocks, assigning title of each to a registered reqid?

=head3 Privatize Ptr-table-entries

An immediate beneficiary is ptr-tables, a "cheap-hash" used by
internal functions like interpreter cloning; they're very transient
entities, so they carry no ref-counting overhead, and refs they might
hold are guaranteed to exist for their lifetime. (XXX document this
currently tacit requirement).  The important CPAN user is

These tables are filled with PTR_TBL_ENT_t items, which are provided
by S_more_bodies(sv_type = PTE_SVSLOT), and which thus automatically
inherit the new mechanics above.

So, with a few more patches:

  - add private pte-freelist pointer to ptr_table_t
  - in ptr_table_*(): replace global list with private one

When ptr_table_free is called, we know that:

  - all slabs allocated for this ptr-table's entries have its reqid.
  - no other users of those ptes exist.
  - we can now choose how to recycle them.
    choice is client's: will app reuse them ?

If client will reuse them immediately, there's no potential gain from
releasing them.  OTOH, if a server app nstores a large
data-structure in response to a rare query, it would be nice if the
underlying memory were released back to the system/lib so that the
next call to get-arena will succeed.

=head3 Module/Usage Related Benefits

The CPAN use case is important enough to warrant a look wrt efficiency:

  ~/.cpan/Metadata is 12M (is your frozen stuff bigger ?)
  - blead needs a 2**19 table to freeze it
  - it's loaded once by shell
  - shell is used for a while
  - freezing isn't involved (so instructive only, not dispositive)
  - long CPAN shell shutdown suggests release/reclamation delays

On pure (wag) speed grounds, private ptes may not be a win; the
re-threading currently done by ptr-table-clear() WILL be there for the
next Storable serialization, so all subsequent iterations will be
best-case wrt preallocated PTEs.  Note that this also optimizes the
obvious benchmark tests (repetitive freezing) for the current
code, thus creating a worst-case against which to benchmark the new
But the benchmark is often irrelevant; we rarely need a series of fast
serializations, and any serialization may be large (.cpan/Metadata is
an example), and having megabytes of memory tied up by freed
ptr-table-entries until END-times is a waste.

=head4 (premature) optimization: arena pools

If speed parity on benchmarks matters (heh;-) we could then add
freed-arena pools, eliminating the malloc/free overhead resulting from
the changes described above.  That would make an iterative-freeze
benchmark more fair.  But the real parity (or advantage) lies in
rethreading freed bodies back to the private freelist, then saving the
whole giant table(s), preallocated with all PTEs threaded and ready
for use in filling the (also preallocated) table (tbl->tbl_ary)
the next time a table is needed.

With this, we could have N independent body-roots, each with a bounded
set of bodies chained thru M separately allocated arenas.  However,
this is all rather complicated, esp vs a 2-liner patch to restore use
of the global *allocation macros, while keeping the new api.
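For scale, a freed-arena pool in its simplest form might be no more than the sketch below.  The names toy_pool_get(), toy_pool_put() and toy_pool_drain() and the limits are assumptions made for illustration; fixed 4k slabs and a small bounded stack are chosen only to keep it short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define TOY_POOL_MAX  8      /* assumed cap on parked slabs */
#define TOY_SLAB_SIZE 4096   /* matches the current 4k blocks */

static void  *toy_pool[TOY_POOL_MAX];
static size_t toy_pool_len = 0;

/* Hand out a parked slab if there is one, else fall back to malloc. */
static void *toy_pool_get(void)
{
    if (toy_pool_len > 0)
        return toy_pool[--toy_pool_len];
    return malloc(TOY_SLAB_SIZE);
}

/* Park a released slab for reuse; free it outright when the pool is full. */
static void toy_pool_put(void *slab)
{
    assert(slab != NULL);
    if (toy_pool_len < TOY_POOL_MAX)
        toy_pool[toy_pool_len++] = slab;
    else
        free(slab);
}

/* Give everything parked in the pool back to the system. */
static void toy_pool_drain(void)
{
    while (toy_pool_len > 0)
        free(toy_pool[--toy_pool_len]);
}
```

The cap bounds how much freed memory stays tied up, which is the trade being weighed above: a larger pool helps iterative benchmarks, a smaller one returns memory sooner.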

=head4 SYSTEM_LOWMEM_PANIC signals

After adding this patchset, even without the private ptr-tables
freelist, we can easily add a routine to free the global-PTEs
on-demand.  This would give us a way to usefully respond to system
low-mem watermark signals etc.  This will only work for pte-freelists,
since they're known unused as an sv-building-block, and it also
presupposes that no ptr-tables are in scope when signals are safely

=head2 Private Arenas Summary

get_arena() & release_arenas() implement a (memory) slab allocator
with client-centric reservation and reclamation.

Since this is basically XS, it's invisible at the perl level, and
orthogonal to whatever scope control constructs can be added to
further leverage these mechanics.  That's for p5p to discuss, properly
with big input wrt p6l.

=cut