NWCLARK TPF grant report #103
From: Nicholas Clark
Date: October 3, 2013 13:48
Subject: NWCLARK TPF grant report #103
Message ID: 20131003134814.GK4940@plum.flirble.org
[Hours] [Activity]
2013/08/19 Monday
0.75 process, scalability, mentoring
4.00 reading/responding to list mail
=====
4.75
2013/08/20 Tuesday
0.25 RT #107816
1.00 RT #119155
1.00 RT #2968 (dl_unload, etc) ID 20000402.003
0.50 global destruction
2.50 reading/responding to list mail
2.00 t/re/pat_thr.t
=====
7.25
2013/08/21 Wednesday
1.50 RT #119409
3.25 RT #2968 (dl_unload, etc)
4.50 RT #2968 (dl_unload, etc) ID 20000402.003
=====
9.25
2013/08/22 Thursday
0.25 RT #118987
0.50 RT #118987 (SDBM_File's Makefile)
0.25 RT #119189
0.25 RT #119195
0.25 RT #119351
4.25 RT #2968 (dl_unload, etc)
0.50 lib/perl5db.t
=====
6.25
2013/08/23 Friday
0.50 RT #119437
2.00 de-duping BSD Call Back Units
1.50 dl_dld.xs
1.50 reading/responding to list mail
=====
5.50
2013/08/24 Saturday
0.25 RT #119161
0.25 reading/responding to list mail
=====
0.50
Which I calculate is 33.50 hours
Multiple cores are the future! And we need to enjoy this, because we aren't
going to get any choice about it. The current hardware for the hot backup
for perl5.git.perl.org has 24 cores, and even mobile phones are thinking
about going quad-core.
The upshot of this is that the more that you can get the computer to do in
parallel, the faster the entire task will complete. On *nix the make has
been able to run in parallel for quite a while, and back in 2008 I hacked
away on TAP::Harness until we got the tests running in parallel too. (Sadly
only on *nix - Microsoft's divergent implementation of select() frustrates
the same approach being used on Win32.)
Well, to be strictly accurate, the make has been able to run in parallel
except when it can't, due to race conditions we were previously unaware of.
(Just like the tests, actually.) We chip away at them as we find them, and I
guess that the best we can hope for is "no known problems", and an
increasingly long time since the last "unknown unknown" turned up.
Frustratingly there had been one "known problem" that we hadn't figured out
a way to solve. SDBM_File contains a subdirectory in which is the source
code to libsdbm.a, and the Makefile.PL for SDBM_File ends up generating a
Makefile which has two rules, both of which recurse into ext/SDBM_File/sdbm/
and try to build the same target. Run in parallel, and you can end up with
one process deleting libsdbm.a at just the same time that a second process
is trying to link it, resulting in that second process bailing out with an
error and the build failing. One rule was the default "build my
subdirectory" rule generated by ExtUtils::MakeMaker. The second was a custom
rule generated by the Makefile.PL to say "to build the dependency libsdbm.a,
you need to do this..." and the difficulty was that the rule that we needed
to keep was the second one.
What has eluded all of us for a very long time is that actually it's pretty
easy to stop ExtUtils::MakeMaker generating that default rule - if you know
*how*. So the fix turns out to be quite simple, once you know the secret,
and now we think that this bug is finally fixed, and as no others have
cropped up in the Makefile for a while, it's starting to look like the
Makefile might be bug free and reliable in parallel.
If multiplying is good for cores, then it must be good for BSDs too, as they
seem to be breeding like rabbits*. The problem with this is that each BSD ends
up having a hints file to tell Configure about its peculiarities and foibles,
those hints files keep getting bigger, and when (for example) Bitrig forks off
OpenBSD, then the Bitrig hints file starts out as a *near* copy of OpenBSD's.
(Although, of course, not identical, as inevitably something key will change,
such as the version numbers.)
The upshot of adding support for Bitrig was that now we had 3 hints files
(OpenBSD, MirBSD and Bitrig) each of which contained a 59 line test to
determine if the C library has broken 64 bit functions inherited from OpenBSD.
Triplication is never good, and as the code in question looked to be pretty
much portable and likely to pass on any sane system, I looked into moving it
into Configure, and hence running it everywhere.
Just for a change, this all went strangely smoothly. AIX and HP-UX, both of
which have choked on these sorts of things before, built it just fine, as
did the less problematic Solaris, FreeBSD and Linux. Various smoke-me
smokers reported it clean on Darwin, NetBSD and Cygwin, and BinGOs manually
tested on both 32 and 64 bit Cygwin without problems. So it went "in", and
the code base got (net) 124 lines shorter.
It would be wonderful to parameterise all the BSDs into one master hints
file and a set of options ("oh, it's basically FreeBSD-like, but they use
git for a VCS, the NetBSD ports system, but default to ZFS"), but I suspect
that that's more work than it's worth. Still, it remains that the hints
files for *BSD are already more than twice as long as the Linux hints file,
and sadly I don't think that that ratio is going to improve.
Less conclusive was my investigation into dl_unload_all_files(). This
feature of DynaLoader dates back 13 years, to just before v5.6.0 was
released, and the open ticket is of a similar vintage, "ID 20000402.003" in
the old system (migrated to RT as #2968).
The problem that dl_unload_all_files() is attempting to solve is that in an
embedded environment, the perl interpreter's exit may not always correspond
to the process exit. Worse still, the perl interpreter might be started a
second time in the same process, and this ought to work. So to make
everything work smoothly, it's important that when embedded, the interpreter
frees up all resources that it was responsible for allocating.
One of the more obscure resources that had got forgotten was shared
libraries that it had loaded, typically by DynaLoader when loading
extensions dynamically. So dl_unload_all_files() was part of an attempt to
have DynaLoader track all the shared objects that it loaded, and free them
all at interpreter exit. Only it didn't quite work out, and despite the
documentation claiming that it was active, it quietly got disabled unless
DynaLoader.xs was compiled with DL_UNLOAD_ALL_AT_EXIT.
The summary is that "it's a mess".
The curious good news is that even building the 13-year-old code with the
current gcc ASAN shows remarkably few problems. The quality of the core's
code has been consistently good for over a decade.
But back to the bad news. Even for the single-threaded case, there are
problems. The implementation of dl_unload_all_files() isn't robust. It uses
perl_atexit(), which is called after object destruction, but *before* all the
unblessed SVs get destroyed. The problem is that this is too early for this
use case - anything used later that has a pointer into something dynamically
loaded instead has a pointer into garbage. For example anything that has tie
magic will have the vtable point to static memory (in the sense that it's a
static object allocated by the C compiler). If the tie was set up by XS code
and that memory was in the shared object, that memory isn't there any more,
but the SV still points to where it was. When the full global destruction
happens it will attempt to call the relevant vtable routine to clear up the
tied scalar, follow the pointer into the abyss and then crash (or worse).
The actual problem that highlighted this was the OP free hook in File::Glob,
which again is a pointer from something "owned" by the interpreter into
memory "owned" by the shared object.
It might be possible to fix these global destruction problems by adding a
hook that gets run much much later, after all the SVs are destroyed, pretty
much as the last thing before perl_destruct() returns. However, this won't
fix the multi-threaded case. It's clear that no-one is building with it under
ithreads (or using it with Win32 pseudo-forks), because there is an
explosion of fail if you do. The problem is that DynaLoader is using
perl_atexit() to unmap shared libraries, and that's per *interpreter* exit.
So as soon as an ithread is spawned, its exit triggers the unmapping of all
shared objects, and the next time the parent thread tries to use one,
kaboom!
I think to properly solve the "at interpreter exit" cleanup we'd really need
to change dl_load_file() to internally track what it's been called for
(tracking shared across threads), and (I think) how often, with ownership
being upped on thread clone. At which point, it is safe for interpreter exit
to call dl_unload_all_files(). But only really late, when the top level parent
interpreter exits. And given that we've lived with this bug for over 13 years
now, it doesn't seem to be a high priority to fix.
Nicholas Clark
* credit Leon Timmermans for that analogy. It feels sadly apt :-(