develooper Front page | perl.perl5.porters | Postings from July 2013

NWCLARK TPF grant report #91

Nicholas Clark
July 26, 2013 13:53
[Hours]		[Activity]
2013/05/27	Monday
 3.50		HvFILL
 1.00		RT #118197
 0.50		perl5db

2013/05/28	Tuesday
 5.75		RT #118139 (Storable)
 0.25		RT #118187
 0.25		RT #118225
 0.25		RT #63402
 2.00		process, scalability, mentoring
 1.25		reading/responding to list mail

2013/05/29	Wednesday
 0.50		HvFILL
 0.75		RT #118139 (Storable)
 5.00		Storable
 0.50		process, scalability, mentoring

2013/06/02	Sunday
 3.25		investigating security tickets

Which I calculate is 27.50 hours

One bug of note is RT #118139. Reini Urban figured out that Storable can
crash when used in DESTROY blocks which are being run during global
destruction, because it is accessing an already freed PL_modglobal or the
internal ctx.

I feel guilty about this, because I believe that it's the same problem that
I first hit in 2003 (while writing slides for the German Perl Workshop). I
saw that the C stacktrace of the SEGV meant that it was during global
destruction, and concluded roughly "OMG global destruction" and "I don't
have time to figure this out *and* write my slides", and so, strangely
enough, elected to deal with my slides. And there it languished.

The problem with global destruction is that it has to destroy everything.
What's the problem with that? In a word, cycles. If objects A and B each
reference each other, then (obviously) B needs to stick around longer than
A, in case A's DESTROY needs to follow the reference to B for some reason.
But A needs to stick around longer than B, in case B's DESTROY...  "You go
first". "No, you go first". Global destruction cuts the Gordian knot by
destroying things in any order it feels like. The upshot is that anything
you needed might already be gone, so your code gets to live in interesting
times. So interesting that it may not actually be able to work. This, I had
assumed, was quite the difficulty with Storable - that a solution might not
exist - so I didn't investigate further.
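
The deadlock can be sketched in a few lines of Perl (the Node class and its
output are mine, purely for illustration):

```perl
use strict;
use warnings;

our $destroyed = 0;

package Node;
sub new     { my ($class, $name) = @_; bless { name => $name }, $class }
sub DESTROY { $main::destroyed++ }

package main;

{
    my $x = Node->new("A");
    my $y = Node->new("B");
    $x->{peer} = $y;    # A references B ...
    $y->{peer} = $x;    # ... and B references A: a cycle
}

# The lexicals are gone, but the cycle keeps each refcount above zero,
# so neither DESTROY has run yet:
print "destroyed at scope exit: $destroyed\n";    # prints 0
# Both objects now linger until global destruction, which frees them
# in whatever order it likes.
```

Neither object's refcount ever reaches zero through normal means, so only
global destruction, with its arbitrary ordering, can reap them.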

Unlike me back then, Reini Urban hadn't given up so easily, and distilled it
to an excellent clear test case, and suggested a patch to avoid the SEGV, by
causing Storable to die if used during global destruction. (An exception
which quite possibly wouldn't be seen, but that's considerably better than a segfault.)

However, it gave Leon Timmermans an insight. Why is Storable using DESTROY?
It's not actually running any Perl code. It's just using a blessed reference
as a hook to ensure free() is called on some memory allocations in its
per-interpreter context. The problem is that during global destruction,
references to that memory *are* still visible to Storable routines called
from Perl code, resulting in a use-after-free bug.

So he figured that we can ensure the same cleanup is run by attaching magic
to the context scalar instead of blessing it. The mg_free hook is called
when all scalars are finally freed. This is much later during global
destruction, after all destructors have been called and no more Perl code is
running. Hence this ensures the desired ordering constraint - that the
context survives until after all code no longer needs it, but is freed
eventually. I implemented his suggestion, and verified that this alternative
solution also made Reini's test case pass. So this is the solution that we
went with, as it permits Storable to be used reliably during global
destruction, instead of reliably forbidding it.
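
Leon's trick can be sketched at the Perl level with the CPAN module
Variable::Magic, whose "free" callback corresponds to the mg_free hook. To
be clear, this is an illustrative analogue and an assumption on my part:
Storable's actual fix lives in XS, attaching the magic from C.

```perl
use strict;
use warnings;
use Variable::Magic qw(wizard cast);

my $wiz = wizard(
    free => sub {
        # Fires when the variable itself is finally freed.  During
        # global destruction that is after every DESTROY has run, so
        # no Perl code can still be holding a reference to it.
        my ($ref, $data) = @_;
        # ... release the per-interpreter resources here ...
        return;
    },
);

my %context;            # stand-in for the per-interpreter context
cast %context, $wiz;    # attach magic instead of bless()ing a reference
```

The ordering guarantee falls out for free: magic cleanup happens when the
scalar is destroyed, not when destructors are dispatched.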

As I've been removing code for obsolete platforms, I've been searching the
entire codebase for references to them. It's impressive how many nooks and
crannies need to be checked to clean up properly, and not surprising that
sometimes the odd reference gets missed. (Just like fakethr.h, hiding in
plain sight for a decade.) Part of the fun is because platform specific
README files are installed as man pages. For example, the file README.vos
becomes perlvos.pod, which is installed as a man page, which you can read
with `man perlvos`. The fun comes because various parts of the codebase want
a list of all the man pages, and these platform specific files (along with
various other things generated or copied) should be in that list. So you
can't get the complete list by globbing <pod/*.pod>. At that point, do you
create a file listing everything, or a file listing just the exceptions to
add to that glob? Or something else? It's a trade-off.
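
The "glob plus exceptions" option might look something like this (the
function name and the extras list are illustrative, not the build system's
actual code):

```perl
use strict;
use warnings;

sub manpage_list {
    my ($pod_dir, @generated) = @_;
    # The pages that already exist as pod files:
    my @pods = glob "$pod_dir/*.pod";
    # The generated/copied pages (e.g. perlvos.pod from README.vos)
    # won't show up in the glob until they're built, so add them,
    # de-duplicating in case some have been built already:
    my %seen;
    return grep { !$seen{$_}++ }
           (@pods, map { "$pod_dir/$_.pod" } @generated);
}
```

The alternative, a file listing everything, trades this code for an extra
maintenance burden whenever a pod is added or removed.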

One place that some time ago I had noticed had a list of perl manpages was
the perl debugger. The intent is that you can save some typing by being able
to abbreviate "doc perlfunc" as "doc func" etc, omitting the "perl" prefix
from the man page name. It needs to know the correct name because on *nix
platforms it runs the system's man command to display the page. If `man
func` failed to run, and "func" was in the list, then it would try a second
time with `man perlfunc`. To implement this abbreviation feature, the
debugger had a hard-coded list of man page names. *Inevitably* this list was
out of date, missing more recently added man pages as well as having the odd
reference to a page now removed.

(I think it's fair to say of anything that if it's neither automatically
generated nor tested for up-to-dateness, then it runs the risk of becoming
stale, and for a large enough project and an old enough project, the
probabilities aren't in favour of fresh.)

Seeking a proper solution to the problem (which doesn't create an ongoing
requirement for human intervention), I had considered using some of the
existing code in regen/ to keep the list up to date. But this feels ugly, so
I held off. But recently I figured that one could completely eliminate the
list by looking at run time in the perl library tree for the installed Pod
files. If a pod file is found, it's a legitimate abbreviation. If the Pod
file isn't found, then it's assumed not to be a legitimate abbreviation, and
the second man command isn't attempted. If for whatever reason your
distribution chose not to install the Pod files, all that "breaks" is that
you have to type "doc perlfunc" each time. It seemed like the right trade-off.
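
The run-time lookup can be sketched like this (expand_doc_topic is my
illustrative name, not the debugger's actual routine):

```perl
use strict;
use warnings;
use File::Spec;

sub expand_doc_topic {
    my ($topic, @lib_dirs) = @_;
    for my $dir (@lib_dirs) {
        # An installed pod/perl<topic>.pod makes "doc <topic>" a
        # legitimate abbreviation for "doc perl<topic>":
        my $pod = File::Spec->catfile($dir, 'pod', "perl$topic.pod");
        return "perl$topic" if -f $pod;
    }
    # No pod file found: not a known abbreviation, so don't bother
    # running man a second time with the "perl" prefix.
    return $topic;
}
```

No hard-coded list, so nothing to go stale: whatever pods the distribution
installed are exactly the abbreviations that work.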

So I started work. Only to find that in the shipping debugger the man
command had been inadvertently broken by a recent refactoring, and no-one
had spotted this. Obviously, I fixed this first, and wrote the simplest test
I could to try to ensure that the doc command didn't break again. Try to run
"man perlrules", and expect that to fail. And "it fails" isn't quite good
enough, as the test needs to be sure that it gets the "not found" error
message from running an external command, given that the regression it's
trying to protect against is failure to run that. So I wrote a minimal test
that causes the debugger to run `man perlrules` and check that it's man that
is failing.

Fail it does, but not always with the same message. I'd assumed that the
failure message was the same on all *nix systems. Tony Cook fixed the test
to skip on all platforms except Linux. Whilst only testing on one platform
isn't ideal, it's a common platform, so the tests are going to get run
enough, and it's better than no tests, which seemed to be the only
alternative. So Linux only should be safe, right?

When a question is asked like that, you know that the answer is "no". You also
need to override the locale, in case man has been localised to report
translated error messages. Tony tested a fix on the failing platforms,
localising $ENV{LANG} and $ENV{LC_MESSAGES} to "C", and pushed this to
blead. Which worked on *those* platforms. But whack-a-mole wasn't quite
over, because LC_ALL overrides LC_MESSAGES, as demonstrated by subsequent
platform failures. So it took one more tweak to the test before we had
something reliable enough to have no failures. Everything takes longer than
you expect it to, because testing portably and robustly is hard, and you
usually don't get it exactly right first or even second time.
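
The eventual shape of the fix, forcing the C locale for every relevant
variable around the external command, can be sketched as follows
(run_in_c_locale is my name for it, not the test's actual code):

```perl
use strict;
use warnings;

sub run_in_c_locale {
    my (@cmd) = @_;
    # local() restores the previous environment when we leave scope.
    # All three are needed: LC_ALL outranks LC_MESSAGES, which in
    # turn outranks the LANG fallback.
    local $ENV{LC_ALL}      = 'C';
    local $ENV{LC_MESSAGES} = 'C';
    local $ENV{LANG}        = 'C';
    return scalar `@cmd 2>&1`;
}
```

With all three pinned to "C", man's "not found" message is the same
untranslated string everywhere, which is what the test needs to match.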
