perl.perl5.porters | Postings from July 2013

NWCLARK TPF grant June report

From:
Nicholas Clark
Date:
July 26, 2013 15:05
Subject:
NWCLARK TPF grant June report
Message ID:
20130726150453.GP4940@plum.flirble.org
As per my grant conditions, here is a report for the June period.

"The nice thing about standards is that you have so many to choose from"
(Andrew S. Tanenbaum). I guess the same can be said about build systems.

So the structural intent of the build is

1) Permit the user to choose configuration options
2) Build the package
   (which may take some time, and shouldn't need user intervention)
3) Test the package, and collate all test results into one report at the end
   (an excuse for a second tea break)
4) Install the package
   (which probably runs with elevated privileges)

As well as trying to avoid a long period where a human needs to babysit the
build in case it stops to ask a question, this approach also has the benefit
that you find out by the end of configuration what extensions the build
stage should be producing. Or, more importantly (compared with at least one
other similar language), you don't need to wait until the end of the build
run to discover that an extension you really needed isn't built, and then
have to iterate the entire configure & build steps until you figure out the
correct form of rubber chicken sacrifice to make it all work.

Of course, the problem is that for step 1 you can't assume that a copy of
Perl already exists (because how would it have been built?), so the
configuration system has to run using native tools. And the more platforms
the package is ported to, the more variations of native tools there are.

So, on *nix and VMS, where the OS, the architecture and even the make utility
vary, the configuration script figures out which extensions are shipped by
scanning the file system, because even the Makefile has to be generated
programmatically to cope with platform quirks. On Win32 there is far less
variation, so it's viable to ship a pair of Makefiles which between them
cover all the common make variants. Hence on Win32, configuration is
implemented by changing options in the appropriate Makefile, and the build
determines which extensions are wanted by combining those options with a
scan done by the (uninstalled) FindExt module.

So that's a Perl module, right? Which means that we can test it in a
platform-independent way. Which turned out to be useful back in 2009 when I
was working out how to move modules to cpan/, dist/ and ext/ as part of the
big rearranging to make dual life a lot simpler, as I could mostly verify
that my changes were going to work on Win32 without having any direct access
to a Win32 system to test it. The tests written for that purpose were robust
enough that they were moved to t/porting and run as standard, which verifies
that the logic in FindExt is consistent with that of Configure.

However, we weren't able to test everything. We couldn't correctly test the
list of static extensions due to various problems, and the list of
dynamically built extensions failed to match due to two discrepancies
between the Configure logic and FindExt.

Firstly, due to a typo in checking defines in %Config::Config, FindExt
thought that I18N::Langinfo would never be built (whereas it is built on
most *nix systems). So I fixed that, and everything now passed on *nix.
However, the test still failed on Win32, thanks to a problem that was a bit
more convoluted. In replicating Configure's logic, FindExt thought that
ODBM_File *would* be built on Win32, because the Win32 canned configs had
i_rpcsvcdbm set to define. What on Earth is i_rpcsvcdbm?

	This variable conditionally defines the I_RPCSVC_DBM symbol, which
	indicates to the C program that <rpcsvc/dbm.h> exists and should
	be included.  Some System V systems might need this instead of <dbm.h>.

Eh? Win32 is most definitely not an ancient System V Unix, and won't repeat
the same old quirks (it has brave new quirks instead). It turned out that
FindExt was quite correct, and the canned configs (and header files) had
been wrong since 1997. The problem hadn't been spotted because the Win32
configuration explicitly says not to build ODBM_File. Now it's correct.
Combine all this with fixes by (at least) Steve Hay and Tony Cook, and it's
now possible to test that FindExt and Configure agree on which extensions
are to be built, which are dynamically linked, which are statically linked,
and which are non-XS. While these changes are of low utility in themselves,
all this would prove useful in unravelling more of the build complexity.

I spotted a way to remove a few more tangles from the build, on *nix, VMS
and Win32. It's always fun having to juggle three different objects
together, and this was no exception.

The build has never depended on having Perl installed. Perl's portability
was able to scale to multiple architectures and OSes by

1) having the configuration system compile and *run* test programs to find
   out what works, and what needs to be worked around
2) bootstrapping as quickly as possible to a minimally working perl and then
   writing as much of the rest of the build infrastructure once, in Perl.

Attempting to adapt that to also permit cross-compiling is hard, which is
why it hasn't happened (yet). But all our build tools cross-compile nicely.
(On *nix, those would be sh, sed, awk, grep, make and cc.) Hence one can
bootstrap Perl 5 onto a new platform, albeit in a rather roundabout way, by
first bootstrapping a native toolchain.

The various platform Makefiles contain the logic to try to get from some C
source to "working miniperl" as rapidly as possible. Part of the fun is that
a lot of the modules that are needed to "work" are actually dual life, hence
are shipped in dist/ or cpan/, and some modules, most importantly Config,
need to be generated from the platform specific build files. Additionally,
the build needs to be able to run in parallel*, which means that

1) it's beneficial to split build tasks into units as small as possible, to
   maximise concurrency
2) it's necessary for every task to know its pre-requisites, so that make
   won't accidentally run a rule before something it depended on gets built

(or, how this actually manifests - the build fails some of the time due to a
race condition caused by a missing dependency, and it's very hard to
recreate and track down.)
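To make the parallel-build constraint concrete, here is a small C sketch (not taken from any of Perl's Makefiles) that groups tasks into "waves": a task joins a wave only once all of its declared prerequisites finished in an earlier wave, so tasks sharing a wave could run concurrently. The task numbering and dependency edges are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TASKS 8   /* n passed to assign_waves() must not exceed this */

/*
 * deps[i][j] is true when task i declares task j as a prerequisite.
 * On return wave[i] holds the wave task i was placed in. Every task in a
 * wave has all its declared prerequisites in earlier waves, so tasks that
 * share a wave may run concurrently. Returns the number of waves, or -1
 * if the declared dependencies contain a cycle.
 */
int assign_waves(size_t n, bool deps[][MAX_TASKS], int wave[])
{
    size_t assigned = 0;
    int w = 0;

    for (size_t i = 0; i < n; i++)
        wave[i] = -1;

    while (assigned < n) {
        size_t newly = 0;

        for (size_t i = 0; i < n; i++) {
            bool ready = true;

            if (wave[i] != -1)
                continue;
            /* A prerequisite only counts if it finished in an earlier
               wave, not one placed during this same pass. */
            for (size_t j = 0; j < n && ready; j++)
                if (deps[i][j] && (wave[j] == -1 || wave[j] == w))
                    ready = false;
            if (ready) {
                wave[i] = w;
                newly++;
            }
        }
        if (newly == 0)
            return -1;   /* cycle: nothing else can ever become ready */
        assigned += newly;
        w++;
    }
    return w;
}
```

For example, if task 1 declares task 0, and tasks 2 and 3 both declare task 1, the result is three waves, with tasks 2 and 3 sharing the last one. Forgetting task 3's declaration moves it into wave 0 alongside task 0: nothing fails deterministically, but make is now free to start it before the thing it really needs exists, which is exactly the intermittent race described above.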

Hence the build rules for things early in the build ended up being quite
tightly coupled to everything else early in the build, because as soon as
one changes where a file is located, or how it is built, all its explicit
and implicit dependencies have to be updated.

One particularly "big" dependency (because it is very early) is the file
lib/build_customize.pl. This is a key part of enabling the build to work at
all. If "$INC[0]/build_customize.pl" exists, then it's loaded by miniperl.
The trick is that lib/build_customize.pl sets @INC to the absolute paths of
all the toolchain modules in ext/, dist/ and cpan/, so that the toolchain
can be shipped in an easy-to-maintain layout, yet still be loaded to install
each module into lib/ without first being in lib/. In turn,
lib/build_customize.pl is written by write_buildcustomize.pl using the
pure-Perl code in Cwd, building on the existing cross-platform nature of the
Perl code to avoid having to produce 3 (or more) platform-specific ways of
converting directories to absolute paths.
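write_buildcustomize.pl does this conversion in Perl, via Cwd, precisely so that it works everywhere. Purely to illustrate the idea, here is a C sketch that turns a relative toolchain directory into an absolute @INC-style entry. It handles Unix-style paths only, which is exactly the limitation the pure-Perl Cwd approach avoids, and the function name is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: make a toolchain directory such as "dist/Cwd"
 * absolute by prefixing the build directory. Unix-style paths only.
 * Returns 0 on success, -1 if the result does not fit in out.
 */
int make_absolute(const char *base, const char *rel, char *out, size_t outsize)
{
    int len;

    if (rel[0] == '/') {
        /* Already absolute: copy unchanged. */
        len = snprintf(out, outsize, "%s", rel);
    } else {
        size_t blen = strlen(base);
        /* Avoid "//" if base already ends in a slash. */
        const char *sep = (blen > 0 && base[blen - 1] == '/') ? "" : "/";

        /* Drop a leading "./", which adds nothing. */
        if (strncmp(rel, "./", 2) == 0)
            rel += 2;
        len = snprintf(out, outsize, "%s%s%s", base, sep, rel);
    }
    return (len < 0 || (size_t)len >= outsize) ? -1 : 0;
}
```

With a build directory of "/home/u/perl", "dist/Cwd" becomes "/home/u/perl/dist/Cwd". Because the entries are absolute, they stay valid whichever subdirectory a later build step happens to run in.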

Once lib/build_customize.pl is in place, just running `./miniperl -Ilib` is
enough to make the otherwise unbuilt distribution behave enough like a
"normal" *installed* perl that the rest of the build system doesn't need to
set up anything special. The upshot of all this is that there's one small
piece of code which works everywhere (win for the Perl build scripts), but
every rule in the Makefile (and the Win32 Makefiles, and DESCRIP.MMK) needs
to ensure that it exists.

What I realised was that by removing one little bit of concurrency it would
be possible to simplify quite a lot of the other rules. Not just the direct
simplification of only having one dependency, but also a more subtle one.
Once lib/build_customize.pl is in place, Cwd is in @INC (being one of the
toolchain modules that write_buildcustomize.pl locates). Hence various other
rules, which previously invoked miniperl with multiple -I options to ensure
that the pure-Perl Cwd could be loaded from dist/, could have all those
extra -I options eliminated, as -Ilib does it all once
lib/build_customize.pl exists.

Specifically, by combining the rule that links miniperl with the rule to
generate lib/build_customize.pl, all this simplification would fall out.
And, somewhat perversely, it's actually conceptually simpler to have the rule
"officially" be for lib/build_customize.pl, with the miniperl rule depending
on it, than the other way round, as this means that the rest of the
Makefile(s) can depend on miniperl, which is much simpler to skim.

Of course, all this is only obvious in hindsight, and inevitably the devil
is in the detail when it comes to actually getting it to work, and work
reliably.

While removing the dependencies on [.lib]build_customize.pl from the VMS
makefile, I noticed that for VMS there was a second dependency that featured
heavily - [.lib.VMS]Filespec.pm - thanks to a requirement to copy it from
[.vms.ext] before it could be used. And, bonus, more code to copy its test
to [.t.lib]. All this was special case code, which could be completely
eliminated if both files could be moved into a regular extension in the
directory ext/VMS-Filespec, similar to ext/VMS-DCLsym and ext/VMS-Stdio, and
like them only built on VMS. The only thing added would be one line in
write_buildcustomize.pl to add ext/VMS-Filespec/lib to the toolchain @INC.

Of course, all this should be simple. But if it were simple, how come
VMS::Filespec isn't already in ext/? After all, VMS::DCLsym and VMS::Stdio
were both previously in vms/ext/, so how come all three weren't moved at the
same time? Given that *nix and Win32 already know not to try to build or
test VMS::DCLsym and VMS::Stdio, why not add a third?

The answer (as ever) turns out to be another yak that needs shaving.
VMS::DCLsym and VMS::Stdio are XS modules. The build and test infrastructure
is quite capable of skipping XS modules. It has to be, because not all XS
modules can be built everywhere. But for various reasons, none of which were
really designed, it's not capable of not building a pure-Perl module. I was
aware of this already, but now that I had a real use case that it was
preventing me from implementing, it was irritating enough that I had reason
to fix it. Of course, it wasn't a small job, and it consumed a good chunk of
a second week too...

So, what prevents us from having a pure-Perl extension in ext/ but not
building it? And how did it happen?

The situation we had reached was that there were 5 configuration variables:

dynamic_ext:        built dynamically linked XS modules
static_ext:         built statically linked XS modules
nonxs_ext:          built pure-Perl modules (from ext/, dist/ and cpan/)
extensions:         "$dynamic_ext $static_ext $nonxs_ext"
known_extensions:   *just* the XS modules shipped in ext/, dist/ and cpan/ 

with the upshot that "extensions" is typically much larger than
"known_extensions". Daft.
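As a rough illustration of why "extensions" outgrows "known_extensions", here is a C sketch that composes extensions from its three parts and counts the entries. The list values in the note below are hypothetical and heavily abbreviated, not taken from any real config.sh.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>

/* extensions is defined as "$dynamic_ext $static_ext $nonxs_ext". */
int join_extensions(const char *dynamic_ext, const char *static_ext,
                    const char *nonxs_ext, char *out, size_t outsize)
{
    int len = snprintf(out, outsize, "%s %s %s",
                       dynamic_ext, static_ext, nonxs_ext);
    return (len < 0 || (size_t)len >= outsize) ? -1 : 0;
}

/* Count whitespace-separated entries, as a shell would split the value. */
size_t count_entries(const char *list)
{
    size_t n = 0;
    int in_word = 0;

    for (; *list != '\0'; list++) {
        if (isspace((unsigned char)*list)) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n++;
        }
    }
    return n;
}
```

With dynamic_ext set to "Fcntl POSIX", static_ext empty and nonxs_ext set to "Errno File-Find Text-Abbrev", extensions has five entries, while a known_extensions of "Fcntl POSIX" (XS only) has just two. Every pure-Perl extension widens the gap.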

This situation has come about through "organic growth", rather than design.
I guess it's summarised as

0) Perl 5 predates CPAN
1) Originally ext/ only held XS code
2) Originally there was no concept of dual-life - if you wanted the
   extensions in ext/, you had to build them with perl
   (There wasn't even a toolchain - you could add other extensions into ext/
    and they would be built)
3) 15 years ago Configure was patched to add nonxs_ext (commit 4318d5a0158916ac)
   ready to support Errno
   (Errno was added about two weeks later in commit eab60bb1f2e96e20)
   [curiously that commit adds Errno to known_extensions but not to
    extensions]
4) A few days later commit bfb7748a896459cc updates Configure so that
   nonxs_ext *are* in extensions, but are *not* in known_extensions.
   The description of the change is:   
   
    Explicitly split list of extensions into 3 kinds:  dynamic, static,
    and non-xs.  The Configure variable $extensions now holds all three.
    (The only current non-xs extension is Errno).

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/1998-07/msg00136.html

    It also updates Porting/Glossary, explicitly changing the description
    of known_extensions from "list of all extensions included" to
    "list of all XS extensions included", and extensions from
    "all extension files linked into the package" to
    "all extension files (both XS and non-xs linked into the package."

[Note that Errno *is* architecture specific, so gets installed into
the same directory subtree as all the shared objects]

Fast forward from 1998 to 2006

5) Commit 1d8961043b9b86e1 (or thereabouts) in April 2006 regenerates the
   sample config.sh to this:

       nonxs_ext='Compress/IO/Base Compress/IO/Zlib Compress/Zlib Errno'

   at which point, we have 3 more non-XS extensions, all of which are
   architecture independent. 

Subsequent re-arranging of dual-life modules in 2009 means that we've got a
lot more.

Effectively, the term "extensions" has been meaning "things we build via
Makefile.PL" for at least 7 years, if not 15, despite what all the
documentation tries to claim.

So after a lot of figuring out the why and how, and what it would likely
break (answer, nothing), I patched the *nix and Win32 build systems to fix
this. (I chickened out of figuring out enough DCL to deal with VMS. Craig
Berry was kind enough to deal with that.)

So why did this even matter? Because whilst the build system was quite happy
not building a pure-Perl module, all the tests for it would still be run
(and fail), due to implementation details of how t/TEST (and thus also
t/harness) decides what to skip. It refuses to skip anything unless it's in
"known_extensions" but missing from "extensions". As Andy Dougherty observed
after I submitted the patches to fix the build, nothing after Configure
should actually use known_extensions. Hence t/TEST is arguably buggy and
needs fixing. Maybe I could have used a smaller hammer if I had spotted the
correct problem to hit. :-)
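The skip condition described above can be modelled in a few lines. This C sketch is not the code in t/TEST (which is Perl). It only restates the rule, with hypothetical helper names, so that the consequence is easy to see: an extension absent from known_extensions can never be skipped, however unbuilt it is.

```c
#include <stdbool.h>
#include <string.h>

/* True if word is a whole, space-separated entry in list. */
bool in_list(const char *word, const char *list)
{
    size_t wlen = strlen(word);
    const char *p = list;

    if (wlen == 0)
        return false;
    while ((p = strstr(p, word)) != NULL) {
        bool starts = (p == list) || p[-1] == ' ';
        bool ends = p[wlen] == '\0' || p[wlen] == ' ';

        if (starts && ends)
            return true;
        p++;   /* partial match such as "POS" in "POSIX": keep looking */
    }
    return false;
}

/* The rule as described: skip an extension's tests only if it is in
   known_extensions but missing from extensions. */
bool should_skip_tests(const char *ext, const char *known_extensions,
                       const char *extensions)
{
    return in_list(ext, known_extensions) && !in_list(ext, extensions);
}
```

With a hypothetical known_extensions of "Fcntl POSIX VMS-DCLsym" and extensions of "Fcntl POSIX Errno", VMS-DCLsym (XS, unbuilt) is skipped, but a pure-Perl VMS-Filespec, which appears in neither list, is not, so its tests would run and fail.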

However, it's done now, and the distribution is saner for it. And it permitted
the tests for FindExt to be made more comprehensive (and have fewer special
cases and skips).


Whilst looking at the *nix Makefile a lot, trying to figure out how to
resolve the problems above, I noticed that there were quite a lot of
short-cut targets. These are targets added to simplify running various
commands, which I don't think anyone uses. For example, there were targets
related to profiling and testing tools for Tru64 and Irix (pixie and Third
Degree), for purify, quantify and purecov, targets to run the tests through
B::Deparse, to convert the tests to UTF-8 or UTF-16 before running them, and
to run the tests with -t to flag up taint warnings. (Plus, in some cases
targets to combine two of the above actions.)

It's still perfectly possible to *run* any of the above programs by "hand" -
no underlying functionality has been removed from the Makefile. It's just
got a little bit shorter and a little bit clearer.


We also discovered a problem with the previously described refactoring of
the initial build rules. While Father Chrysostomos was trying something out
(which seriously broke the ability of miniperl to even parse code), his make
went into an infinite loop calling itself recursively. Effectively, a fork
bomb. This isn't supposed to happen - a build failure is supposed to stop,
not take out one's machine.

The problem is that the *nix Makefile contains a lot of places where it
calls back to itself *in the same directory* to build a different target.
I'd been bitten by these some time ago. If things don't go as intended, you
can end up with an infinite loop as each recursive invocation of make
decides that the same thing needs doing first, and calling make again with
the same arguments. It gets even worse running make in parallel.

I think that historically things had been done this way as a means to have
various little utility commands or command sequences available, without
having to clutter the build directory with a shell script for each desired
"program", or repeating the same commands in multiple places in the
Makefile. Even if you get it right (ie avoid the above problems) then I feel
that it actually makes the build *less* clear, because you have to scan back
through the same Makefile, and then work out if the target requested is
stand alone, or going to have more side effects. Hence I'd considered these
as a pain point some time ago, and had tried to work to eliminate them.

They can even work directly against correctness. The miniperl build rules
used to be this:

	$(LDLIBPTH) $(RUN) ./miniperl$(HOST_EXE_EXT) -w -Ilib -MExporter -e '<?> || $(MAKE) minitest

The intent is to be "helpful" and automatically run minitest if miniperl
fails a basic sanity test. The problem is that minitest then looks like
this:

# Can't depend on lib/Config.pm because that might be where miniperl
# is crashing.
minitest: $(MINIPERL_EXE) minitest.prep
        - cd t && (rm -f $(PERL_EXE); $(LNS) ../$(MINIPERL_EXE) $(PERL_EXE)) \
                && $(RUN_PERL) TEST base/*.t comp/*.t cmd/*.t run/*.t io/*.t re/*.t opbasic/*.t op/*.t uni/*.t </dev/tty


with a dependency on minitest.prep, which looks like this:

minitest.prep:
        -@test -f lib/Config.pm || $(MAKE) lib/Config.pm $(unidatafiles)
        @echo " "
        @echo "You may see some irrelevant test failures if you have been unable"
        @echo "to build lib/Config.pm, or the Unicode data files."
        @echo " "


Hence to avoid a recursive loop when attempting to helpfully run minitest
automatically, it needs to recurse to a third level, and to skip doing so if
lib/Config.pm already exists. Note, *exists*, not "is up to date".

ie correctness has been sacrificed, although it's not immediately obvious.
The reason is that done this way, if you update the pre-requisites for
lib/Config.pm, make *won't* automatically re-build it. Meaning that you may
get bogus results if you edit them, and then re-run minitest to check your
changes.
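The difference between the two checks can be sketched in C. The first function mirrors the `test -f` shortcut in minitest.prep; the second is roughly what make itself decides from modification times (make's real rules have more corner cases). All names here are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/* The minitest.prep shortcut: rebuild only if the file is missing. */
bool needs_rebuild_if_missing(bool target_exists)
{
    return !target_exists;
}

/* Roughly make's own decision: rebuild if the target is missing or any
   prerequisite has been modified more recently than it. */
bool needs_rebuild_if_stale(bool target_exists, time_t target_mtime,
                            const time_t prereq_mtimes[], size_t n)
{
    if (!target_exists)
        return true;
    for (size_t i = 0; i < n; i++)
        if (prereq_mtimes[i] > target_mtime)
            return true;
    return false;
}

/* Fetch a file's modification time; returns false if it can't be stat()ed. */
bool file_mtime(const char *path, time_t *mtime)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return false;
    *mtime = st.st_mtime;
    return true;
}
```

Once lib/Config.pm exists, the first check always says "nothing to do", even if a prerequisite was edited a moment ago; the second would notice the newer prerequisite and rebuild.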

The problem here seemed to be that my other changes made things more
fragile, and the fork bomb a lot more likely to trip. For now, I've removed
the automatic recursion (with make) to run minitest, as it removes the
fragility. Given that running minitest is one line, albeit rather long (as
shown above) I think that it should be possible to have that run directly by
the Makefile (without calling back to make to do it), but I can't quite see
how to do it. It feels like it ought to be possible to merge it with the
shell script that runs the regular tests, but I can't yet see a way that
merges the two without using *more* code than doing it separately. Something
is eluding me.


I also found a small but representative example of how the best of
intentions don't always produce the best solution to a problem, actually
increasing clutter.

a2p, the awk-to-Perl converter, is written in C. It dates from the time of
perl 1, so two years before the first ANSI C standard, and like perl 1 it
started with the then-classic 3-argument main() function:

    main(argc,argv,env)
    register int argc;
    register char **argv;
    register char **env;
    {

K&R style was converted to ANSI style with commit f0f333f455368029 back in
1997 and it had stayed fundamentally the same ever since, although the
register declarations have been removed, and const added. The perl
interpreter's main() function has evolved in the same way.

Hence in 2005, when Jarkko cranked up the strictness on the Tru64 compiler
and fixed all the issues it warned about, he added the relevant pragma to
both perl and a2p to stop the compiler warning about the non-(ANSI-)standard
third parameter. Seems sane.

What no-one noticed was that unlike perl, a2p's main() doesn't actually
*use* the env parameter, so a better solution is to remove it. Which means
that the pragma can be removed too. So that's 4 lines gone, and 1 line
simplified.

None of these things on its own is really a problem, and they really aren't
a priority to find, let alone fix. But there are potentially many things
which could be terser, tidier and clearer, and the sum of all the little
bits of suboptimal verbosity mounts up, making the core's code harder for
everyone to follow. Hence it seems sane to tackle them as and when they are
found, if there's an obvious, simple, safe fix.


The end of the month was quiet because we were visiting my parents. It
was only planned to be partly a holiday, but no plan survives contact
with the enemy (or good weather).

As the network at my parents' is whatever we bring with us, I concentrated
on things that could be done locally. George's clang smoker** had been
showing failures for configurations with -Accflags=-DPERL_GLOBAL_STRUCT_PRIVATE
when built with clang's address sanitizer. PERL_GLOBAL_STRUCT_PRIVATE is a
build option intended for extremist embedding - not just no global
variables, but even the variable used to hold the address of the structure
that wraps the globals is itself hidden behind a function.

I had thought that the problems were fundamentally insoluble, due to
conflicting requirements between freeing that structure within global
destruction, versus code needing to look into it (to get the thread local
context) in the routine that called perl_destruct(). However, it turned out
that there is no fundamental conflict. The "use after free" error was more
obscure than that: the problem was code run by atexit() after main()
returns, and that code wasn't defensive enough to cope. As it was just
checking for a flag, the fix was as simple as setting a variable to NULL
after calling free(), and adding a NULL check in the routine called by
atexit().
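The shape of that fix, in a minimal C sketch with hypothetical names (this is not the perl source): the structure is freed, the pointer is cleared, and the handler registered with atexit() checks for NULL before looking inside.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the malloc()ed structure wrapping the globals. */
struct globals {
    bool some_flag;
};

static struct globals *the_globals;
static bool flag_seen_at_exit;

bool globals_init(void)
{
    the_globals = calloc(1, sizeof *the_globals);
    return the_globals != NULL;
}

/* Runs during destruction, before main() returns. */
void globals_free(void)
{
    free(the_globals);
    the_globals = NULL;   /* part 1 of the fix: no dangling pointer */
}

/* Registered with atexit(), so it can run after globals_free(). */
void check_flag_at_exit(void)
{
    /* part 2 of the fix: check for NULL before looking inside */
    flag_seen_at_exit = the_globals != NULL && the_globals->some_flag;
}
```

Without the NULL assignment, check_flag_at_exit() would dereference freed memory whenever it ran after globals_free(), which is precisely the ordering atexit() guarantees for code that frees during destruction.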

Other problems that ASAN reported were mostly a side effect of
PERL_GLOBAL_STRUCT_PRIVATE being the only configuration that allocates
storage for the globals using malloc(). Every other configuration has the
globals as actual globals (either individual variables, or a structure
which is global), which results in them being zero-initialised. Hence the
setup code for PERL_GLOBAL_STRUCT_PRIVATE needs to zero a couple more globals.
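A minimal C illustration of that difference, using a hypothetical cut-down structure: an object with static storage duration starts out zeroed, whereas memory from malloc() has indeterminate contents until the setup code assigns to it.

```c
#include <stdlib.h>

/* Hypothetical cut-down globals structure. */
struct my_globals {
    int counter;
    void *cache;
};

/* As a true global, this is zero-initialised before main() runs. */
static struct my_globals static_globals;

/* malloc() gives no such guarantee, so every field the rest of the code
   relies on must be set explicitly. */
struct my_globals *alloc_globals(void)
{
    struct my_globals *g = malloc(sizeof *g);

    if (g == NULL)
        return NULL;
    g->counter = 0;
    g->cache = NULL;
    return g;
}
```

calloc() would zero the whole block in one go; explicit assignment is used here because the C standard does not guarantee that all-bits-zero is a null pointer, although it is on common platforms.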

The final problem it revealed *wasn't* specific to
PERL_GLOBAL_STRUCT_PRIVATE, but we hadn't noticed it before on any other
configuration with the existing test cases. There had been a long-standing
bug that perl didn't cope correctly with a here-doc at the end of the script
without a final newline (RT #65838). In some cases the fix for this could
end up reading from free()d memory, if a particular buffer needed to be
resized. However, that code only runs if the Perl program ends with a
here-doc (which is an unusual structure), and if the last line of the file
on disk has no terminating newline character (which is also unusual, as
many editors default to adding a final newline). Hence it's pretty rare to
hit.
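The general bug class here is a pointer into a buffer surviving a realloc(), which may move the block. The following C sketch is not the toke.c code. It uses a hypothetical growable buffer to show the usual defensive pattern of remembering positions as offsets, and converting them back to pointers only after any resizing has happened.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer. */
struct buf {
    char *data;
    size_t len;
    size_t cap;
};

/* Append n bytes. If this has to realloc(), the block may move, so any
   pointer previously taken into b->data must be treated as invalid. */
int buf_append(struct buf *b, const char *s, size_t n)
{
    if (b->len + n + 1 > b->cap) {
        size_t newcap = b->cap ? b->cap : 16;
        char *p;

        while (newcap < b->len + n + 1)
            newcap *= 2;
        p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Safe pattern: keep an offset, and only turn it into a pointer after
   any resizing has finished. */
const char *buf_at(const struct buf *b, size_t offset)
{
    return b->data + offset;
}
```

Holding `char *start = b.data + b.len;` across a later append that grows the buffer would be the bug; holding `size_t start = b.len;` and calling buf_at() afterwards is not.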


A more detailed breakdown summarised from the weekly reports. In these:

16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/

[Hours]		[Activity]
  1.00		APIs
  1.00		ASAN causing Makefile loop
  0.25		File::Spec XS
  0.25		FindExt
 14.25		Makefile target pruning
  0.25		Porting/Maintainers.pl
  3.75		RE_TRACK_PATTERN_OFFSETS/parse_start
  0.25		RT #109744
  0.25		RT #114576
  0.25		RT #118175
  0.25		RT #118195
 10.50		RT #118283
  0.50		RT #118365
  1.00		RT #118509
  3.50		RT #118549
  2.00		RT #118603
  0.25		RT #118653
  1.00		RT #38812
  0.25		RT #40403
  0.25		RT #47467
  1.00		RT #67114
  0.25		Regexp::Grammars
  6.50		Storable
		Storable (HP/UX)
  6.25		VMS
		VMS-Filespec and known_extensions
  1.50		Win32 & i_rpcsvcdbm
  2.25		Win32/FindExt
  0.50		a2p
  2.75		dots
  3.25		failures under -DPERL_GLOBAL_STRUCT_PRIVATE
 21.25		known_extensions, unbuilt non-XS extensions
  0.25		lib/perlmodlib.PL
 16.25		miniperl Makefile bootstrap ordering
  2.75		process, scalability, mentoring
 12.75		reading/responding to list mail
  0.50		smoke-me branches
  0.25		static build on Win32
  0.50		static extensions
  4.00		toke.c heredoc at EOF
  0.25		utils/
  0.50		what does "deprecated" mean?
======
124.50 hours total

Nicholas Clark

*  Being able to run the build in parallel is cheaper than increasing the
   number of hours in the day. Although I'm sure if you ask nicely on the
   Internet, someone will offer to take money from you to implement the
   latter solution. :-)
** http://m-l.org/~perl/smoke/perl/linux/blead_clang_sanitize=address/?C=M;O=D
