develooper Front page | perl.perl5.porters | Postings from October 2013

NWCLARK TPF grant July report

Nicholas Clark
October 3, 2013 14:50
NWCLARK TPF grant July report
As per my grant conditions, here is a report for the July period.

The main theme for this month seems to be "clean up parts of the build".

perl was first developed on a Unix system, back when there were dozens of
different Unix variants. Hence portability initially meant coping with
the different C libraries (and even C compilers - remember, Perl predates
the C89 standard, let alone its implementation). Figuring out precisely what
the system could(n't) do (correctly) was performed by a shell script,
Configure, and it expanded its variables into files needed for the build by
using simple shell scripts as templates.

It's still how configuration is done by perl 5, and "figure things out with
a shell script" is the standard way that pretty much all open source
software builds on *nix systems. There's an awful lot of useful hard-won
knowledge distilled into these shell-based configuration systems, and whilst
they aren't that pretty, they work, which to the end user is what matters.

The template expansion is needed because, to maintain sanity, there must be
a clean distinction between files that are generated
as part of the configuration and build, and files that shipped with the
package (from a tarball, or from a version control system). There's
implicitly a concept of who "owns" the file - the human using a text editor,
or the build system.
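The Makefile.SH convention can be sketched in a few lines of shell. This is a minimal illustration, not the real Configure machinery: the config.sh contents and variable names below are stand-ins for what Configure actually writes.

```shell
# A toy config.sh, standing in for the answers Configure discovers.
cat > config.sh <<'EOF'
cc='gcc'
optimize='-O2'
EOF

# The .SH template is just a shell script: source the answers, then
# expand them into the generated file, which the build system "owns".
. ./config.sh

cat > Makefile <<EOF
# This file is generated from Makefile.SH; edit the template, not this.
CC = $cc
OPTIMIZE = $optimize
EOF
```

Re-running the template after re-running Configure regenerates the Makefile, keeping the generated file and the human-edited template cleanly separated.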

Makefiles are one of the types of files whose contents need to vary based on
things determined by Configure. Hence, it naturally falls out from the above
description that the various subdirectories have Makefiles which are
generated by templates implemented as shell scripts. This plan doesn't fare
so well now that "portability" no longer just means to different *nix
systems, but also to systems that don't even have shells.

A particular mess was the Makefile for the utils/ directory. Like the other
Makefiles, it was generated by running a Makefile.SH.
Unsurprisingly this doesn't work on Win32, so the solution was to *also*
check the generated Makefile into the repository. Just to add to the fun,
here VMS is special, and has rules in its top-level Makefile (DESCRIP.MMS)
to generate files in utils/, and completely ignores utils/Makefile.

This particular Makefile has very few configuration controlled changes, so
typically the regenerated version was identical to the checked in version.
Unfortunately for some configurations it would differ, with the result that
configure && make of a clean tree would result in a dirty tree, with
(seemingly) unchecked-in changes. This is wrong - building should never make
a clean tree dirty.

The irony is that the Makefile generation doesn't need to be a shell script.
By the time that utils/Makefile is needed, there is already a fully
serviceable miniperl ready and waiting.

So I bit the bullet and replaced utils/Makefile.SH with utils/Makefile.PL.
Carefully. Firstly by replacing the shell script with a perl script that
generates a byte-for-byte identical utils/Makefile (including boilerplate
that claims it was generated by utils/Makefile.SH), which was called from
the same place that the *nix Makefile had been calling Makefile.SH.

With that working on *nix, I then added rules to the Win32 Makefiles to call
utils/Makefile.PL to generate utils/Makefile, and rules to both them and the
*nix Makefile to delete the generated file as part of the cleanup targets.
With that done, I could check in a commit that deleted utils/Makefile from
git, and we no longer had a file in the repository being overwritten as part
of the build.

Tied up with maintaining sanity by keeping a clear distinction between "build
products" and source code is a requirement to clean up everything. We can't
just use git to do this for a variety of reasons:

1) Perl 5 runs on platforms on which git doesn't
2) Perl 5 still needs to build from a tarball (or equivalent) even on
   platforms where git runs (for example, how Linux distributions manage
   versioning their upstream - ie us)
3) git clean works by removing everything which is not known to git, rather
   than only things known to be built. Zapping everything is "unhelpful"
   when you have work in progress which isn't ready for checking in, but
   still need to clean up.
4) git uses Perl 5.

(Circular dependencies are bad. It's why File::Temp wasn't able to switch
from ExtUtils::MakeMaker to Module::Build, because Module::Build depends on
File::Temp.)

So the approach the core takes is to explicitly delete files and directories
that were created, as this avoids inadvertently deleting the wrong thing,
such as work in progress. But this does mean that the core needs to know the
identity of all files that it builds. That can be easier "said" than done.
For example...

As part of building XS extensions and dual life modules, their files are
copied into the lib/ directory at the top level of the build tree. As lib/
doesn't start off empty, but contains various core-only modules, this makes
cleanup complex. Fortunately ExtUtils::MakeMaker can be relied upon to
delete any files that it copied there, but unfortunately it doesn't track
which directories it needed to create. So this ends up with a section of
manually maintained rmdir commands in Makefile.SH, and of course more work
with rmdir /s /q commands in the Win32 Makefiles.
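The shape of those hand-maintained cleanup rules is roughly this (the directory names here are illustrative, not the real list from Makefile.SH):

```shell
# Simulate a build that copied modules into lib/, creating directories.
mkdir -p lib/auto/Fcntl lib/Some/Deep

# Cleanup must list every directory the build may have created, deepest
# first, because rmdir only removes empty directories.
rmdir lib/auto/Fcntl lib/auto lib/Some/Deep lib/Some 2>/dev/null || true
```

The pain point is exactly that manual maintenance: add a module that installs into a new directory and forget to extend the list, and the cleanup silently stops being complete.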

It had occurred to me some time ago that this information about "what should
we delete" is effectively duplicated in the .gitignore files, which exist to
ensure that git status on a build of a clean tree looks something like this:

    $ git status
    # On branch blead
    nothing to commit (working directory clean)

even though the build created lib/Fcntl.pm as a copy of ext/Fcntl/Fcntl.pm
(etc). We could have solved that by brute force, telling git to ignore all
files named *.pm, but that would be a bad plan, because then it wouldn't
warn us if we've created a new file but forgotten about it, eg:

    $ touch lib/Bogus.pm
    $ git status
    # On branch blead
    # Untracked files:
    #   (use "git add <file>..." to include in what will be committed)
    #       lib/Bogus.pm
    nothing added to commit but untracked files present (use "git add" to track)

So we need to keep a list of files copied by the build.

Or do we? I had an insight that I'd previously missed. We can actually infer
the list of files that will get copied to lib/ by the build process. After
all, ExtUtils::MakeMaker figures out what to copy where, so it's possible to
use the same approach to generate a lib/.gitignore that lists the copied
files. It's actually relatively simple, given that we know the source
directories for potential files (cpan/, dist/, and ext/) and the target
(lib/). We use the list of files that ship in lib/ to infer the directories
that will exist in lib/ for a fresh checkout. Then, for every file in the
source directories, we figure out where it will be copied to. This is a bit
of a game - sometimes we can get it from just the filename, sometimes we
need to scan the file to find a C<package> directive, and for a few it's
just simplest to cheat (ie hard code them). If the file is to be copied to a
directory that already exists, then that filename is added to lib/.gitignore.
If the directory doesn't exist, then write an entry to lib/.gitignore to
ignore the entire directory. And job's a good 'un. Well, almost - there are
a few files which this doesn't find, but they don't change frequently, so
it's much simpler to list them manually in the top level .gitignore than to
write complex code to cope with them.
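The trickiest of those cases - scanning a file for its package directive to infer where it will be copied - can be sketched like this. The file name, module name, and sed-based scan are all illustrative; the real code is Perl in the core's regen machinery.

```shell
# A module whose file path doesn't tell us its destination.
mkdir -p src
cat > src/Thing.pm <<'EOF'
package Some::Deep::Thing;
1;
EOF

# Pull out the first package declaration...
pkg=$(sed -n 's/^package *\([A-Za-z0-9:_]*\) *;.*/\1/p' src/Thing.pm | head -n 1)

# ...and map Foo::Bar to the lib/Foo/Bar.pm it will be copied to.
dest="lib/$(printf '%s' "$pkg" | sed 's|::|/|g').pm"
echo "$dest"   # lib/Some/Deep/Thing.pm
```

Once every source file has a destination like this, deciding between "ignore the file" and "ignore the whole new directory" is a simple existence check against the shipped lib/ contents.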

So that's lib/.gitignore now automated.

And it turns out that doing that is already nearly all the work needed to
figure out which *directories* are going to be created by
ExtUtils::MakeMaker. Combine that with the existing infrastructure that
manages updating the Makefiles (first written to automate handling of the
many pod files) and we now have Makefiles that will clean up everything,
without long dead rules for files and modules now gone, and without missing
files which were recently added. And the existing infrastructure also gives
us for free a regression test to ensure that the edited Makefiles and
generated lib/.gitignore aren't stale.

And with that, a lot of scut-work is gone.

Another build related mess that I tidied up this month was
ExtUtils::Miniperl. If you're building an XS extension and want to link it
statically with perl, ExtUtils::MakeMaker achieves this by writing out a new
perlmain.c for you, compiling that, and linking it with your extension
and libperl.a (from the installed tree). It generates that new perlmain.c
using the module ExtUtils::Miniperl. It's also the case that perlmain.c as
used by the core tree to build perl itself is generated by
ExtUtils::Miniperl (well, since commit fbcaf61123069fe4 in Nov 2010, when I
changed the *nix build to use the module, and eliminated the shell script
that used to do it - spot a pattern here?).

However, how ExtUtils::Miniperl *used* to get built was itself an exercise
in layered yaks. ExtUtils::Miniperl is a Perl module to generate C code. It
used to be generated at build time by a script, which would read
the file miniperlmain.c (checked into git - the file containing the main()
used by miniperl), scan the C code for known markers, and then write out the
module. So we had Perl code scanning C code to generate Perl code that will
generate C code.
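The old scanning step worked along these lines. The file content and marker comments below are made up for illustration; the real markers lived in miniperlmain.c.

```shell
cat > miniperlmain.c <<'EOF'
/* preamble that the generator did not care about */
/* Begin boot section */
    newXS("DynaLoader::boot_DynaLoader", boot_DynaLoader, file);
/* End boot section */
/* postamble */
EOF

# Extract the region between the known markers - the fragile part:
# move or reword a marker and the generated module silently changes.
sed -n '/Begin boot section/,/End boot section/p' miniperlmain.c > section.c
```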

This is obviously more complex than it needs to be. But what's the best way
to untangle it? Ideally we'd also write out miniperlmain.c using
ExtUtils::Miniperl, to avoid needing to have code that scans it (and fragile
marker comments). The problem here is that miniperlmain.c is needed to build
miniperl, which is needed to run ExtUtils::Miniperl to generate
miniperlmain.c, so there's a circular dependency. So that's not looking good.

Turns out that there is a simple solution, by exploiting a different feature
of the distribution. There are some files in the build needed to build
miniperl which are generated by Perl code. These are not updated
automatically (they can't be - otherwise you end up with circular build
dependencies, which make abhors, and which can often confuse a parallel make
badly enough to make it fork bomb), so they are really only suitable for
files that change rarely. It also used to be the case that the checked-in
version
could get out of sync with the code and data used to generate it. (Humans
are involved, so forgetfulness happens - we've fixed this now with a
regression test that spots it.) So by codifying miniperlmain.c as a file
regenerated by a small script using ExtUtils::Miniperl, it was possible to
eliminate the scanning code and an entire layer of code that generates code.

The fun isn't quite over yet. It turns out that there's yet another twist in
the tail. As well as all the interaction between miniperlmain.c and
ExtUtils::Miniperl, there's yet another copy of the perlmain.c generating
code, in ExtUtils::Embed.
ExtUtils::Embed is a module added to the core by commit
a2c6f8f14c6bd897 in July 1996. It's described as "Utilities for embedding
Perl in C/C++ applications", including 4 routines to generate the C code to
initialise statically linked extensions. Given that the perl executable is
structured as a trivial main() wrapper around an embedded perl interpreter,
this means that ExtUtils::Embed is doing pretty much the same job as
ExtUtils::Miniperl, and indeed the code was effectively forked off
ExtUtils::Miniperl 17 years ago.

The code hasn't diverged much (and fortunately this also means that there
weren't bugs fixed in one but not the other) so it was relatively simple to
refactor ExtUtils::Miniperl to call routines in ExtUtils::Embed. The result
is that there is now only one copy of Perl code to generate the C for an
xsinit() function, for all the different use cases.

It always seems to be the small things that can explode the most. As part of
some previous refactoring work, I spotted an unnecessary use of the Makefile
macro $(LDLIBPTH) in the *link* line for miniperl. But before removing it, I
went for a quick check of revision history to find out who added it. Which
was where the fun began.

The macro had been added, removed, and then re-added 10 years ago after a
sequence of reports from someone with regular smokes running on a PowerPC
Linux system. PowerPC Linux systems not being the most common thing back then
(or any time since), it was assumed that the problem might be something
PowerPC Linux specific, particularly as no-one else was able to replicate
the problem, or figure it out.

Fortunately, thanks to the GCC Compile Farm, I currently have access to a
Power7 system. However, in some ways this was most useful in proving that
the problem wasn't architecture specific (or really, even Linux specific).

The "fun" is because of an awkward combination of things

1) The smoker's "compiler" is actually implemented as a Perl wrapper that
   calls the real compiler.
2) The system's /usr/bin/perl is built to use a shared libperl.so, which
   means that it needs to find the system-installed libperl.so to work
3) The smoker's configuration was building with a shared libperl.so, which
   means that to test it (uninstalled, obviously) it needs to be forced to
   find the newly built libperl.so

Most people aren't using a Perl wrapper for a compiler so they don't tick all
the boxes. The default isn't to build a shared perl library, so that avoids
ticking one box. (Shared libraries mandate telling the C compiler to generate
position independent code, which isn't as efficient at runtime.) Which is why
no-one hit the problem.

The difficulty is that the tools available to force a binary to load shared
objects from a non-default path aren't that flexible, even on Linux. And you
can't not do this, as without this you can't even test it before installing.
The "obvious" answer is to set the environment variable LD_LIBRARY_PATH to
the path of in the build tree. Different dynamic linking systems
have different names for the variable that does this, hence why they are
abstracted in the Makefile macro $(LDLIBPTH).

To avoid possible confusion, note that LD_LIBRARY_PATH only affects the
paths for runtime loading of shared libraries - not what the compiler
"bakes" into a binary about where it should search for shared libraries.
If you run gcc with LD_LIBRARY_PATH set in the environment, it affects the
startup of gcc, not what gcc outputs. To change the output, you need to pass
-rpath options to the linker, or set the linker's environment variable
LD_RUN_PATH. So the Makefile setting LD_LIBRARY_PATH in the command line
used to link miniperl is a complete red herring.

The frustration is that this approach of setting LD_LIBRARY_PATH to force
loading of the newly created library goes subtly wrong - LD_LIBRARY_PATH
adds directories to search at the *end* of the list, not the start. Hence
with this, any installed copy of the library is found first, even by the
uninstalled perl you've just built and are trying to test. Which isn't good,
because it means that you're *not testing* the thing that you just built.
Instead you have the ./perl shim loading up and running the installed
libperl.so, which at best crashes early, and at worst gives sufficiently
believable test results that you don't suspect anything.

So there's a brute force solution for this problem - use LD_PRELOAD to force
the dynamic linker to load ./libperl.so before doing anything else. This
does at least mean that the tested ./perl will always be loading
./libperl.so, even if there is another libperl.so in the default path that it
would search. Obviously there's only one environment, and only one set of
environment variables to control this, so if you set the variables before
running the test, every executable that runs during testing is affected. Not
so obviously, what this actually means is that *every* dynamically linked
executable gets to load and link with ./libperl.so, from /bin/true to
/usr/bin/make; but as none of them uses any symbols defined by libperl.so,
it makes no difference.
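That LD_PRELOAD reaches every child process is easy to demonstrate. The path below is deliberately bogus; on glibc systems the dynamic loader warns about a missing (non-setuid) preload object on stderr and then carries on regardless.

```shell
# Even /bin/true is asked to map the preloaded object; a missing object
# produces a loader warning, but the program still runs and exits 0.
LD_PRELOAD=/no/such/libperl.so /bin/true 2>preload-err.txt
echo "exit status: $?"
```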

The problem comes because the regression tests need to test that
ExtUtils::MakeMaker can build XS extensions, which means that those tests
need to run the C compiler toolchain. And if your C compiler toolchain is
actually a Perl program, even if it's an installed Perl program running
against the installed /usr/bin/perl, then it *too* will be forced to load
./libperl.so. Which, quite likely, will crash in "interesting" and "exciting"
ways. (But it almost certainly won't work, unless the two versions match
exactly.)

And there really isn't a solution to this. I can't see any way to have all of:

0) Testing the actual binary that will get installed
1) Overriding the shared library that it loads to use the newly built
   library, not the existing installed library
2) Without overriding the shared library loaded by any other installed
   perl binary

Specific to the problem in question which caused all the churn back then, I
tried to replicate the problem on the Power Linux system using a wrapper
running on perl v5.8.0 (which would have been the version that the bug
reporter would be using), testing the then current version of blead.  From
that, and from reading the various e-mail threads at the time, I concluded
that the ticket doesn't tell the whole story about the make environment,
and that the toolchain was more non-standard than described in the ticket.

Hence I'm confident that the changes to the Makefile adding/removing
LD_LIBRARY_PATH were independent of the reported problems, so I was
eventually able to make the following small fix, and the generated Makefile
got a little simpler:

diff --git a/Makefile.SH b/Makefile.SH
index 15b32fd..9d44c2c 100755
--- a/Makefile.SH
+++ b/Makefile.SH
@@ -901,5 +901,5 @@ lib/ $& $(mini_obj)
 lib/ $& $(mini_obj)
        -@rm -f miniperl.xok
-       $(LDLIBPTH) $(CC) $(CLDFLAGS) -o $(MINIPERL_EXE) \
+       $(CC) $(CLDFLAGS) -o $(MINIPERL_EXE) \
            $(mini_obj) $(libs)
        $(LDLIBPTH) $(RUN) ./miniperl$(HOST_EXE_EXT) -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make minitest; exit 1'
@@ -913,5 +913,5 @@ lib/ $& $(mini_obj)
 $(PERL_EXE): $& perlmain$(OBJ_EXT) $(LIBPERL) $(static_ext) ext.libs $(PERLEXPORT)
        -@rm -f miniperl.xok
-       $(SHRPENV) $(LDLIBPTH) $(CC) -o perl $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs)
+       $(SHRPENV) $(CC) -o perl $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs)
 # Microperl.  This is just a convenience thing if one happens to

The full explanation is at the end of RT #23212.

The other entertainment of note this month was investigating why builds on
HP-UX had started failing one of the regression tests for regular
expressions. Bisecting showed that the failure started with commit
f1e1b256c5c1773d, which fixed the rules for parsing numeric escapes in
regexes. So why did the C code fail? This is a job for a debugger:

    $ gdb ./perl
    Detected 64-bit executable.
    Invoking /pro/local/bin/gdb64.
    /usr/lib/pa20_64/ Unable to find library ''.

Bother. That's not very useful. Try on the other machine:

    $ gdb ./perl
    Detected 64-bit executable.
    Invoking /pro/local/bin/gdb64.
    GNU gdb 6.6
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB.  Type "show warranty" for details.
    This GDB was configured as "hppa64-hp-hpux11.11"...
    (no debugging symbols found)
    (gdb) run
    Starting program: /perl/usr/nick/Perl/perl/perl
    Error trying to get information about dynamic linker.

A chocolate teapot would be about as much use.

So in theory, at this point I start having to figure out HP's own adb. Which
seems to be even more user-hostile than dbx on Solaris. This isn't looking
like it's going to be fun...

So it was a great relief to discover that I didn't need to fight various HP
debuggers (much). In that, the combination of SIGBUS and "the stack seems to
be corrupt" triggered something in my head - these are the typical symptoms
reported by a debugger when a C program has recursed too deeply and bust the
C stack. At which point I could replicate it (down to the same test) by
tweaking the stack size on x86_64 Linux. It's nothing directly to do with
architecture or OS. The default thread stack size on PA-RISC HP-UX is too
small. The regex compiler recurses (in C) for each () group, so 100 nested
() groups make an abnormally large amount of recursion.
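The failing input is easy to reconstruct: a pattern with 100 nested capturing groups, each of which costs one level of C recursion in the regex compiler. A sketch of building such a pattern:

```shell
# Build "(((...a...)))" with 100 opens and 100 closes; compiling a
# pattern like this forces roughly 100 nested C calls in the regex
# compiler, one per capturing group.
opens=$(printf '(%.0s' $(seq 1 100))
closes=$(printf ')%.0s' $(seq 1 100))
pat="${opens}a${closes}"
echo "${#pat}"   # 201 characters: 100 + 1 + 100
```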

I could reproduce it by setting the thread stack size to 86016 on x86_64
(that number will vary with build options), and tune the failure point
earlier or later by changing the thread stack size. The regression test will
also fail during parsing even without the associated C code change. Huzzah!
That means that the solution is pure Perl: simply change the test runner to
set a larger stack size.

(Of course it wasn't quite that simple. The "big" stack size I chose turned
out to be smaller than the default used by gcc on Linux when building with
ASAN, so the first change was a regression elsewhere. And the default on OS
X is also too small, so it also needs an unconditional stack size increase.)

It's good that perl itself is "stackless", else we'd hit these stack issues
with threading whenever Perl code recurses too much.

One last little thing I did was to add sort to the code which generates
Errno and Pod::Functions so that the same output is always generated,
independent of randomised hash ordering. The alternate orderings of the
generated code weren't a bug, so didn't absolutely *need* fixing, but being
able to compare the generated files and have them byte-for-byte identical
makes debugging and refactoring a lot easier.
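The fix is the classic trick for reproducible generated files: sort the keys before emitting anything. A minimal sketch (the symbol list is illustrative, not the real Errno data):

```shell
# Two "runs" that see the same data in different hash orders...
printf 'EPERM=1\nENOENT=2\nEINTR=4\n'  > run1.txt
printf 'EINTR=4\nEPERM=1\nENOENT=2\n' > run2.txt

# ...produce byte-for-byte identical output once sorted.
sort run1.txt > gen1.txt
sort run2.txt > gen2.txt
cmp -s gen1.txt gen2.txt && echo identical
```

With identical output guaranteed, a plain cmp or diff of the generated files becomes a meaningful check when refactoring the generator.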

A more detailed breakdown summarised from the weekly reports. In these:

16 hex digits refer to commits in
RT #... is a bug in
CPAN #... is a bug in
ID YYYYMMDD.### is a bug number in the old bug system. The RT # is given
  afterwards. You can look up the old IDs at

[Hours]		[Activity]
  0.25		5.18.1
  1.00		Compiled-in POSIX character class inversion lists
  1.50		Exporter -> dist/Exporter
  3.00		ExtUtils::Embed::canon
		ExtUtils::Embed::canon and -Uusedl
  9.50		ExtUtils::Miniperl
  4.00		File::Find -> ext/
  0.50		File::Spec XS
  1.25		FindExt, VMS::FileSpec
  7.00		HP-UX
		HP-UX t/re/pat_thr.t
  0.75		ID 20000804.001 (RT #3634)
  1.25		Makefile.SH valgrind targets
  0.50		PL_op_exec_cnt cleanup
  0.25		RT #113932
  0.25		RT #118055
  0.25		RT #118139
  1.25		RT #118543
  0.25		RT #118933
  0.25		RT #119003
  0.75		RT #119069/RT #119071/RT #119073/RT #119075
  6.75		RT #119089
 15.25		RT #23212
  0.25		RT #38307
  0.25		RT #56902
  1.00		RT #70357
  1.25		S_pad_findlex slowdown
  7.00		Storable blessed big endian failure
  1.25		VMS lib/warnings.t
  3.75		VMS-Filespec and known_extensions
  0.50		clang 3.3
  0.75		clang/toke.c/heredoc
  1.25		deterministic Errno and Pod::Functions
  3.75		gcc ASAN (merging -DPERL_GLOBAL_STRUCT fixes)
  1.25		installperl/installman
  9.25		lib/.gitignore
  0.25		lib/warnings.t failure
  1.00		postfix dereference
  0.25		process, scalability, mentoring
 33.00		reading/responding to list mail
  6.25		regen/
  4.25		run_multiple_progs
  1.00		runperl
  0.50		smoke-me branches
  1.25		smoke-me/Makefile-norecurse
  0.75		smoke-me/blead
		smoke-me/blead (CRLF)
  0.25		smoke-me/branches
  1.75		smoke-me/nicholas/installlib
  1.00		smoke-me/nicholas/regen
  3.50		smoke-me/no-LDLIBPTH-for-CC
  1.25		smoke-me/padconst
  0.50		smoke-me/quieten-readonly-ops
  0.50		smoke-me/sys_intern
  0.25		smoke-me/trim-superfluous-Makefile
  0.25		sub {}
  5.25		utils/Makefile.SH
  0.50		version -> cpan/
151.50 hours total

Nicholas Clark
