
NWCLARK TPF grant report #99

From: Nicholas Clark
Date: October 3, 2013 13:31
Subject: NWCLARK TPF grant report #99
Message ID: 20131003133054.GG4940@plum.flirble.org

[Hours]		[Activity]
2013/07/22	Monday
 0.25		RT #118055
 0.25		RT #118933
 0.75		clang/toke.c/heredoc
 3.50		reading/responding to list mail
 3.00		regen/lib_cleanup.pl
 0.25		run_multiple_progs
 0.25		smoke-me/branches
=====
 8.25

2013/07/23	Tuesday
 3.50		reading/responding to list mail
 2.25		regen/lib_cleanup.pl
 1.75		smoke-me/nicholas/installlib
=====
 7.50

2013/07/24	Wednesday
 0.25		RT #119003
 7.25		reading/responding to list mail
 1.00		regen/lib_cleanup.pl
=====
 8.50

2013/07/25	Thursday
 0.25		5.18.1
 0.75		HP-UX
 6.50		reading/responding to list mail
 1.25		smoke-me/padconst
=====
 8.75

2013/07/26	Friday
 0.25		HP-UX
 5.75		reading/responding to list mail
=====
 6.00

2013/07/27	Saturday
 0.25		RT #56902
 1.25		S_pad_findlex slowdown
=====
 1.50

Which I calculate is 40.50 hours

Tied up with maintaining sanity by keeping a clear distinction between
"build products" and source code is the requirement to be able to clean up
everything that the build creates. We can't just use git to do this, for a
variety of reasons:

1) Perl 5 runs on platforms on which git doesn't
2) Perl 5 still needs to build from a tarball (or equivalent) even on
   platforms where git runs (for example, how Linux distributions manage
   versioning their upstream, i.e. us)
3) git clean works by removing everything which is not known to git, rather
   than only things known to be built. Zapping everything is "unhelpful"
   when you have work in progress which isn't ready for checking in, but
   still need to clean.
4) git uses Perl 5.

(Circular dependencies are bad. It's why File::Temp wasn't able to switch
from ExtUtils::MakeMaker to Module::Build, because Module::Build depends on
File::Temp)

So the approach the core takes is to explicitly delete files and directories
that were created, as this avoids inadvertently deleting the wrong thing,
such as work in progress. But this does mean that the core needs to know the
identity of all files that it builds. That can be easier "said" than done.
For example...

As part of building XS extensions and dual life modules, their files are
copied into the lib/ directory at the top level of the build tree. As lib/
doesn't start off empty, but contains various core-only modules, this makes
cleanup complex. Fortunately ExtUtils::MakeMaker can be relied upon to
delete any files that it copied there, but unfortunately it doesn't track
which directories it needed to create. So this ends up with a section of
manually maintained rmdir commands in Makefile.SH, and of course more work
with rmdir /s /q commands in the Win32 Makefiles.
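
To illustrate the "explicitly delete what was built" approach, here is a
minimal sketch. It is written in Python purely for illustration (the core's
real cleanup lives in Makefile rules), and the function name, the file names
and the two lists are all hypothetical. It removes only the files and
directories it has been told about, deepest directory first, and leaves
anything else alone.

```python
import os
import tempfile


def clean_build_products(root, built_files, built_dirs):
    """Delete only the named build products under root; touch nothing else."""
    for rel in built_files:
        path = os.path.join(root, rel)
        if os.path.isfile(path):
            os.remove(path)
    # Deepest directories first, so children are removed before parents.
    for rel in sorted(built_dirs, key=lambda d: d.count("/"), reverse=True):
        try:
            os.rmdir(os.path.join(root, rel))  # only succeeds if empty
        except OSError:
            pass  # already gone, or still holds something we did not build


def demo():
    """Build a toy tree, clean it, and report what survives in lib/."""
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "lib/Hypothetical/Deep"))
        for rel in ("lib/Hypothetical/Deep/Built.pm", "lib/WorkInProgress.pm"):
            with open(os.path.join(root, rel), "w") as fh:
                fh.write("1;\n")
        clean_build_products(
            root,
            built_files=["lib/Hypothetical/Deep/Built.pm"],
            built_dirs=["lib/Hypothetical", "lib/Hypothetical/Deep"],
        )
        return sorted(os.listdir(os.path.join(root, "lib")))
```

Calling demo() leaves only the work-in-progress file in lib/. The cost of
this safety is exactly the one described above: somebody has to supply the
list of directories.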

It had occurred to me some time ago that effectively this information about
"what should we delete" is duplicated in .gitignore files. Those exist to
ensure that git status on a build of a clean tree looks something like this:

    $ git status
    # On branch blead
    nothing to commit (working directory clean)

even though the build created lib/Fcntl.pm as a copy of ext/Fcntl/Fcntl.pm
(etc). We could have solved that by brute force, telling git to ignore all
files named *.pm, but that would be a bad plan, because then it wouldn't
warn us if we'd created a new file but forgotten about it, e.g.:

    $ touch lib/Imposter.pm
    $ git status
    # On branch blead
    # Untracked files:
    #   (use "git add <file>..." to include in what will be committed)
    #
    #       lib/Imposter.pm
    nothing added to commit but untracked files present (use "git add" to track)
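
The difference between the two ignore strategies can be sketched in a few
lines. This is an illustration in Python, not how git works internally:
fnmatch is only a rough stand-in for git's own pattern matching, and the
function name is hypothetical.

```python
from fnmatch import fnmatch


def still_reported(untracked, ignore_patterns):
    """Return the untracked paths that no ignore pattern hides."""
    return [
        path
        for path in untracked
        if not any(fnmatch(path, pattern) for pattern in ignore_patterns)
    ]


untracked = ["lib/Fcntl.pm", "lib/Imposter.pm"]

# Blanket rule: the forgotten file is hidden along with the build product.
blanket = still_reported(untracked, ["*.pm"])

# Explicit list: only the known build product is hidden.
explicit = still_reported(untracked, ["lib/Fcntl.pm"])
```

With the blanket rule nothing is reported at all, whereas the explicit list
still reports lib/Imposter.pm.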

So we need to keep a list of files copied by the build.

Or do we? I had an insight that I'd previously missed. We can actually infer
the list of files that will get copied to lib/ by the build process. After
all, ExtUtils::MakeMaker figures out what to copy where, so it's possible to
use the same approach to generate a lib/.gitignore that lists the copied
files. It's actually relatively simple, given that we know the source
directories for potential files (cpan/, dist/, and ext/) and the target
(lib/).  We use the list of files that ship in lib/ to infer the directories
that will exist in lib/ for a fresh checkout. Then, for every file in the
source directories, we figure out where it will be copied to. This is a bit
of a game - sometimes we can get it from just the filename, sometimes we
need to scan the file to find a C<package> directive, and for a few it's
just simplest to cheat (i.e. hard code them). If the file is to be copied to
a directory that already exists, then that filename is added to
lib/.gitignore. If the directory doesn't exist, then we write an entry to
lib/.gitignore to ignore the entire directory. And job's a good 'un. Well,
almost - there are
a few files which this doesn't find, but they don't change frequently, so
it's much simpler to list them manually in the top level .gitignore than to
write complex code to cope with them.
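
The file-versus-directory decision described above can be sketched as
follows. This is a minimal illustration in Python, not the code in the core.
It assumes the hard part (working out each file's destination from its name
or its package line) has already been done and is passed in as
copy_targets. The function name, the example paths and the leading "/" used
to anchor each entry are assumptions of the sketch.

```python
import posixpath


def gitignore_entries(shipped_in_lib, copy_targets):
    """Work out ignore entries for lib/.

    shipped_in_lib: paths (relative to lib/) present in a fresh checkout.
    copy_targets:   paths (relative to lib/) that the build will copy in.
    """
    # Directories that already exist in lib/ before anything is built.
    existing = set()
    for path in shipped_in_lib:
        parent = posixpath.dirname(path)
        while parent:
            existing.add(parent)
            parent = posixpath.dirname(parent)

    entries = set()
    for target in copy_targets:
        parts = target.split("/")
        # Walk down from lib/ looking for the first directory that the
        # build would have to create.
        for depth in range(1, len(parts)):
            prefix = "/".join(parts[:depth])
            if prefix not in existing:
                entries.add("/" + prefix + "/")  # ignore the whole directory
                break
        else:
            # Every parent directory already exists: ignore just the file.
            entries.add("/" + target)
    return sorted(entries)


example = gitignore_entries(
    shipped_in_lib=["strict.pm", "File/Basename.pm"],
    copy_targets=["Fcntl.pm", "File/Hypothetical.pm", "Made/Up/Module.pm"],
)
```

Here example comes out as /Fcntl.pm, /File/Hypothetical.pm and /Made/. The
set of directory entries is the by-product that matters for cleanup, since
those are the directories the build creates.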

So that's lib/.gitignore now automated.

And it turns out that doing that is already nearly all the work needed to
figure out which *directories* are going to be created by
ExtUtils::MakeMaker. Combine that with the existing infrastructure that
manages updating the Makefiles (first written to automate handling of the
many pod files) and we now have Makefiles that will clean up everything,
without long dead rules for files and modules now gone, and without missing
files which were recently added. And the existing infrastructure also gives
us for free a regression test to ensure that the edited Makefiles and
generated lib/.gitignore aren't stale.
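
The idea behind such a staleness test is simply "regenerate, then compare
with what is checked in". A minimal sketch in Python, with a hypothetical
function name and made-up file contents (the core's real test is part of its
existing regen infrastructure and is not shown here):

```python
import difflib


def stale_report(committed, regenerated, name):
    """Return a unified diff if the committed text is stale, else ''."""
    diff = difflib.unified_diff(
        committed.splitlines(keepends=True),
        regenerated.splitlines(keepends=True),
        fromfile=name + " (committed)",
        tofile=name + " (regenerated)",
    )
    return "".join(diff)


# Up to date: nothing to report.
fresh = stale_report("/Fcntl.pm\n", "/Fcntl.pm\n", "lib/.gitignore")

# A module was added but the generated file was not refreshed.
stale = stale_report("/Fcntl.pm\n", "/Fcntl.pm\n/Made/\n", "lib/.gitignore")
```

An empty report means the checked-in file matches what the generator would
write. A non-empty one shows what is missing, so a test can fail with a
useful message.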

And with that, a lot of scut-work is gone.

Nicholas Clark


