
NWCLARK TPF grant report #98

From:
Nicholas Clark
Date:
October 3, 2013 13:29
Subject:
NWCLARK TPF grant report #98
Message ID:
20131003132849.GF4940@plum.flirble.org
[Hours]		[Activity]
2013/07/14	Sunday
 0.75		installperl/installman
 1.25		run_multiple_progs
=====
 2.00

2013/07/15	Monday
 0.50		ExtUtils::Embed::canon
 2.00		reading/responding to list mail
 2.50		run_multiple_progs
 0.50		smoke-me/no-LDLIBPTH-for-CC
=====
 5.50

2013/07/16	Tuesday
 0.50		ExtUtils::Embed::canon
 4.75		RT #23212
 0.50		RT #70357
 1.75		reading/responding to list mail
 0.50		version -> cpan/
=====
 8.00

2013/07/17	Wednesday
 2.00		ExtUtils::Embed::canon and -Uusedl
 4.00		RT #23212
 0.25		RT #70357
 0.75		bisect.pl
 0.50		clang 3.3
 0.25		reading/responding to list mail
=====
 7.75

2013/07/18	Thursday
 4.00		RT #23212
 2.25		lib/.gitignore
 1.00		smoke-me/nicholas/regen
=====
 7.25

2013/07/19	Friday
 2.50		RT #23212
 1.00		lib/.gitignore
=====
 3.50

2013/07/20	Saturday
 6.00		lib/.gitignore
=====
 6.00

Which I calculate is 40.00 hours

It always seems to be the small things that can explode the most. As part of
some previous refactoring work, I spotted an unnecessary use of the Makefile
macro $(LDLIBPTH) in the *link* line for miniperl. But before removing it, I
went for a quick check of revision history to find out who added it. Which
was where the fun began.

The macro had been added, removed, and then re-added 10 years ago after a
sequence of reports from someone with regular smokes running on a PowerPC
Linux system. PowerPC Linux systems not being the most common thing back
then (or any time since) it was assumed that the problem might be something
PowerPC Linux specific, particularly as no-one else was able to replicate
the problem, or figure it out.

Fortunately, thanks to the GCC Compile Farm I currently have access to a
Power7 system ( http://gcc.gnu.org/wiki/CompileFarm#Machine_Detailed_List ).
However, in some ways this was most useful in proving that it wasn't
architecture specific (or really, even Linux specific).

The "fun" is because of an awkward combination of things:

1) The smoker's "compiler" is actually implemented as a Perl wrapper that
   calls the real compiler.
2) The system's /usr/bin/perl is built to use a shared libperl.so, which
   means that it needs to find the system installed libperl.so to work
   (properly)
3) The smoker's configuration was building with a shared libperl.so, which
   means that to test it (uninstalled, obviously) it needs to be forced to
   find the newly built libperl.so

Most people aren't using a Perl wrapper for a compiler so they don't tick
all the boxes. The default isn't to build a shared perl library, so that
avoids ticking one box. (Shared libraries mandate telling the C compiler to
generate position independent code, which isn't as efficient at runtime.)
Which is why no-one hit the problem.

The difficulty is that the tools available to force a binary to load shared
objects from a non-default path aren't that flexible, even on Linux. And you
can't not do this, as without this you can't even test it before installing.
The "obvious" answer is to set the environment variable LD_LIBRARY_PATH to
the path of libperl.so in the build tree. Different dynamic linking systems
have different names for the variable that does this, which is why they are
abstracted in the Makefile macro $(LDLIBPTH).

To avoid possible confusion, note that LD_LIBRARY_PATH only affects the
paths for runtime loading of shared libraries - not what the compiler
"bakes" into a binary about where it should search for shared libraries. If
you run gcc with LD_LIBRARY_PATH set in the environment, it affects the
startup of gcc, not what gcc outputs. To change the output, you need to pass
-rpath options to the linker, or use a different environment variable,
LD_RUN_PATH. So the Makefile setting LD_LIBRARY_PATH in the command line
used to link miniperl is a complete red herring.

The frustration is that this approach of setting LD_LIBRARY_PATH to force
loading of the newly created library goes subtly wrong - the directories in
LD_LIBRARY_PATH are searched *after* any rpath (DT_RPATH) baked into the
binary, not before it. Hence with this, any installed copy of the library is
found first, even by the uninstalled perl you've just built and are trying
to test. Which isn't good, because it means that you're *not testing* the
thing that you just built.
Instead you have the ./perl shim loading up and running the installed
libperl.so, which at best crashes early, and at worst gives sufficiently
believable test results that you don't suspect anything.
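
As a rough illustration of how the installed copy can win, here is a minimal
Python sketch of the search order. It assumes glibc's documented ld.so
behaviour on Linux (a DT_RPATH entry, when there is no DT_RUNPATH, is
consulted before LD_LIBRARY_PATH), ignores ld.so.cache, and uses made-up
directory names. The helpers search_dirs(), resolve() and demo() are
hypothetical, written only for this example; they are not part of perl's
build system or of any real tool.

```python
import os
import tempfile


def search_dirs(rpath, ld_library_path, runpath, defaults):
    """Return directories in the order a glibc-style loader would try them.

    Simplified model: the ld.so.cache lookup is deliberately left out.
    """
    dirs = []
    if rpath and not runpath:
        # DT_RPATH is only honoured when the binary has no DT_RUNPATH.
        dirs += rpath
    dirs += ld_library_path
    dirs += runpath
    dirs += defaults
    return dirs


def resolve(name, dirs):
    """Return the first existing dir/name in search order, or None."""
    for d in dirs:
        candidate = os.path.join(d, name)
        if os.path.exists(candidate):
            return candidate
    return None


def demo():
    """Build a throwaway tree holding an 'installed' and a 'build' libperl.so."""
    with tempfile.TemporaryDirectory() as root:
        installed = os.path.join(root, "installed", "CORE")  # made-up layout
        build = os.path.join(root, "build")
        for d in (installed, build):
            os.makedirs(d)
            open(os.path.join(d, "libperl.so"), "w").close()
        # ./perl carries an rpath pointing at the install location, and the
        # test harness sets LD_LIBRARY_PATH to the build directory.
        dirs = search_dirs(rpath=[installed], ld_library_path=[build],
                           runpath=[], defaults=[])
        return os.path.relpath(resolve("libperl.so", dirs), root)


if __name__ == "__main__":
    print(demo())
```

In this model the copy under installed/ is chosen even though the build
directory is in LD_LIBRARY_PATH, which is the failure mode described above.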

So there's a brute force solution for this problem - use LD_PRELOAD to force
the dynamic linker to load ./libperl.so before doing anything else. This
does at least mean that the tested ./perl will always be loading
./libperl.so, even if there is another libperl.so in the default path that
it would search. Obviously there's only one environment, and only one set of
environment variables to control this, so if you set the variables before
running the test, every executable that runs during testing is affected. Not
so obviously, what this actually means is that *every* dynamically linked
executable gets to load and link with ./libperl.so, from /bin/true to
/usr/bin/make, but as none of them use any symbols defined by libperl.so, it
makes no difference.
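
The inheritance at work here can be sketched in a few lines of Python. This
is only an illustration: DEMO_PRELOAD is a made-up stand-in for LD_PRELOAD
(so that nothing is really preloaded), and run_tree() is a hypothetical
helper that starts a child process which in turn starts a grandchild,
roughly as a test harness starts make, which starts the compiler.

```python
import os
import subprocess
import sys

# Program run by the grandchild: report what it sees in its environment.
GRANDCHILD = "import os; print('grandchild sees', os.environ.get('DEMO_PRELOAD'))"

# Program run by the child: report, then spawn the grandchild without
# touching the environment at all. The grandchild's source arrives as argv[1].
CHILD = (
    "import os, subprocess, sys\n"
    "print('child sees', os.environ.get('DEMO_PRELOAD'), flush=True)\n"
    "subprocess.run([sys.executable, '-c', sys.argv[1]], check=True)\n"
)


def run_tree(env):
    """Run child -> grandchild under the given environment; return output lines."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, GRANDCHILD],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


if __name__ == "__main__":
    # Set the variable once, "before running the test".
    for line in run_tree(dict(os.environ, DEMO_PRELOAD="./libperl.so")):
        print(line)
```

The variable is set only for the first process, yet the grandchild sees it
too, because each process hands its environment on to whatever it starts.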

The problem comes because the regression tests need to test that
ExtUtils::MakeMaker can build XS extensions, which means that those tests
need to run the C compiler toolchain. And if your C compiler toolchain is
actually a Perl program, even if it's an installed Perl program running
against the installed /usr/bin/perl, then it *too* will be forced to load
./libperl.so. Which, quite likely, will crash in "interesting" and "exciting"
ways. (But almost certainly won't work, unless the two versions match
exactly.)

And there really isn't a solution to this. I can't see any way to have all
of:

1) Testing the actual binary that will get installed
2) Overriding the shared library that it loads to use the newly built
   library, not the existing installed library
3) Without overriding the shared library loaded by any other installed
   perl binary

Specific to the problem in question which caused all the churn back then, I
tried to replicate the problem on the Power Linux system using a wrapper
running on perl 5.8.0 (which would have been the version that the bug
reporter would be using), testing the then current version of blead. From
that, and from reading the various e-mail threads at the time, I concluded
that the ticket doesn't tell the whole story about the make environment
and that the toolchain was more non-standard than described in the ticket.

Hence I'm confident that the changes to the Makefile adding/removing
LD_LIBRARY_PATH were independent of the reported problems, so I was
eventually able to make the following small fix, and the generated Makefile
got a little simpler:

diff --git a/Makefile.SH b/Makefile.SH
index 15b32fd..9d44c2c 100755
--- a/Makefile.SH
+++ b/Makefile.SH
@@ -901,5 +901,5 @@ lib/buildcustomize.pl: $& $(mini_obj) write_buildcustomize.pl
 lib/buildcustomize.pl: $& $(mini_obj) write_buildcustomize.pl
        -@rm -f miniperl.xok
-       $(LDLIBPTH) $(CC) $(CLDFLAGS) -o $(MINIPERL_EXE) \
+       $(CC) $(CLDFLAGS) -o $(MINIPERL_EXE) \
            $(mini_obj) $(libs)
        $(LDLIBPTH) $(RUN) ./miniperl$(HOST_EXE_EXT) -w -Ilib -Idist/Exporter/lib -MExporter -e '<?>' || sh -c 'echo >&2 Failed to build miniperl.  Please run make minitest; exit 1'
@@ -913,5 +913,5 @@ lib/buildcustomize.pl: $& $(mini_obj) write_buildcustomize.pl
 $(PERL_EXE): $& perlmain$(OBJ_EXT) $(LIBPERL) $(static_ext) ext.libs $(PERLEXPORT) write_buildcustomize.pl
        -@rm -f miniperl.xok
-       $(SHRPENV) $(LDLIBPTH) $(CC) -o perl $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs)
+       $(SHRPENV) $(CC) -o perl $(CLDFLAGS) $(CCDLFLAGS) perlmain$(OBJ_EXT) $(static_ext) $(LLIBPERL) `cat ext.libs` $(libs)
 
 # Microperl.  This is just a convenience thing if one happens to

The full explanation is at the end of RT #23212, and most easily viewed as
https://rt.perl.org/rt3/Ticket/Attachment/1236123/643477/

Nicholas Clark


