develooper Front page | perl.perl5.porters | Postings from September 2012

NWCLARK TPF grant August report

Thread Next
Nicholas Clark
September 8, 2012 01:53
NWCLARK TPF grant August report
Message ID:
A fair chunk of the month was taken up with investigating three related areas:

*) How code such as -e 'BEGIN {1}' is compiled, and the interaction between
   PL_main_cv, CvSTART(PL_main_cv), PL_main_root and PL_main_start
*) Code in op.c which calls CopLINE_set(PL_curcop, oldline) and warnings from
   multi-line constructions
*) Building with -DPERL_DEBUG_READONLY_OPS to force shared ops to be
   read-only, and determining the causes of code trying to write to read-only

Of these, I managed to finish the first two in August, but it took until the
first week of September, so that won't feature until next month (or next
week, if you read the weekly summaries on perl5-porters)

ithreads is implemented on the interpreter cloning infrastructure originally
added to provide fork emulation on Win32. Part of the design for that is
that under ithreads optrees are read only and shared between threads, to
save the time and memory that would be needed to copy them. For building
without ithreads, the old rules still hold - there is no restriction that
OPs should be read only, and no restriction as to what they can point
to. However, to implement the shared OPs for ithreads required locating all
places where OPs have mutable fields or pointers to structures that are now
per-ithread, and change the code so that when building under ithreads they
move to unshared structures, or otherwise ensure that the OP stays read only
once constructed. To my memory no bugs had cropped up post v5.6.0 relating
to this, so it was assumed that all was fine.

In 2007 I decided to check this assumption by adding the ability to
recompile perl with the OP memory allocations coming from mmap(), and using
mprotect() to turn OPs read only once they had been built. I forget what
even motivated me to do this, but the approach did find a couple more
obscure cases where OPs were being modified at runtime, in violation of the
ithreads rules.

Father Chrysostomos recently refactored OP allocation to always use a slab
allocator, as part of fixing some very long standing bugs to do with OPs
leaking if compilation fails within an eval, and did some further work on
it. Because I was having "fun" trying to work out how Perl_newCONSTSUB(),
PL_curcop and various other things were interacting in reporting warning
filenames and line numbers, I decided to compile with
-DPERL_DEBUG_READONLY_OPS to see if enabling that code would shed any light
on the problem. As a matter of routine, I did this by doing a full build and
test (less than 5 minutes in parallel on reasonable hardware), and noticed
that nearly all of the tests passed in this configuration. So I set off
identifying the cause of failures, to see if it was possible to get it to

It turns out that it was (at least on the x86_64 Linux system I was testing
on), as there were only two underlying causes of failures. Firstly
pp_i_modulo contains runtime code to detect a bug in glibc 2.2.5's _moddi3,
switching in a slower work around implementation if the C library is
buggy. I think that the reasoning for doing this check at runtime, rather
than compile time, is because one is (typically) linking against a shared
library here, and so detecting the problem at build time is potentially
useless - if the system is upgraded to the buggy version, your build time
information that you were safe is now stale, and bugs appear. Meanwhile if
you build when the installed version is buggy, but it's then upgraded to a
fixed version, you don't get the benefit. So when built on platform that is
"at risk", the code does a check on the first call to pp_i_modulo, and then
picks the "right" implementation and rewrites the op to call that directly
in future. "rewrite" - that's a SEGVing offence on a read-only page. So the
simple solution was to disable all the runtime probing if
PERL_DEBUG_READONLY_OPS is defined, effectively treating glibc like every
other platform.

The only other write action on OPs was the debugger setting breakpoints.
When the debugger is enabled, all NEXTSTATE ops are changed at compile time
in DBSTATE ops, and if OPf_SPECIAL is set on a DBSTATE op then a callback is
made into the debugger. Clearly setting or clearing OPf_SPECIAL on an OP at
runtime is a write activity. Given that the debugger itself is aware of
threads, and it is documented that setting a breakpoint applies to all
threads, I decided that the right solution was to explicitly permit this OP
writing, by tweaking the C code to set the OP read/write before altering the
flag, and back to read only afterwards.

With these changes, building with -DPERL_DEBUG_READONLY_OPS (and
-Dusethreads, obviously) passes all tests.

I managed to write tests for the various blocks of code in op.c which calls
CopLINE_set(PL_curcop, oldline), related to generating better warnings from
multi-line constructions. All that is, except this code in newCONSTSUB:

    if (IN_PERL_RUNTIME) {
	/* at runtime, it's not safe to manipulate PL_curcop: it may be
	 * an op shared between threads. Use a non-shared COP for our
	 * dirty work */
	 PL_compiling.cop_warnings = DUP_WARNINGS(PL_curcop->cop_warnings);
	 PL_curcop = &PL_compiling;
    CopLINE_set(PL_curcop, PL_parser ? PL_parser->copline : NOLINE);

This all feels hacky. Why does it need to be set...

So, I think that it contributes to the following bug. Sorry it's not
clear, but note that some of the line numbers in the redefined
warnings *differ* depending on whether it's in a BEGIN block:

$ ./perl -Ilib -we 'eval qq{ {\n\n\nDynaLoader::boot_DynaLoader("DynaLoader")}}; eval qq{{\n\n\nDynaLoader::boot_DynaLoader("DynaLoader")}}'
Subroutine DynaLoader::dl_load_file redefined at (eval 2) line 4.
Subroutine DynaLoader::dl_unload_file redefined at (eval 2) line 4.
Subroutine DynaLoader::dl_find_symbol redefined at (eval 2) line 4.
Subroutine DynaLoader::dl_undef_symbols redefined at (eval 2) line 4.
Subroutine DynaLoader::dl_install_xsub redefined at (eval 2) line 4.
Subroutine DynaLoader::dl_error redefined at (eval 2) line 4.
$ ./perl -Ilib -we 'eval qq{ {\n\n\nDynaLoader::boot_DynaLoader("DynaLoader")}}; eval qq{BEGIN {\n\n\nDynaLoader::boot_DynaLoader("DynaLoader")}}'
Subroutine DynaLoader::dl_load_file redefined at (eval 2) line 1.
Subroutine DynaLoader::dl_unload_file redefined at (eval 2) line 1.
Subroutine DynaLoader::dl_find_symbol redefined at (eval 2) line 1.
Subroutine DynaLoader::dl_undef_symbols redefined at (eval 2) line 1.
Subroutine DynaLoader::dl_install_xsub redefined at (eval 2) line 1.
Subroutine DynaLoader::dl_error redefined at (eval 2) line 1.

ie "line 4" vs "line 1" despite the fact that the only difference between
the two overlong 1-liners is the six character string "BEGIN "

So, I'd like to take the above code out. If I remove it, the build fails:

GLOB_CSH is not a valid File::Glob macro at ../lib/File/ line 66

The problem comes down to this bit of Perl_gv_fetchpvn_flags():

    if (!stash) {
	if (len && isIDFIRST_lazy(name)) {
	    if (global)
		stash = PL_defstash;
	    else if (IN_PERL_COMPILETIME) {
		stash = PL_curstash;
		stash = CopSTASH(PL_curcop);

$expletive. The behaviour of the function Perl_gv_fetchpvn_flags() differs
between "Compile Time" and "Run Time". That's really, um, less than awesome.
(This also isn't the only place deep within a function unrelated to parsing
or optree building that behaviour differs depending on whether
IN_PERL_COMPILETIME is true of false. You can laugh, or you can cry, or
maybe you should do both at the same time.)

Specifically, the way that newCONSTSUB_flags() actually controls how
gv_fetchpvn_flags() gets a stash to default from is by

* setting PL_curstash to the stash to use
* assigning &PL_compiling to PL_curcop to make gv_fetchpvn_flags() notice this.

This is not sane.

I think that the right way to fix this is to have a way to pass the default
stash into Perl_gv_fetchpvn_flags(). Or even split it into two - one half
that locates the stash to use, and the other half that takes a stash, and
does the initialisation. So that code can be changed to something like this:

@@ -1521,7 +1529,9 @@ Perl_gv_fetchpvn_flags(pTHX_ const char *nambeg, STRLEN fu
     if (!stash) {
-       if (len && isIDFIRST_lazy(name)) {
+        if (def_stash) {
+            stash = def_stash;
+        } else if (len && isIDFIRST_lazy(name)) {
            bool global = FALSE;
            switch (len) {

However, it's not at all clear to me how to cleanly get that def_stash into
there. I sent a very ugly proof of concept code to perl5-porters, with which
all tests pass and my convoluted example becomes consistent. But it's a
total bodge, and it's not clear to me (or anyone who has looked at this
previously) what the right way to proceed is. We know where we want to be,
but "If I were you sir, I wouldn't start from here".

I also investigated "microperl". "microperl", like "miniperl", is somewhat
a misnomer. It's not that much smaller:

-rwxr-xr-x 1 nick nick 1091074 Aug 16 21:55 microperl
-rwxr-xr-x 1 nick nick 1223695 Aug 16 15:15 miniperl
-rwxr-xr-x 1 nick nick 1332163 Aug 16 16:03 perl

So what are the differences?

perl is (hopefully obviously) the thing that you want to install. It's
linked with the platform specific dynamic library loading code which
implements DynaLoader, and hence enables perl to load compiled XS code at
runtime. But that dynamic library loading code is written in XS, so needs
a copy of perl to build it. But the build system can't assume that there's
a copy of perl on the system to run this, so how does it bootstrap?

That's the job of miniperl. miniperl is a binary linked from (pretty much)
all the same object files as go up to make perl, but not DynaLoader.o
It's good enough to run xsubpp, the XS to C translator (and the rest of
the build system), but none of the things it needs need perl to build them.
(Because we ship the small number of generated files that need perl to
be recreated, and now have a regression test to ensure that they're kept
up to date). So "perl" is pretty much "miniperl" + DynaLoader.

So where does microperl come in? It's not specifically intended to be
"tiny".  My understanding is that microperl was intended as an experiment as
to whether it's possible to build perl without needing to run some other
tool first to configure it. If you could, you might be able to replace
Configure with some sort of bootstrapping approach using a microperl to
build the configuration for the real perl. That sounds useful. But work on
it pretty much stopped over a decade ago.

Even the "no configuration" idea doesn't really work - you need at least one
canned configuration for ILP32 systems, and one for LP64 systems. (And,
possibly, a third for LLP64 systems, which may just be Win64)

Because microperl doesn't probe features, and builds off a canned,
that has to assume that pretty much everything optional isn't
present. Meaning that if I happen to take the microperl config and graft it
into a regular build, add -DNO_MATHOMS to remove all the legacy support
wrappers, and bodge a couple of things that I can't configure away (yet), I
find that I can get regular perl pretty close to the size of microperl:

-rwxr-xr-x 1 nicholas p5p 1290000 Aug 17 15:17 microperl
-rwxr-xr-x 1 nicholas p5p 1293153 Aug 17 15:09 miniperl
-rwxr-xr-x 1 nicholas p5p 1387582 Aug 17 15:09 perl

microperl is not much different in size from miniperl.

(I don't know why perl is 94429 bytes bigger than miniperl, as DynaLoader.o
is only 9600 bytes, and the other 3 object files that differ between them
only are about 10K larger in total)

Which means that all the special-casing with -DPERL_MICRO and the various
special config files and Makefile *doesn't* actually gain anything
meaningful in size reduction.

As I also can't see anyone looking to replace Configure at this stage in
Perl 5's lifecycle, it's not clear to me that there's any actual case for it.

Given that we've managed to break microperl in two stable releases in the
past 3 years without anyone noticing until some time afterwards, and it
costs us time and effort to maintain it, I'm proposing that we announce in
v5.18.0 that we're planning to eliminate it, and if no-one gives a good use
case as to why to keep it, we cull it before v5.20.0 ships.

And still on the subject of removing things that are no longer used, Ricardo
announced in the v5.16.0 that we plan to clean up the core codebase by
removing code for various platforms for which we have no programmers to
support. (Most likely because the platforms we listed are long dead.)
The list of suspected "special biologist word for stable" platforms is here:

and the plan is to remove one per development release (until the code freeze).

So prior to the v5.17.3 release I removed code relating to UTS. UTS was a
mainframe version of System V created by Amdahl, subsequently sold to UTS
Global. The port has not been touched since before 5.8.0, and UTS Global is
now defunct.

 MANIFEST                      |    5 -
 Porting/ |    2 +-
 README.uts                    |  107 ----------------------
 ext/POSIX/hints/        |    9 --
 handy.h                       |    2 +-
 hints/                  |   32 -------
 perl.h                        |   23 +----
 plan9/mkfile                  |    2 +-
 pod/perl.pod                  |    1 -
 pod/perl58delta.pod           |    4 +-
 pod/perldelta.pod             |    8 +-
 util.c                        |    3 -
 uts/sprintf_wrap.c            |  196 -----------------------------------------
 uts/strtol_wrap.c             |  174 ------------------------------------
 win32/Makefile                |    5 +-
 win32/             |    5 +-
 x2p/a2p.h                     |    8 --
 17 files changed, 17 insertions(+), 569 deletions(-)

There. That was fun. A 0.25% reduction in the line count of the
distribution. What's next?

Well, after Steve Hay shipped v5.17.3, I removed support for VM/ESA. VM/ESA
was a mainframe OS. IBM ended service on it in June 2003. It was superseded
by Z/VM.

 Cross/Makefile-cross-SH       |   3 -
 MANIFEST                      |   6 -
 Makefile.SH                   |   3 -
 Porting/ |   2 +-
 README.bs2000                 |   2 +-
 README.os390                  |   2 +-
 README.vmesa                  | 140 ----------
 ext/DynaLoader/dl_vmesa.xs    | 196 --------------
 ext/Errno/Errno_pm.PL         |   5 +-
 hints/                | 342 ------------------------
 lib/                |   3 +-
 perl.c                        |   4 -
 perl.h                        |   7 +-
 plan9/mkfile                  |   2 +-
 pod/perl.pod                  |   1 -
 pod/perl58delta.pod           |   4 +-
 pod/perldelta.pod             |   9 +-
 pod/perlebcdic.pod            |   4 -
 pod/perlfunc.pod              |   2 +-
 pod/perlport.pod              |  31 +--
 pp_sys.c                      |  12 -
 t/io/pipe.t                   |  10 +-
 t/op/magic.t                  |   2 +-
 thread.h                      |   5 +-
 util.c                        |   6 +-
 vmesa/Makefile                |  15 --
 vmesa/vmesa.c                 | 592 ------------------------------------------
 vmesa/vmesaish.h              |  10 -
 win32/Makefile                |   4 +-
 win32/             |   4 +-
 x2p/a2p.h                     |   2 +-
 31 files changed, 39 insertions(+), 1391 deletions(-)

And that's a further 0.6% reduction. But note how it had its tentacles in
many many places. Surprisingly many. All of which are slightly cleaner now.

"Every little helps", as a certain supermarket round here likes to put it.*

Or, "your platform is at risk if someone doesn't keep up the payments on it"
to paraphrase the small print on mortgages. But, joking aside, there are
only finite people working on the perl core, for a surprisingly small value
of "finite". And as a matter of prioritising limited resources, it's a
no-brainer to concentrate on the things that matter to the vast majority of
people. Code that probably no longer works gets in the way of understanding
the code that works, and so acts as a drag on everything else. So it needs
to go, to help make perl more maintainable in the long term.

Of course, maintainable code that demonstrably does still work is most
welcome (or welcome back) in the core distribution.

A more detailed breakdown summarised from the weekly reports. In these:

16 hex digits refer to commits in
RT #... is a bug in
CPAN #... is a bug in
BBC is "bleadperl breaks CPAN" - Andreas K├Ânig's test reports for CPAN modules
ID YYYYMMDD.### is an bug number in the old bug system. The RT # is given
  afterwards. You can look up the old IDs at

[Hours]		[Activity]
  4.00		-e 'BEGIN {1}'
  0.25		COW
  1.50		CPAN #78624
  0.25		CPAN #78768
  0.25		Cwd.xs
  2.25		File::Find::find and chdir
  0.75		HP-UX make and symlink targets
  0.25		IO-Socket-IP
  0.25		PTR2NV
  0.25		RT #114102
  1.75		RT #114118
  1.50		RT #114174, RT #114176, RT #114194, RT #114296, RT #114532
  0.25		RT #114312
  1.25		RT #114356
  0.25		RT #114372
  1.00		RT #114410
		RT #114424
  0.75		RT #114576
  0.75		RT #114602
  0.50		Remove support for UTS Global.
  1.50		Remove support for VM/ESA
  0.50 (target 'none')
  5.50		bootstrapping Will Braswell
  0.25		cross compiling
  1.50		dl_aix.xs
  0.75		ext/B/t/optree_misc.t
  0.75		hashes
  0.25		investigating security tickets
  1.00		jemalloc
  3.00		microperl
 10.00		newCONSTSUB
  2.75		optimising sub entry
  0.25		perlport
  4.00		process, scalability, mentoring
 31.25		reading/responding to list mail
  6.25		readonly ops
  6.25		smartmatch
		smartmatch, junctions
  0.25		smoke-me branches
  0.50		smoke-me/dynaloader_silence_xs_warning
  0.25		smoke-me/require
  0.50		t/porting/filenames.t
 17.75		warnings from multi-line constructions
113.00 hours total

Nicholas Clark

* And round quite a few places, as it's the world's third largest retailer.

Thread Next Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at | Group listing | About