perl.perl5.porters

NWCLARK TPF grant report #36

Nicholas Clark
May 14, 2012 08:08
[Hours]		[Activity]
2012/05/07	Monday
 0.25		ID 20010903.004 (#7614)
 2.00		mktables memory usage

2012/05/08	Tuesday
 1.50		ID 20000509.001 (#3221)
 0.25		ID 20010218.002 (#5844)
 1.50		ID 20010305.011 (#5971)
 0.25		RT #112126
 0.50		RT #112786
 0.75		RT #112820
 0.25		RT #47027
 3.00		reading/responding to list mail

2012/05/10	Thursday
 1.00		RT #108286
 0.50		RT #94682
 3.50		reading/responding to list mail
 1.50		smoke-me branches

2012/05/11	Friday
 0.75		RT #112792
 0.50		RT #112866
 1.00		clarifying the build system
 2.75		reading/responding to list mail
 1.00		undefined behaviour from integer overflow

Which I calculate is 22.75 hours

Less happens when you work fewer days, hence fewer notable things to report.

RT #108286 has spawned several subthreads. One of them relates to the
observation that code written like this, with each in the condition of a
while loop:

    while ($var = each %hash) { ... }
    while ($_ = each %hash) { ... }

actually has a defined check added automatically, e.g.:

    $ perl -MO=Deparse -e 'while ($_ = each %hash) { ... }'
    while (defined($_ = each %hash)) {
        die 'Unimplemented';
    }
    -e syntax OK

whereas code that omits the assignment does not have defined added:

    $ perl -MO=Deparse -e 'while (each %hash) { ... }'
    while (each %hash) {
        die 'Unimplemented';
    }
    -e syntax OK

Contrast that with (say) readdir, where defined is added, and so is an
assignment to $_ when the code doesn't have one:

    $ perl -MO=Deparse -e 'while ($var = readdir D) { ... }'
    while (defined($var = readdir D)) {
        die 'Unimplemented';
    }
    -e syntax OK
    $ perl -MO=Deparse -e 'while (readdir D) { ... }'
    while (defined($_ = readdir D)) {
        die 'Unimplemented';
    }
    -e syntax OK

Note that this applies only to readdir in the condition of a while loop;
elsewhere, readdir doesn't default to assigning to $_.

So, is this intended, or is it a bug? And if it's a bug, should it be fixed?

Turns out that the answer is, well, involved.

The trail starts with a ruling from Larry back in 1998:

    As usual, when there are long arguments, there are good arguments for both
    sides (mixed in with the chaff).  In this case, let's make
        while ($x = <whatever>)
    equivalent to
        while (defined($x = <whatever>))
    (But nothing more complicated than an assignment should assume defined().)

Nick Ing-Simmons asks for a clarification:

    Thanks Larry - that is what the patch I posted does.
    But it also does the same for C<readdir>, C<each> and C<glob> - 
    i.e. the same cases that solicit the warning in 5.004 is extending
    the defined insertion to those cases desirable?
    (glob and readdir seem to make sense, I am less sure about each).

(A later message clarifies that Nick I-S hadn't realised that each in
*scalar* context returns only the key, so it's an analogous iterator, one
that can't return undef for any actual entry.)
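
A short sketch of that point (made-up data, not from the thread): in scalar
context each hands back only keys, and a key such as "0" or "" is false but
still defined. Because the loop condition below is a simple assignment,
perl adds the defined() test, so every key is visited.

```perl
use strict;
use warnings;

# Made-up data: the keys "0" and "" are false, but they are defined.
my %hash = (0 => 'zero', 1 => 'one', '' => 'empty');

my @keys_seen;
# In scalar context each returns only the key. This condition is a simple
# assignment, so perl treats it as while (defined(my $key = each %hash)).
while (my $key = each %hash) {
    push @keys_seen, $key;
}

printf "saw %d of %d keys\n", scalar @keys_seen, scalar keys %hash;   # saw 3 of 3 keys
```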

In turn, the "RULING" dates back to a thread discussing (and complaining
about) a warning added in 5.004:

    $ perl5.004 -cwe 'while ($a = <>) {}'
    Value of <HANDLE> construct can be "0"; test with defined() at -e line 1.
    -e syntax OK

The intent of the changes back then appears to have been to retain the 5.003
and earlier behaviour for what gets assigned in each construction, but to
change the loop to terminate on an undefined value rather than on any false
value, for the common simple cases:

    while (OP ...)

and

    while ($var = OP ...)
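
To make the difference between "false" and "undefined" concrete, here is a
small sketch (not code from the thread) using a hypothetical closure-based
iterator, make_iter, which returns each item in turn and then undef. A
user-defined call like this gets no implicit defined(), so a plain truth
test stops at the first false value, "0", whereas an explicit defined() test
runs until the iterator is exhausted.

```perl
use strict;
use warnings;

# Hypothetical iterator: hands out each item in turn, then undef when exhausted.
sub make_iter {
    my @items = @_;
    return sub { return @items ? shift @items : undef };
}

# Plain truth test: stops at the first false value ("0"), not at exhaustion.
my @by_truth;
my $next = make_iter('a', '0', 'b');
while (my $item = $next->()) {
    push @by_truth, $item;
}

# Explicit defined() test: what perl supplies implicitly for a simple
# assignment from readline, glob, readdir or each in a while condition.
my @by_defined;
$next = make_iter('a', '0', 'b');
while (defined(my $item = $next->())) {
    push @by_defined, $item;
}

print "truth test:   @by_truth\n";     # truth test:   a
print "defined test: @by_defined\n";   # defined test: a 0 b
```

For readline the classic victim is a file whose last line is "0" with no
trailing newline, which is exactly what the 5.004 warning above is about.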

And there I thought it made sense: fixed in 1998 for readline, glob and
readdir, with the inconsistency arising because each doesn't default to
assigning to $_. Except that there was a twist in the tail: while
(readdir D) {...} didn't use to implicitly assign to $_ at all. Both the
implicit assignment to $_ and the defined test were added in *2009* by
commit 114c60ecb1f7, without any fanfare, just like any other bugfix. And
the world hasn't ended.

    $ perl5.10.0 -MO=Deparse -e 'while (readdir D) {}'
    while (readdir D) {
    -e syntax OK
    $ perl5.12 -MO=Deparse -e 'while (readdir D) {}'
    while (defined($_ = readdir D)) {
    -e syntax OK
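
As a rough check of the post-2009 behaviour, this sketch creates a scratch
directory containing a file named "0" (File::Temp's tempdir is used purely
for convenience) and loops with a bare while (readdir $dh). Since perl 5.12
and later treat that as while (defined($_ = readdir $dh)), the false-looking
name doesn't end the loop early.

```perl
use 5.012;          # implicit $_ assignment in while (readdir ...) needs 5.12+
use warnings;
use File::Temp qw(tempdir);

# Scratch directory containing a single file whose name is false: "0".
my $dir = tempdir(CLEANUP => 1);
open my $fh, '>', "$dir/0" or die "Can't create $dir/0: $!";
close $fh;

opendir my $dh, $dir or die "Can't opendir $dir: $!";
my @names;
# Compiled as while (defined($_ = readdir $dh)), so "0" is seen.
while (readdir $dh) {
    next if $_ eq '.' or $_ eq '..';
    push @names, $_;
}
closedir $dh;

print "found: @names\n";   # found: 0
```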

Running a search of CPAN reveals that almost no code uses while (each %hash)
[and why should it? The construction does a lot of work only to throw it
away], and *nothing* should break if it's changed, so it makes sense to
treat this as a bug, and fix it. (Post 5.16.0, obviously.)

To conclude this story, the mail archives from 15 years ago are fascinating.
Lots of messages. Lots of design discussions, not always helpful. And some
of the same unanswered questions as today.

The other tale of note relates to a digression from a bug. In trying to
replicate an old bug (ID 20010918.001, now #7698) I'd dug a machine running
FreeBSD 4.6 out of the cupboard under the stairs, in the hope of reproducing
the period problem with a period OS. Sadly I couldn't, but out of curiosity
I tried to build blead on it. This is the same 16M machine whose swapping
hell prompted my investigation of enc2xs the better part of a decade ago,
resulting in various optimisations to its build-time memory use, which in
turn led to ways to roughly halve the size of the built shared objects, and
to a lot of the material then used in a tutorial I presented at
YAPC::Europe and the German Perl Workshop, "When Perl is not quite fast
enough". This machine has pedigree.

Once again, it descended into swap hell, this time on mktables. (And with
swap on all 4 hard disks, it's very effective at letting you know that it's
swapping.) Sadly after 10 hours, and seemingly nearly finished, it ran out
of virtual memory. So I wondered if, like last time, I could get the memory
usage down. After a couple of false starts I found a tweak to Perl_sv_grow
that gave a 2.5% memory reduction on FreeBSD (but none on Linux), but that
wasn't enough. However, the cleanly abstracted internal structure of
mktables makes it easy to add code to count the memory usage of the various
data structures it generates. One of its low-level types is "Range", which
subdivides into "special" and "non-special". There are 368676 of the latter,
and the name of each may need to be normalised into a "standard
form". The code was taking the approach of calculating the standard form at
object creation time. With the current usage patterns of the code, this
turns out to be less than awesome - the standard form is only requested for
22047 of them. By changing the code to calculate only when needed (and cache
the result) I reduced RAM and CPU usage by about 10% on Linux, and 6% on
FreeBSD. Whilst the latter is smaller, it was enough to get the build
through mktables, and on to completion. The refactoring is in
nicholas/build-trim, pending review and merging post 5.16.0.
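
The change amounts to lazy initialisation with a cache. The real mktables
code isn't reproduced here; this sketch uses a hypothetical class,
Hypothetical::Range, and a made-up normalisation rule (lower-case, then drop
whitespace, underscores and hyphens) purely to show the shape of it: the
constructor computes nothing, and standard_form() computes the value on
first request and stores it in the object.

```perl
use 5.010;          # for //=
use strict;
use warnings;

package Hypothetical::Range;

# Counts how many times the (hypothetical) normalisation actually runs.
our $normalisations = 0;

sub new {
    my ($class, %args) = @_;
    # An eager version would compute the standard form here, for every object.
    return bless { name => $args{name} }, $class;
}

# Made-up normalisation rule, for illustration only:
# lower-case, then drop whitespace, underscores and hyphens.
sub _normalise {
    my ($name) = @_;
    $normalisations++;
    (my $std = lc $name) =~ s/[\s_-]+//g;
    return $std;
}

# Lazy accessor: compute on first request, then reuse the cached value.
sub standard_form {
    my ($self) = @_;
    $self->{standard_form} //= _normalise($self->{name});
    return $self->{standard_form};
}

package main;

my @ranges;
push @ranges, Hypothetical::Range->new(name => "Some_Name $_") for 1 .. 1000;

# Only a few of the objects are ever asked for their standard form.
my @wanted = map { $ranges[$_]->standard_form } 0, 1, 0;

print "@wanted\n";   # somename1 somename2 somename1
print "normalised $Hypothetical::Range::normalisations of ", scalar(@ranges), " names\n";
```

One caveat with //= as the cache test: if the computed value could
legitimately be undef it would be recomputed on every call, so checking
exists() on the hash key would be the safer choice.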

To complete the story, I should note that make harness failed with about 100
tests still to run, snatching defeat from the jaws of victory. Turns out
that *that* also chews a lot of memory to store test results. make test,
however, did pass (except for one bug in t/op/sprintf.t, patch in RT
#112820). Curiously gcc, even when optimising, isn't the biggest memory hog
of the build. It's beaten by mktables, t/harness and a couple of the Unicode
regression tests. But even then, our build is very frugal. It should
complete just fine with 128M of VM on a 32 bit FreeBSD system, and I'd guess
under 256M on Linux (different malloc, different trade offs).  I think that
this means that blead would probably build and test OK within the hardware
of a typical smartphone (without swapping), if they actually had native
toolchains. Which they don't. Shame :-(

Nicholas Clark