perl.perl5.porters | Postings from October 2013

NWCLARK TPF grant September report

From: Nicholas Clark
Date: October 3, 2013 15:02
Subject: NWCLARK TPF grant September report
Message ID: 20131003150230.GQ4940@plum.flirble.org
As per my grant conditions, here is a report for the September period.

Tony's work on RT #115928 revealed something rather strange. The core has
a macro HAS_QUAD, described thus:

    /* HAS_QUAD:
     *      This symbol, if defined, tells that there's a 64-bit integer type,
     *      Quad_t, and its unsigned counterpart, Uquad_t. QUADKIND will be one
     *      of QUAD_IS_INT, QUAD_IS_LONG, QUAD_IS_LONG_LONG, QUAD_IS_INT64_T,
     *      or QUAD_IS___INT64.
     */

However, it turns out that reality doesn't match this description. Whilst
HAS_QUAD (previously QUAD) dates from mid perl-4 times, proper detection of
it via Configure dates from this commit:

    commit de1c2614758a00c0d93fc45414417a54cdf923b3
    Author: Jarkko Hietaniemi <jhi@iki.fi>
    Date:   Sat Oct 30 12:41:50 1999 +0000
    
        Add HAS_QUAD ($Config{d_quad}); use it.
    
        p4raw-id: //depot/cfgperl@4497


In addition to all the logic to set macros appropriately, that commit
also adds the "#undef HAS_QUAD" in this:

    #  if IVSIZE == 8
    #    define IV_IS_QUAD
    #    define UV_IS_QUAD
    #    ifndef HAS_QUAD
    #      define HAS_QUAD
    #    endif
    #  else
    #    undef IV_IS_QUAD
    #    undef UV_IS_QUAD
    #    undef HAS_QUAD
    #  endif

which has the effect of pretending that there isn't actually a "Quad" type
on the platform if you're building with 32 bit IVs. (Which is the default on
a platform with 32 bit longs and pointers.) So that one undef undoes the
good portability work in precisely the place where it would be most useful -
platforms where a 64 bit type is available, but the built perl isn't using
it.

It feels like it's wrong as-is, in that we ought to leave HAS_QUAD defined,
so that XS code can take advantage of 64 bit types if present. As ever, life
is not that simple.

If you simply remove that line on a 32 bit platform built without
-Duse64bitint, the build fails. utf8.h chooses between 7 and 13 byte
extended sequences depending on HAS_QUAD, but the code inside the HAS_QUAD
branch assumes another macro which is only defined when UVs are 64 bits.

Fix that and the build completes, but tests for sprintf and pack fail. The
sprintf tests fail because perl's parsing of "long long" formats is
conditional on 64 bit support:

$ perl -MConfig -e 'printf "%d %lld\n", $Config{ivsize}, $Config{uvsize}'
8 8
$ ./perl -Ilib -MConfig -e 'printf "%d %lld\n", $Config{ivsize}, $Config{uvsize}'
4 %lld

(which I think is pretty ugly in the first place), and *that* is expressed
(badly) in terms of HAS_QUAD, which is being used as a proxy for "are IVs
64 bits".

The pack failures are more frustrating still. The 'q' and 'Q' pack formats
are fatal errors on a perl with 32 bit IVs:

$ perl -le 'print length pack "q", ~0'
8
$ ./perl -le 'print length pack "q", ~0'
Invalid type 'q' in pack at -e line 1.

but the C code in pp_pack.c (again) is enabled by HAS_QUAD, not by a test of
IVSIZE. The code itself is written to cope with being enabled on a build
with 32 bit IVs, because it falls back to NVs. This works:

#ifdef HAS_QUAD
        case 'q':
            while (len-- > 0) {
                Quad_t aquad;
                SHIFT_VAR(utf8, s, strend, aquad, datumtype, needs_swap);
                if (!checksum)
                    mPUSHs(aquad >= IV_MIN && aquad <= IV_MAX ?
                           newSViv((IV)aquad) : newSVnv((NV)aquad));
                else if (checksum > bits_in_uv)
                    cdouble += (NV)aquad;
                else
                    cuv += aquad;
            }
            break;


The problem is that the pack tests were never written to cope with this, and
fail. They could be fixed. More troubling though, is that t/op/64bitint.t
also fails, because it starts like this:

    eval { my $q = pack "q", 0 };
    skip_all('no 64-bit types') if $@;


It (sadly) seems to be a moderately common idiom on CPAN to determine whether
IVs are 64 bit by using pack 'q' rather than the (more) correct approach of
asking %Config. (Heck, or just comparing ~0 with 0xFFFFFFFF)

http://grep.cpan.me/?q=eval.%2Bpack%5B%5EA-Za-z0-9%5D%2B%5BqQ%5D

(not the best search term, but it gets rid of most of the false positives.
It looks like at least 10 modules use this approach, and would break.)

So I think that 'q' and 'Q' support has to stay coupled to IV size. :-(


With all of those fixed, it's possible to remove that #undef, keep HAS_QUAD
defined, and have all the core tests pass. As to CPAN...
 
There are around 20 distributions using HAS_QUAD in ways that might affect
their code:
 
http://grep.cpan.me/?q=HAS_QUAD%5Cb+-file%3Appport.h+-file%3Aconfig_H.gc
 
Most build and pass their tests on an unmodified perl. Of those that do, two
fail if HAS_QUAD is enabled, both from the same author. They seem to be
carefully written, with the intent of actually making use of the 64 bit type
if present. One makes the same mistake as the perl core, using #ifdef
HAS_QUAD when it should use #if UVSIZE >= 8.

The other is more involved - it defines a type WTYPE, which is intended to
be the widest type available. Unfortunately, if HAS_QUAD changes to being
defined on an installation with 32 bit UVs, it fails to compile due to a lot
of prototype mismatches, because many functions are declared in the header
as returning UV, but the definition uses WTYPE. Fix all of those, and it
builds, but the tests fail, due (much like t/op/pack.t) to assuming that
Perl-space integer maths can be done to the same precision as C maths.


With those two modules patched, all CPAN modules that build on (unmodified)
blead also build with the change. However, we didn't feel that it was worth
actually making the change, and removing the undef. It would have been 10
years ago, but these days, pretty much everything which wants 64 bit integer
support is simply using long long:
 
http://grep.cpan.me/?q=long+long%5B%5E%27%5D%5B%5E%5Cn%5D*%3B+-file%3Appport.h 
 
(233 distributions found, but a few are false positives) 


So in the end, I merged the fixes, but did not enable HAS_QUAD.

(Which turned out to be useful pretty soon after, as we found it to be
necessary to enable HAS_QUAD while building the core C code, to avoid rand()
returning garbage on HP-UX; of all things, this was triggered when Tony's
work on RT #115928 was merged back.)


The rest of the (partial) month was consumed with a lot of small tasks.

Brian Fraser spent quite a bit of time (and sanity, I fear) figuring out
which uses of #define and #ifdef in the code looked like they belonged to
the computer equivalent of the zombie apocalypse, writing up his findings
in a message titled "ifdefs and defines for the glue factory":

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2013-09/msg00061.html

Andy, Merijn, Craig and others made useful comments on the macros they
recognised, but given the subject matter it's not surprising that most of
the research received no reaction. Hence I spent several hours chasing down
all the other unloved things, figuring out how we should proceed with each,
to avoid Brian's efforts being for nothing.


A rather more pleasing piece of history was revealed by another thread. I
proposed that we deprecate and remove the quicksort implementation, keeping
just the mergesort. Mergesort has been the default sort algorithm since
v5.8.0, and grep.cpan.me suggests no working code on CPAN uses the sort
pragma to select qsort (let alone rely on its behaviour). The consensus was
that qsort can go (although we haven't done this yet). But more interesting
was that the author of our mergesort, John P. Linderman, participated in the
thread. He gave a very interesting description of how the mergesort he wrote
13 years ago is implemented, how it attempts to efficiently spot already
sorted data, and why it's probably actually slightly better than Python's
timsort. See the discussion in RT #119635 for all the details.


The final small thing of note is that the bisect tool now takes
test scripts as targets, and runs them with t/TEST.

Previously you had to run it roughly like this:

    .../bisect.pl -- sh -c 'cd t && ./perl TEST path/to/test.t'

to make a bisect work, because many of the test scripts exit 0 even if their
TAP reports test failures, whereas git bisect run requires a zero/non-zero
exit to determine "good" or "bad".
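A toy demonstration of the mismatch (not perl's actual harness): TAP can report a failure while the exit status still says success, and the exit status is all git bisect run looks at.

```shell
# Emit a failing TAP line but exit 0, as many core test scripts do
# when run directly; `git bisect run` would judge this commit "good".
sh -c 'echo "not ok 1 - something broke"; exit 0'
echo "exit status seen by bisect: $?"
```

Running the script under t/TEST instead converts the TAP result into a meaningful exit status, which is what makes it usable as a bisect target.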

Now you run it as

    .../bisect.pl --target path/to/test.t

which is clearly a lot simpler. I implemented it as a target because the
alternative (implementing more special case logic in the handler for general
purpose test cases) looked to be both harder to implement and harder to
document.


A more detailed breakdown summarised from the weekly reports. In these:

16 hex digits refer to commits in http://perl5.git.perl.org/perl.git
RT #... is a bug in https://rt.perl.org/rt3/
CPAN #... is a bug in https://rt.cpan.org/Public/

[Hours]		[Activity]
  0.75		BmUSEFUL
 14.50		HAS_QUAD
  1.25		Peek.t
  0.25		Pod, perltoc
  0.75		RT #114494 (PERL_MICRO)
  2.00		RT #115928
  0.50		RT #116118
  0.50		RT #117265
  1.75		RT #119123
  0.25		RT #119413
  0.25		RT #119599
  1.75		RT #119635 (qsort)
  0.25		RT #17474
  0.50		RT #43819
  0.25		RT #84486
  1.25		The Unicode bug
  0.50		atoi (match-vars-in-mg_len)
  2.75		bisect.pl
  2.75		defines for the glue factory
  0.50		dl_dld.xs
  0.25		process, scalability, mentoring
  6.75		reading/responding to list mail
  0.25		runperl
  0.25		smoke-me branches
  0.25		sort.pm RT #88546, RT #88547
  0.50		t/re/reg_mesg.t
======
 41.50 hours total


And with that, I calculate that I've exhausted the grant's hours.
It's been an interesting two years, but for me it's time for a change.

Nicholas Clark


