
Data::Dumper and large integers

From: Aaron Crane
Date: December 30, 2015 15:14
Subject: Data::Dumper and large integers
Message ID: CACmk_tux6sxhBMnCR8VPDL44atEyQBB-N7r1KNim5znGj00MtQ@mail.gmail.com

TL;DR: I want to change Data::Dumper so that there are more cases in
which it avoids wrapping quotes around integers.

Consider this simple program:

use Config;
use Data::Dumper;
print for "$Config{ivsize}\n",
    Dumper(2_000_000_000, 4_000_000_000, 5_000_000_000);

On a 32-bit system it produces this output:

4
$VAR1 = 2000000000;
$VAR2 = 4000000000;
$VAR3 = '5000000000';

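This isn't terribly surprising. With a 4-byte IV, 2_000_000_000 fits
in an IV, so $VAR1 is IOK; 4_000_000_000 exceeds IV_MAX but fits in a
UV, so $VAR2 is UOK; and 5_000_000_000 exceeds UV_MAX, so $VAR3 is NOK.
Data::Dumper always uses quotes to emit an NOK-only scalar, but IOK and
UOK scalars can be emitted as plain integers.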
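The flags can be inspected directly with the core B module. This is an illustrative sketch, not part of Data::Dumper: `num_flags` is a hypothetical helper that names the numeric flags perl set on each literal, and those flags depend on `$Config{ivsize}`.

```perl
use strict;
use warnings;
use B ();
use Config;

# Hypothetical helper: name the numeric flags on a scalar.
# "IOK+UV" corresponds to what the message calls UOK.
sub num_flags {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @names;
    push @names, 'IOK' if $flags & B::SVf_IOK();
    push @names, 'UV'  if $flags & B::SVf_IVisUV();
    push @names, 'NOK' if $flags & B::SVf_NOK();
    return join '+', @names;
}

print "ivsize: $Config{ivsize}\n";
printf "%-10s %s\n", $_, num_flags($_) for 2_000_000_000, 4_000_000_000, 5_000_000_000;
```

On a perl with a 4-byte IV this should report IOK, IOK+UV and NOK for the three values; with an 8-byte IV all three should be plain IOK.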

There is a surprise to come on a 64-bit system, though:

8
$VAR1 = 2000000000;
$VAR2 = 4000000000;
$VAR3 = 5000000000;

That's because all these numbers fit into an IV on such a platform, so
the values are all IOK, and so $VAR3 gets no quotes.

However, even a 64-bit platform uses quotes for an integer that needs
eleven digits:

$ perl -MData::Dumper -MConfig -e \
  'print for "$Config{ivsize}\n", Dumper(10_000_000_000)'
8
$VAR1 = '10000000000';

That's caused by a ten-digit limit imposed specifically in the XS
implementation. (Strictly, it's a ten-character limit: the minus sign
of a negative number counts against the limit too, so all integers
-1e9 and below get quoted.)
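A minimal Perl model of that limit, assuming the scalar is already IOK and its integer value has been formatted as decimal text. `xs_style_integer` is a hypothetical name; the real code is C in Dumper.xs, and this sketch ignores everything else that code checks.

```perl
use strict;
use warnings;

# Hypothetical Perl model of the XS branch for an IOK scalar: take the
# integer's decimal text and quote it only if it is longer than ten
# characters. The minus sign counts, so -1_000_000_000 is quoted.
sub xs_style_integer {
    my ($text) = @_;    # assumed: decimal text of an IV or UV
    return length($text) > 10 ? "'$text'" : $text;
}

print xs_style_integer($_), "\n"
    for qw(999999999 -999999999 9999999999 -1000000000 10000000000);
```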

Also, the pure-Perl implementation quotes all these numbers on all
platforms. Specifically, it uses quotes for all integers outside the
range -999_999_999 .. 999_999_999 (by matching against a suitable
regex).
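The pure-Perl rule can be sketched as a single pattern match. The regex below is an illustrative reconstruction that accepts exactly the range -999_999_999 .. 999_999_999 in canonical form; it is not quoted from Dumper.pm, and `pp_style_is_bare` is a hypothetical name.

```perl
use strict;
use warnings;

# Illustrative pattern for the pure-Perl check: "0", or an optional minus
# sign and one to nine digits with no leading zero. [0-9] is used instead
# of \d so that non-ASCII digits never match.
sub pp_style_is_bare {
    my ($val) = @_;
    return $val =~ /\A(?:0|-?[1-9][0-9]{0,8})\z/ ? 1 : 0;
}

for my $val (qw(0 999999999 -999999999 1000000000 -1000000000 2000000000)) {
    print pp_style_is_bare($val) ? "$val\n" : "'$val'\n";
}
```

Because the check is a match against the string form, its result is the same whatever the platform's IV size, which is why the pure-Perl output doesn't vary between 32-bit and 64-bit systems.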

As far as I can tell, these differences (both between 32-bit and
64-bit systems, and XS and pure-Perl implementations) aren't entirely
deliberate. One comment in Dumper.xs points out that "the pure perl
and XS non-qq outputs have historically been different". Another
comment, on the code that applies the ten-digit limit, says "Looks
like we're on a 64 bit system. Make it a string so that if a 32 bit
system reads the number it will cope better." The patch that
introduced that comment is here:

http://www.nntp.perl.org/group/perl.perl5.porters/2002/03/msg54165.html

which also expands on the issue it's seeking to fix:

> On 64 bit perls XS code would dump very large integers as numbers.
> If fed to 32 bit perls these will immediately be treated as floating
> point, which will cause digits to be lost. Now they are dumped as strings,
> which will preserve digits in a 32 bit perl that uses them as a string.

But there are still many values that would be happily emitted without
quotes on a 64-bit system, but would be treated as floats on a 32-bit
system: every integer in the half-open range [2**32, 1e10), that is,
4_294_967_296 through 9_999_999_999. So Data::Dumper definitely
doesn't have the property that output generated on a 64-bit Perl can
be losslessly evaluated by a 32-bit Perl (and that's been true for
well over a decade).
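The boundaries of that range can be checked with a little string arithmetic. This sketch assumes a 32-bit perl has UV_MAX = 2**32 - 1 and models only the XS ten-character limit; `in_32bit_gap` is a hypothetical helper.

```perl
use strict;
use warnings;

# 2**32 - 1, the largest UV on a perl with 4-byte IVs.
my $UV_MAX_32 = '4294967295';

# Hypothetical check: would a 64-bit XS Dumper emit this non-negative decimal
# string bare (ten characters or fewer) even though a 32-bit perl would parse
# it as an NV? Negative values are ignored: any negative that stays bare
# (-999_999_999 or above) also fits in a 32-bit IV.
# Strings are compared so the result does not depend on this perl's ivsize.
sub in_32bit_gap {
    my ($digits) = @_;    # assumed: no sign, no leading zeros
    return 0 if length($digits) > 10;      # quoted by the ten-character limit
    return 0 if length($digits) < 10;      # at most 999_999_999, fits a 32-bit IV
    return $digits gt $UV_MAX_32 ? 1 : 0;  # equal lengths, so string order is numeric order
}

printf "%-11s %s\n", $_, in_32bit_gap($_) ? 'bare on 64-bit, NV on 32-bit' : 'no gap'
    for qw(4294967295 4294967296 5000000000 9999999999 10000000000);
```

Note that 5_000_000_000, the $VAR3 value from the first example, falls inside this range.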

I also think it's unlikely that downstream users depend tightly on the
precise output Data::Dumper (DD) generates for large integers, given
the existing differences caused by both integer size and choice of
implementation.

I therefore propose to change Data::Dumper to emit all integers[*] in
the range IV_MIN .. UV_MAX without quotes, in both the PP and XS
implementations, even though the values of IV_MIN and UV_MAX are
platform- and configuration-dependent.
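One way to state the proposed rule, as a sketch under assumptions: a value is emitted bare if its string form is a canonical decimal integer between IV_MIN and UV_MAX for the running perl. `fits_native_integer` is hypothetical, it compares digit strings so it can never overflow, and it ignores the $Useqq exception described in the footnote below.

```perl
use strict;
use warnings;

my $UV_MAX = ~0;                    # all bits set: UV_MAX for this perl
my $IV_MIN = -($UV_MAX >> 1) - 1;   # -(IV_MAX) - 1

# Hypothetical predicate: is $s a canonical decimal integer in IV_MIN .. UV_MAX?
sub fits_native_integer {
    my ($s) = @_;
    return 0 unless defined $s && $s =~ /\A(?:0|(-?)([1-9][0-9]*))\z/;
    return 1 if $s eq '0';
    my ($neg, $digits) = ($1, $2);
    my $limit = $neg ? substr("$IV_MIN", 1) : "$UV_MAX";   # magnitude bound
    # Shorter digit strings are smaller; equal lengths compare lexically.
    return (length($digits) < length($limit)
        || (length($digits) == length($limit) && $digits le $limit)) ? 1 : 0;
}

print fits_native_integer($_) ? "$_\n" : "'$_'\n"
    for qw(42 -1000000000 5000000000 10000000000 99999999999999999999);
```

Under this rule the 64-bit output for 10_000_000_000 would lose its quotes, while a 32-bit perl would still quote anything above 4_294_967_295.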

Any objections?

[*] Well, almost. When DD acquired an XS implementation of $Useqq,
some minor output changes did cause some BBC (blead breaks CPAN)
failures, and therefore those differences were stamped out:

https://rt.perl.org/Public/Bug/Display.html?id=118933

So I propose to leave the output under $Useqq unchanged.

-- 
Aaron Crane ** http://aaroncrane.co.uk/


