
Prospective RFC-002 - Interpolate NVs to Decimal Strings Correctly and Concisely

From:
sisyphus
Date:
June 20, 2021 12:03
Subject:
Prospective RFC-002 - Interpolate NVs to Decimal Strings Correctly and Concisely
Message ID:
CADZSBj0oLzWQ=s0jKPLz2ti6fG46hkk+NbXPj6YcKd3rWk00Ow@mail.gmail.com

Based on https://github.com/Perl/RFCs/blob/master/docs/template.md
------------------------------------------------------------------
=====
Title
=====

Interpolate NVs to Decimal Strings Correctly and Concisely

========
Preamble
========

Author:   Sisyphus <SISYPHUS>
Sponsor:  Nicholas Clark <NWCLARK>
ID:       0002
Status:   Exploratory

========
Abstract
========

Alter the way that NVs are interpolated into decimal strings such that these
decimal strings:
1) preserve information;
 && 
2) use as few significant digits as possible;
 &&
3) are rounded to nearest, ties to even.

==========
Motivation
==========

Preservation of information requires that, for a scalar ($nv) containing a
floating point value, the interpolated decimal string "$nv" must contain enough
information such that the original value (of $nv) can be ascertained from that
decimal string.
This is often NOT the case with perl5: the decimal string "$nv" frequently
loses information because it is given insufficient decimal precision.

A second requirement is that this decimal string "$nv" should also comprise the
least number of significant digits possible.

Designating this "least number of significant digits possible" as p, then our
third requirement is that our p-significant-digit decimal string "$nv" be the
p-significant-digit decimal string that is closest to $nv.
(Sometimes there can be more than one p-significant-digit string that assigns
to the same NV. See Example 4 in the "Examples" section below.)

Making these changes would bring the output decimal strings provided by Perl's
print() function into line with those provided by Python3 and Raku (and probably
some other languages, too).

To me, the possibility that $nv != "$nv" (for non-NaN $nv) is absurd and
avoidable - and should not be tolerated.
And yet, this is precisely what we have tolerated in perl5 for many years.

The proposal of this RFC is to amend this situation.

=========
Rationale
=========

Perl5 essentially interpolates NVs into decimal strings by doing:

sprintf "%.${prec}g", $nv;

where $prec is either 15 (when nvsize == 8), 18 (when NV is 80-bit extended
precision long double), or 33 (when NV is either __float128 or IEEE-754 16-byte
long double).
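
As a rough check of that claim, the following sketch compares perl's own
interpolation against an explicit sprintf for a few values. It assumes a perl
whose $Config{nvsize} is 8 (so that $prec is 15), and legacy_interp() is a
hypothetical helper name used only for this illustration.

```perl
use strict;
use warnings;
use Config;

# Assumption: NVs are 8-byte doubles, so the precision in question is 15.
die "this sketch assumes nvsize == 8\n" unless $Config{nvsize} == 8;

# Hypothetical helper: what the Rationale says interpolation amounts to.
sub legacy_interp {
    my ($nv) = @_;
    return sprintf "%.15g", $nv;
}

for my $nv (sqrt(2), 0.1, 1.4 / 10) {
    my $interp = "$nv";                # perl's current interpolation
    my $manual = legacy_interp($nv);   # explicit sprintf with 15 digits
    printf "%-20s %-20s %s\n", $interp, $manual,
        ($interp eq $manual ? "same" : "differs");
}
```

On such a perl I would expect every line to report "same".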

For many non-NaN values of $nv, the condition "$nv" == $nv is FALSE, even though
the string "$nv" is being assigned correctly - ie. information is being lost 
when $nv is interpolated to a decimal string.

For those cases we could solve this issue of "preservation of information" by
simply increasing the values for $prec to 17, 21, 36 (respectively), but that
would not respect the condition of "least possible number of digits".
We would start seeing strings like "0.10000000000000001" when "0.1" is
sufficient to preserve the information.
So we reject this solution of using sprintf() with a larger value for $prec on
the grounds that it fails to always comply with the condition that the fewest
number of digits possible be used.

Candidates that allow for the behaviour being sought include Ryu [1][2],
Dragon4 [3] and Grisu3 [4][5].
Are there other candidates that should be considered?

I have reservations about using Grisu3 because, for NV type of 'double', it
only covers about 99.5% of possible values - and a fallback for the
remaining 0.5% is therefore needed.

I have reservations about a dragon type implementation, including Dragon4,
because:
1) it requires arbitrary precision integer operations (which, I believe,
   would create significant difficulties regarding its inclusion in
   the perl source);
2) it is reportedly slower than both Grisu3 and Ryu.

I'm leaning towards Ryu - but that is based mainly upon what I've heard and
read about it.

I recently tried following the build instructions in the README.md to build Ryu
from its github source [2] on Ubuntu, but that failed.
I did create libryu.a using mingw ... but then couldn't readily see how I was
supposed to utilize it, or even if it was intended that it be utilized directly.

According to the README.md from the Ryu github repo[2], Ryu accommodates all of
perl5's commonly supported NV types.
However, I doubt that it will adapt readily to the very uncommon double-double
NV type - for which I think a dragon-type implementation might be the only
option.
Of course, the double-double is so rarely encountered that providing a fix for
that type of NV can, I suggest, be deemed low priority.

=============
Specification
=============

NVs are interpolated into decimal strings such that:
1) the precise value of the NV can be deduced from the decimal string;
2) this decimal string comprises no more significant digits than are needed
   to make that first condition hold;
3) if there is more than one such string to choose from, then the one that is
   nearest to $nv (ties to even) is the one that is used.
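
The three conditions can be illustrated (though not efficiently implemented)
with a brute-force loop over sprintf() precisions. This is only a sketch: it is
not Ryu, Grisu3 or Dragon4, it assumes $Config{nvsize} == 8 (so that 17
significant digits always suffice), and it relies on the platform's sprintf()
and perl's string-to-NV conversion both being correctly rounded.
shortest_roundtrip() is a hypothetical name, not an existing function.

```perl
use strict;
use warnings;

# Hypothetical helper: return the shortest "%.Ng" rendering of $nv that
# converts back to exactly the same NV. Assumes 8-byte double NVs.
sub shortest_roundtrip {
    my ($nv) = @_;
    for my $digits (1 .. 17) {
        # "%.Ng" yields the N-significant-digit string nearest to $nv.
        my $str = sprintf "%.${digits}g", $nv;
        # Condition 1: the string must assign back to the same NV.
        # Trying the smallest N first takes care of condition 2.
        return $str if $str == $nv;
    }
    return sprintf "%.17g", $nv;    # fallback, e.g. for NaN
}

print shortest_roundtrip($_), "\n" for sqrt(2), 0.1, 1.4 / 10;
```

For those three values the loop should arrive at the strings quoted in the
"Examples" section below: 1.4142135623730951, 0.1 and 0.13999999999999999.
A real implementation would derive the digits directly instead of making up to
17 formatting and parsing round trips per value.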

=======================
Backwards Compatibility
=======================

With the new behaviour, the change in the interpolation will certainly be
noticeable.
At present, where "print sqrt(2)" outputs 1.4142135623731, it
would, under the proposed changes, output 1.4142135623730951.

This could certainly have ramifications for any code that relies on the way
that NVs are stringified.
 
For example, with List::Util on a perl whose $Config{nvsize} == 8, we would
currently see:

$ perl -MList::Util -E 'say scalar List::Util::uniqstr("1.4142135623731",sqrt 2);'
1

With the proposed changes, the stringification of sqrt(2) changes.
And what are currently two identical strings become 2 different strings.
The output of that one liner therefore changes to 2.
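
Code that depends on the current stringification could presumably ask for it
explicitly, given that (per the "Rationale" section) it amounts to
sprintf "%.15g" when nvsize == 8. A minimal sketch under that assumption, and
assuming a List::Util recent enough to provide uniqstr():

```perl
use strict;
use warnings;
use List::Util qw(uniqstr);

# Assumption: nvsize == 8, so today's interpolation corresponds to "%.15g".
my $legacy = sprintf "%.15g", sqrt 2;

# In scalar context uniqstr() returns the number of distinct strings.
my $count = uniqstr("1.4142135623731", $legacy);
print "$count\n";
```

Because the string is built with an explicit precision, the count printed here
would not depend on how perl chooses to interpolate NVs.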

=====================
Security Implications
=====================

??

========
Examples
========

These examples are as run on perl-5.34.0, configured with
$Config{nvtype} of 'double'.
The same types of issues arise with the other $Config{nvtype}
values, too - though the details will differ.

Example 1:
----------
$ perl -wle '$nv = sqrt(2); print "$nv" unless "$nv" == $nv;'
1.4142135623731

Here, we see that the condition "$nv" == $nv is FALSE, because the string
"1.4142135623731" correctly assigns to an NV that is different to $nv.
For that condition to be true, $nv needs to be stringified to
the 17 decimal digit number "1.4142135623730951".

$ perl -wle 'print "ok" if "1.4142135623730951" == sqrt(2);'
ok

Example 2:
----------
But if we were to insist that all NVs on this perl5 configuration
be stringified to 17 decimal digits then we get:

$ perl -wle '$s = sprintf "%.17g", 0.1; print "$s" if "$s" == 0.1;'
0.10000000000000001

Yet we know that "0.1" would suffice:
$ perl -wle 'print "ok" if "0.1" == 0.1;'
ok

Hence we see that the condition that the decimal string "$nv" comprise the
fewest digits possible is not being met.
What we want is a process that interpolates sqrt(2) to 17 significant decimal
digits, but interpolates 0.1 to 1 significant digit.

With Ryu we can stringify $nv such that "$nv" + 0 has the
same value as $nv, whilst ensuring that "$nv" comprises the fewest significant
decimal digits possible.

Perl5 also fails to preserve information with divisions.
Example 3:
----------

$ perl -wle '$nv = 1.4 / 10; print "$nv" unless "$nv" == $nv;'
0.14

Yes, 0.14 is not equivalent to 1.4 / 10.
We can see that best by looking at the respective hex representations:

$ perl -wle 'printf "%a\n", 0.14;'
0x1.1eb851eb851ecp-3
$ perl -wle 'printf "%a\n", 1.4 / 10;'
0x1.1eb851eb851ebp-3

The correct interpolation for 1.4/10 is "0.13999999999999999", which is what
Ryu will deliver.

Example 4:
----------

The third condition that I gave in the "Abstract" above was
that the interpolated decimal strings "are rounded to nearest, ties to even".
Consider the NV 2**-1074. For that value Perl5 currently elicits:

C:\>perl -le "print 2**-1074"
4.94065645841247e-324

It so happens that, for $Config{nvsize} == 8, the strings "3e-324", "4e-324",
"5e-324", "6e-324" and "7e-324" are all equivalent to 2**-1074.
So we have 5 strings to choose from - each of them has the same number of
digits, and each of them preserves the value 2**-1074 when assigned to an NV.

In terms of the first 2 conditions, we could choose any of them.
It is the third condition that specifies that we should select the one that is
closest to 2**-1074.
The closest is "5e-324", which is what Ryu will select.
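
The claim about those five strings can be checked with a short script. This is
a sketch that assumes $Config{nvsize} == 8 and a perl (such as the 5.34.0 used
for these examples) whose string-to-NV conversion is correctly rounded.

```perl
use strict;
use warnings;

my $nv = 2**-1074;    # smallest positive (subnormal) double

# Each of these 1-significant-digit strings should assign to the same NV.
for my $str (qw(3e-324 4e-324 5e-324 6e-324 7e-324)) {
    printf "%s %s\n", $str,
        ($str == $nv ? "assigns to 2**-1074" : "assigns to a different NV");
}

# Rounding the value itself to 1 significant digit picks the nearest string.
printf "%.1g\n", $nv;    # 5e-324
```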

Here is a script that demos an annoyance I've struck with perl5 and Test::More:

Example 5:
---------

use Test::More tests => 1;
$x = 1.4 / 10;
cmp_ok($x, '==', 0.14, '1.4/10 == 0.14');

As we've just seen, that test will fail, and that script outputs:

1..1
not ok 1 - 1.4/10 == 0.14
#   Failed test '1.4/10 == 0.14'
#   at try.pl line 2.
#          got: 0.14
#     expected: 0.14
# Looks like you failed 1 test of 1.

It's implying that the script has failed because 0.14 != 0.14.
That's obviously rubbish, and not at all helpful.
Under the proposed change, that script would output:

1..1
not ok 1 - 1.4/10 == 0.14
#   Failed test '1.4/10 == 0.14'
#   at try.pl line 2.
#          got: 0.13999999999999999
#     expected: 0.14
# Looks like you failed 1 test of 1.

========================
Prototype Implementation
========================

In Math::MPFR (on CPAN) there's an implementation of Grisu3 [5]
called doubletoa() - but it's only available for perls whose
$Config{nvsize} == 8.
(Note that Math::MPFR depends upon both gmp and mpfr C libraries.)

Grisu3 fails to derive the strings for about 0.5% of doubles.
When that happens, doubletoa() falls back to a dragon-type implementation.
The result is that doubletoa() returns the same string for the given
argument as would be derived using Ryu.

$ perl -MMath::MPFR=":mpfr" -wle "print doubletoa(sqrt 2);"
1.4142135623730951
$ perl -MMath::MPFR=":mpfr" -wle "print doubletoa(0.1);"
0.1
$ perl -MMath::MPFR=":mpfr" -wle "print doubletoa(1.4 / 10);"
0.13999999999999999

For NVs of ALL sizes and types, we can see what use of Ryu would produce
by using the dragon-type implementation in Math::MPFR called nvtoa().
On a perl-5.34.0 where $Config{nvtype} is __float128:

$ perl -MMath::MPFR=":mpfr" -wle "print nvtoa(sqrt 2);"
1.4142135623730950488016887242096982
$ perl -MMath::MPFR=":mpfr" -wle "print nvtoa(0.1);"
0.1
$ perl -MMath::MPFR=":mpfr" -wle "print nvtoa(1.4/10);"
0.13999999999999999999999999999999999

The nvtoa() function works with all of the various nvtypes, including
double-double (but not the middle-endian ones, though this shortcoming
should be easily rectifiable).

However, this dragon-type implementation is not Dragon4. It's actually
based on Tables 3 and 4 in the Dragon4 paper[3]. Dragon4 itself is set
out in Tables 5 to 13 of the same document.

============
Future Scope
============

Cover all NV types, assuming that this is not achieved to begin with.

==============
Rejected Ideas
==============

See the "Rationale" section above.
At this stage I'm rejecting only sprintf().
I see Grisu3 as unlikely to be the best candidate because of its
deficiency in coverage.
I think Ryu will prove to be the best candidate - but let's see what
others think.

===========
Open Issues
===========

Issue 1:
-------
I should point out that I doubt my ability to implement this proposed change
in perl's behaviour in the perl CORE.
I guess this means that, if this RFC proposal is accepted, one fairly
obvious "Open Issue" is:
Who is going to implement it?

Issue 2:
-------
Do we need to consider the possibility that a perl5 build might
use a rounding mode other than "round to nearest, ties to even" ?
Ryu claims to be able to handle all of the usual rounding modes, anyway.
The dragon types can also handle the other rounding modes.

Issue 3:
-------
There's also the issue of how to format our interpolated decimal strings.
AIUI, Ryu, Grisu and Dragon all create their results as an integer string
and exponent pair - from which we can create our chosen formatting, be it
1501, or 1501.0, or 1.501e3, for example.
I suppose we can just follow perl5's existing formatting rules ... or
change them, if we so desire.
For the doubletoa() and nvtoa() functions I mentioned in the "Prototype
Implementation" section, I've tried to structure the formatting to match
python3, which is not identical to perl5's current formatting practices.
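
To illustrate how a digit-string and exponent pair can be turned into those
formats, here is a small sketch. The helper names as_scientific() and
as_fixed() are hypothetical, and the convention that the value equals
$digits x 10**$exp is an assumption made for this illustration; the actual
algorithms may present their results differently.

```perl
use strict;
use warnings;

# Assumed convention: value == $digits x 10**$exp, where $digits is a
# string of decimal digits with no leading zeros.

# Hypothetical helper: scientific notation, e.g. ("1501", 0) -> "1.501e3".
sub as_scientific {
    my ($digits, $exp) = @_;
    my $first = substr($digits, 0, 1);
    my $rest  = substr($digits, 1);
    my $mant  = length($rest) ? "$first.$rest" : $first;
    return $mant . 'e' . ($exp + length($digits) - 1);
}

# Hypothetical helper: fixed notation, e.g. ("1501", 0) -> "1501.0".
sub as_fixed {
    my ($digits, $exp) = @_;
    return $digits . ('0' x $exp) . '.0' if $exp >= 0;
    my $point = length($digits) + $exp;    # digits left of the point
    return '0.' . ('0' x (-$point)) . $digits if $point <= 0;
    return substr($digits, 0, $point) . '.' . substr($digits, $point);
}

print as_scientific("1501", 0), "\n";    # 1.501e3
print as_fixed("1501", 0), "\n";         # 1501.0
print as_fixed("1", -1), "\n";           # 0.1
```

Dropping the trailing ".0" from the fixed form gives the plain 1501 variant.
Which form is chosen, and at what exponent the switch to scientific notation
happens, is exactly the formatting decision this issue leaves open.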

Issue 4:
-------
The double-double nvtypes can accommodate some (though not all) values
up to a precision of 2098 bits.
I'd therefore be surprised if Ryu is going to handle them readily.
Dragon4 could handle this type of NV (as the Math::MPFR nvtoa function
already does) - albeit at the cost of some beefy arbitrary precision
integer calculations.

=========
Copyright
=========

??

==========
References
==========

[1] "Ryu: Fast Float-to-String Conversion" - Ulf Adams
    https://dl.acm.org/doi/pdf/10.1145/3192366.3192369

[2] Ryu github repo:
    https://github.com/ulfjack/ryu

[3] "How to Print Floating-Point Numbers Accurately" - Guy L. Steele Jr. & Jon L. White
    https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf

[4] "Printing Floating-Point Numbers Quickly and Accurately with Integers" - Florian Loitsch
    https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

[5] Grisu3 implementation in C:
    https://github.com/juj/MathGeoLib/blob/master/src/Math/grisu3.c




