From:

Date:

June 20, 2021 12:03

Subject:

Prospective RFC-002 - Interpolate NVs to Decimal Strings Correctly and Concisely

Message ID:

CADZSBj0oLzWQ=s0jKPLz2ti6fG46hkk+NbXPj6YcKd3rWk00Ow@mail.gmail.com

Based on https://github.com/Perl/RFCs/blob/master/docs/template.md
------------------------------------------------------------------

=====
Title
=====

Interpolate NVs to Decimal Strings Correctly and Concisely

========
Preamble
========

Author: Sisyphus <SISYPHUS>
Sponsor: Nicholas Clark <NWCLARK>
ID: 0002
Status: Exploratory

========
Abstract
========

Alter the way that NVs are interpolated into decimal strings such that these decimal strings:

1) preserve information; &&
2) use as few significant digits as possible; &&
3) are rounded to nearest, ties to even.

==========
Motivation
==========

Preservation of information requires that, for a scalar ($nv) containing a floating point value, the interpolated decimal string "$nv" contain enough information for the original value of $nv to be recovered from that decimal string. This is often NOT the case with perl5: the decimal string "$nv" loses information because it is produced with insufficient decimal precision.

A second requirement is that this decimal string "$nv" comprise the least number of significant digits possible.

Designating this "least number of significant digits possible" as p, our third requirement is that the p-significant-digit decimal string "$nv" be the p-significant-digit decimal string that is closest to $nv. (Sometimes more than one p-significant-digit string assigns to the same NV. See Example 4 in the "Examples" section below.)

Making these changes would bring the decimal strings output by Perl's print() function into line with those provided by Python3 and Raku (and probably some other languages, too).

To me, the possibility that $nv != "$nv" (for non-NaN $nv) is absurd and avoidable, and should not be tolerated. And yet this is precisely what we have tolerated in perl5 for many years. The proposal of this RFC is to amend this situation.
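The first requirement amounts to a round-trip test: convert the decimal string back to a floating point value and compare it with the original. The sketch below assumes IEEE-754 doubles (the nvsize == 8 case) and uses Python purely as a stand-in, because its "%.15g" formatting and its float() conversion are both correctly rounded, much like the sprintf() and strtod() of a C library such as glibc. perl_style() and preserves_information() are hypothetical helper names, not perl or CPAN functions. The test is applied to the "%.15g" formatting that perl currently uses for such NVs.

```python
import math


def perl_style(x: float) -> str:
    """Approximate "$nv" on a perl whose $Config{nvsize} == 8, which formats
    with sprintf "%.15g" (perl spells Inf and NaN differently, so this is
    only meant for finite values)."""
    return "%.15g" % x


def preserves_information(x: float, s: str) -> bool:
    """First condition of this RFC: the string must read back as exactly x."""
    return float(s) == x


for x in (math.sqrt(2), 0.1, 1.4 / 10, 2.0 ** -1074):
    s = perl_style(x)
    print(f"{s:<24} preserves information: {preserves_information(x, s)}")
# 1.4142135623731          preserves information: False
# 0.1                      preserves information: True
# 0.14                     preserves information: False
# 4.94065645841247e-324    preserves information: True
```

The last case shows why the first condition alone is not enough: "4.94065645841247e-324" does preserve 2**-1074, but so does the much shorter "5e-324".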
=========
Rationale
=========

Perl5 essentially interpolates NVs into decimal strings by doing:

    sprintf "%.${prec}g", $nv;

where $prec is 15 (when nvsize == 8), 18 (when the NV is an 80-bit extended precision long double), or 33 (when the NV is either __float128 or an IEEE-754 16-byte long double).

For many non-NaN values of $nv, the condition "$nv" == $nv is FALSE, even though the string "$nv" is itself converted back to an NV correctly; i.e. the information was lost when $nv was interpolated into the decimal string.

For those cases we could solve the issue of "preservation of information" by simply increasing the values of $prec to 17, 21 and 36 (respectively), but that would not respect the condition of "least possible number of digits". We would start seeing strings like "0.10000000000000001" where "0.1" is sufficient to preserve the information. So we reject this solution of using sprintf() with a larger value of $prec, on the grounds that it does not always comply with the condition that the fewest digits possible be used.

Candidates that provide the behaviour being sought include Ryu [1][2], Dragon4 [3] and Grisu3 [4][5]. Are there other candidates that should be considered?

I have reservations about using Grisu3 because, for an NV type of 'double', it only covers about 99.5% of possible values, so a fallback for the remaining 0.5% is needed.

I have reservations about a dragon-type implementation, including Dragon4, because:

1) it requires arbitrary precision integer operations (which, I believe, would create significant difficulties regarding its inclusion in the perl source);
2) it is reportedly slower than both Grisu3 and Ryu.

I'm leaning towards Ryu, but that is based mainly upon what I've heard and read about it. I recently tried following the build instructions in the README.md to build Ryu from its github source [2] on Ubuntu, but that failed. I did create libryu.a using mingw ...
but then couldn't readily see how I was supposed to use it, or even whether it was intended to be used directly.

According to the README.md in the Ryu github repo [2], Ryu accommodates all of perl5's commonly supported NV types. However, I doubt that it will adapt readily to the very uncommon double-double NV type, for which I think a dragon-type implementation might be the only option. Of course, the double-double is so rarely encountered that providing a fix for that type of NV can, I suggest, be deemed low priority.

=============
Specification
=============

NVs are interpolated into decimal strings such that:

1) the precise value of the NV can be deduced from the decimal string;
2) the decimal string comprises no more significant digits than are needed for the first condition to hold;
3) if there is more than one such string to choose from, the one that is nearest to $nv (ties to even) is used.

=======================
Backwards Compatibility
=======================

With the new behaviour, the change in interpolation will certainly be noticeable. At present, "print sqrt(2)" outputs 1.4142135623731; under the proposed changes it would output 1.4142135623730951.

This could certainly have ramifications for any code that relies on the way that NVs are stringified. For example, with List::Util on a perl whose $Config{nvsize} == 8, we currently see:

    $ perl -MList::Util -E 'say scalar List::Util::uniqstr("1.4142135623731", sqrt 2);'
    1

With the proposed changes, the stringification of sqrt(2) changes, so what are currently two identical strings become two different strings. The output of that one-liner therefore changes to 2.

=====================
Security Implications
=====================

??

========
Examples
========

These examples were run on perl-5.34.0, configured with a $Config{nvtype} of 'double'. The same types of issues arise with the other $Config{nvtype} values, too, though the details differ.
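The Specification can be made concrete for 'double' NVs with a small Python sketch (again assuming IEEE-754 doubles, and "%g" formatting and float() conversion that are correctly rounded). It is not a proposal for how perl should do this: shortest_by_precision() is a hypothetical helper that simply tries precisions 1, 2, ... 17 until the string reads back as the same value. That always satisfies condition 1 and, except for some exact powers of two where it can return a longer string than necessary, conditions 2 and 3 as well. It reproduces the strings expected in Examples 1 to 4.

```python
import math
from fractions import Fraction


def shortest_by_precision(x: float) -> str:
    """Return "%.{p}g" % x for the smallest p whose result reads back as x.

    Each "%.{p}g" result is the correctly rounded (nearest, ties to even)
    p-digit decimal, so the string found is the nearest one of its length.
    This is NOT Ryu: for some exact powers of two it can return a longer
    string than the true shortest one.
    """
    if math.isnan(x):
        raise ValueError("NaN has no round-tripping decimal string")
    for p in range(1, 18):  # 17 significant digits always suffice for a double
        s = "%.*g" % (p, x)
        if float(s) == x:
            return s
    raise AssertionError("unreachable for IEEE-754 doubles")


# Values from Examples 1 to 4 ($Config{nvtype} 'double').
for x in (math.sqrt(2), 0.1, 1.4 / 10, 2.0 ** -1074):
    print(shortest_by_precision(x))
# 1.4142135623730951
# 0.1
# 0.13999999999999999
# 5e-324

# Example 4: which one-digit strings read back as 2**-1074, and which is nearest?
tiny = 2.0 ** -1074
same = [f"{d}e-324" for d in range(1, 10) if float(f"{d}e-324") == tiny]
nearest = min(same, key=lambda s: abs(Fraction(s) - Fraction(tiny)))
print(same)     # ['3e-324', '4e-324', '5e-324', '6e-324', '7e-324']
print(nearest)  # 5e-324
```

Because it may call sprintf and strtod up to 17 times per NV, this brute-force approach is mainly useful as a cross-check for a real implementation such as Ryu.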
Example 1:
----------

    $ perl -wle '$nv = sqrt(2); print "$nv" unless "$nv" == $nv;'
    1.4142135623731

Here we see that the condition "$nv" == $nv is FALSE, because the string "1.4142135623731" is correctly converted to an NV that differs from $nv. For that condition to be true, $nv needs to be stringified to the 17-significant-digit number "1.4142135623730951":

    $ perl -wle 'print "ok" if "1.4142135623730951" == sqrt(2);'
    ok

Example 2:
----------

But if we were to insist that all NVs on this perl5 configuration be stringified to 17 significant digits, we would get:

    $ perl -wle '$s = sprintf "%.17g", 0.1; print "$s" if "$s" == 0.1;'
    0.10000000000000001

Yet we know that "0.1" would suffice:

    $ perl -wle 'print "ok" if "0.1" == 0.1;'
    ok

Hence the condition that the decimal string "$nv" comprise the least number of digits possible is not being met. What we want is a process that interpolates sqrt(2) to 17 significant decimal digits, but also interpolates 0.1 to 1 significant digit. With Ryu we can stringify $nv such that "$nv" + 0 has the same value as $nv, whilst ensuring that "$nv" comprises the fewest significant decimal digits possible.

Perl5 also fails to preserve information for the results of arithmetic, such as division.

Example 3:
----------

    $ perl -wle '$nv = 1.4 / 10; print "$nv" unless "$nv" == $nv;'
    0.14

Yes, 0.14 is not equivalent to 1.4 / 10. We can see that best by looking at the respective hex representations:

    $ perl -wle 'printf "%a\n", 0.14;'
    0x1.1eb851eb851ecp-3
    $ perl -wle 'printf "%a\n", 1.4 / 10;'
    0x1.1eb851eb851ebp-3

The correct interpolation for 1.4 / 10 is "0.13999999999999999", which is what Ryu will deliver.

Example 4:
----------

The third condition given in the "Abstract" above is that the interpolated decimal string be rounded to nearest, ties to even. Consider the NV 2**-1074.
For that value Perl5 currently elicits:

    C:\>perl -le "print 2**-1074"
    4.94065645841247e-324

It so happens that, for $Config{nvsize} == 8, the strings "3e-324", "4e-324", "5e-324", "6e-324" and "7e-324" all assign to 2**-1074. So we have 5 strings to choose from: each of them has the same number of digits, and each of them preserves the value 2**-1074 when assigned to an NV. In terms of the first 2 conditions, we could choose any of them. It is the third condition that specifies that we should select the one that is closest to 2**-1074. The closest is "5e-324", which is what Ryu will select.

Example 5:
----------

Here is a script that demonstrates an annoyance I've struck with perl5 and Test::More:

    use Test::More tests => 1;
    $x = 1.4 / 10; cmp_ok("$x", '==', 0.14, '1.4/10 == 0.14');

As we've just seen in Example 3, that test will fail, and the script outputs:

    1..1
    not ok 1 - 1.4/10 == 0.14
    #   Failed test '1.4/10 == 0.14'
    #   at try.pl line 2.
    #          got: 0.14
    #     expected: 0.14
    # Looks like you failed 1 test of 1.

It's implying that the script has failed because 0.14 != 0.14. That's obviously rubbish, and not at all helpful. Under the proposed change, that script would output:

    1..1
    not ok 1 - 1.4/10 == 0.14
    #   Failed test '1.4/10 == 0.14'
    #   at try.pl line 2.
    #          got: 0.13999999999999999
    #     expected: 0.14
    # Looks like you failed 1 test of 1.

========================
Prototype Implementation
========================

In Math::MPFR (on CPAN) there's an implementation of Grisu3 [5] called doubletoa(), but it's only available for perls whose $Config{nvsize} == 8. (Note that Math::MPFR depends upon both the gmp and mpfr C libraries.)

Grisu3 fails to derive the strings for about 0.5% of doubles. When that happens, doubletoa() falls back to a dragon-type implementation. The result is that doubletoa() returns the same string for a given argument as would be derived using Ryu.
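To show what an exact, dragon-type fallback involves, here is a rough Python sketch of a dragon-style search. It is not Dragon4, not the Math::MPFR code, and far too slow for production use; shortest_digits() is a hypothetical name. It represents the double and the bounds of its rounding interval as exact fractions (Python's arbitrary precision integers, via fractions.Fraction), which is the kind of arbitrary precision integer work the Rationale cites as an obstacle to putting a dragon-type implementation into the perl core. It assumes Python 3.9+ (for math.ulp and math.nextafter), IEEE-754 doubles and a round-half-even float() conversion, and, like the algorithms discussed in Issue 3, it returns a digit string and exponent pair rather than a formatted string.

```python
import math
import struct
from fractions import Fraction


def shortest_digits(x: float) -> tuple[str, int]:
    """Dragon-style exact search for a positive finite double x.

    Returns (digits, exponent) such that float(f"{digits}e{exponent}") == x,
    digits is as short as possible and, among equally short candidates,
    int(digits) * 10**exponent is nearest to x (exact ties go to an even digit).
    """
    if not (math.isfinite(x) and x > 0):
        raise ValueError("this sketch handles positive finite doubles only")

    v = Fraction(x)                                   # exact value of the double
    gap_above = Fraction(math.ulp(x))                 # distance to the next double up
    gap_below = v - Fraction(math.nextafter(x, 0.0))  # can be half of gap_above at a power of two
    low, high = v - gap_below / 2, v + gap_above / 2  # x's rounding interval

    # Under round-half-even, a decimal lying exactly on the interval boundary
    # still reads back as x when x's significand is even.
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    inclusive = bits % 2 == 0

    # Scan decimal scales 10**q from above x downwards. The first scale at
    # which some integer n puts n * 10**q inside the interval gives the
    # fewest significant digits.
    q = math.floor(math.log10(x)) + 2
    while True:
        scale = Fraction(10) ** q
        lo_t, hi_t = low / scale, high / scale
        lo_n, hi_n = math.ceil(lo_t), math.floor(hi_t)
        if not inclusive:
            if lo_n == lo_t:
                lo_n += 1
            if hi_n == hi_t:
                hi_n -= 1
        if lo_n <= hi_n:
            # Third condition: nearest to x; an exact tie goes to the even n.
            n = min(range(lo_n, hi_n + 1), key=lambda k: (abs(k * scale - v), k % 2))
            return str(n), q
        q -= 1


for x in (math.sqrt(2), 0.1, 1.4 / 10, 2.0 ** -1074):
    print(shortest_digits(x))
# ('14142135623730951', -16)
# ('1', -1)
# ('13999999999999999', -17)
# ('5', -324)
```

Because it works on the exact rounding interval, including the narrower gap below a power of two, the sketch does not depend on sprintf's behaviour at all; the price is big-integer arithmetic on every call, the same cost the Rationale raises against dragon-type implementations.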
Here is doubletoa() at work on the values from Examples 1 to 3:

    $ perl -MMath::MPFR=":mpfr" -wle "print doubletoa(sqrt 2);"
    1.4142135623730951
    $ perl -MMath::MPFR=":mpfr" -wle "print doubletoa(0.1);"
    0.1
    $ perl -MMath::MPFR=":mpfr" -wle "print doubletoa(1.4 / 10);"
    0.13999999999999999

For NVs of ALL sizes and types, we can see what use of Ryu would produce by using the dragon-type implementation in Math::MPFR called nvtoa(). On a perl-5.34.0 where $Config{nvtype} is __float128:

    $ perl -MMath::MPFR=":mpfr" -wle "print nvtoa(sqrt 2);"
    1.4142135623730950488016887242096982
    $ perl -MMath::MPFR=":mpfr" -wle "print nvtoa(0.1);"
    0.1
    $ perl -MMath::MPFR=":mpfr" -wle "print nvtoa(1.4/10);"
    0.13999999999999999999999999999999999

The nvtoa() function works with all of the various nvtypes, including double-double (but not the middle-endian ones, though this shortcoming should be easily rectifiable). However, this dragon-type implementation is not Dragon4: it is based on Tables 3 and 4 of the Steele & White paper [3], whereas Dragon4 itself is set out in Tables 5 to 13 of the same paper.

============
Future Scope
============

Cover all NV types, assuming that this is not achieved to begin with.

==============
Rejected Ideas
==============

See the "Rationale" section above. At this stage I'm rejecting only sprintf(). I see Grisu3 as unlikely to be the best candidate because of its incomplete coverage. I think Ryu will prove to be the best candidate, but let's see what others think.

===========
Open Issues
===========

Issue 1:
--------

I should point out that I doubt my ability to implement this proposed change in perl's behaviour in the perl CORE. I guess this means that, if this RFC proposal is accepted, one fairly obvious open issue is: who is going to implement it?

Issue 2:
--------

Do we need to consider the possibility that a perl5 build might use a rounding mode other than "round to nearest, ties to even"? Ryu claims to be able to handle all of the usual rounding modes, anyway.
The dragon types can also handle the other rounding modes.

Issue 3:
--------

There's also the issue of how to format our interpolated decimal strings. AIUI, Ryu, Grisu and Dragon all create their results as an integer string and exponent pair, from which we can create our chosen formatting, be it 1501, 1501.0 or 1.501e3, for example. I suppose we can just follow perl5's existing formatting rules ... or change them, if we so desire. For the doubletoa() and nvtoa() functions mentioned in the "Prototype Implementation" section, I've tried to structure the formatting to match python3, which is not identical to perl5's current formatting practices.

Issue 4:
--------

The double-double nvtypes can accommodate some (though not all) values up to a precision of 2098 bits. I'd therefore be surprised if Ryu handled them readily. A dragon-type implementation could handle this type of NV (as Math::MPFR's nvtoa() already does), albeit at the cost of some beefy arbitrary precision integer calculations.

=========
Copyright
=========

??

==========
References
==========

[1] "Ryu: Fast Float-to-String Conversion" - Ulf Adams
    https://dl.acm.org/doi/pdf/10.1145/3192366.3192369
[2] Ryu github repo
    https://github.com/ulfjack/ryu
[3] "How to Print Floating-Point Numbers Accurately" - Guy L. Steele Jr. & Jon L. White
    https://lists.nongnu.org/archive/html/gcl-devel/2012-10/pdfkieTlklRzN.pdf
[4] "Printing Floating-Point Numbers Quickly and Accurately with Integers" - Florian Loitsch
    https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
[5] Grisu3 implementation in C
    https://github.com/juj/MathGeoLib/blob/master/src/Math/grisu3.c


nntp.perl.org: Perl Programming lists via nntp and http.
