develooper Front page | perl.perl5.porters | Postings from June 2021

Re: Prospective RFC-002 - Interpolate NVs to Decimal Strings Correctly and Concisely

From: Nicholas Clark
Date: June 22, 2021 08:53
Subject: Re: Prospective RFC-002 - Interpolate NVs to Decimal Strings Correctly and Concisely
Message ID: 20210622085302.GA9170@etla.org

Note, currently my incoming e-mail seems to be wedged, so whilst I can read
the public list archives, it's a pain to "reply" to them, and anything sent
privately is "held in a queue".

But what's special anyway about this - as a sysadmin friend likes to note,
the R in SMTP stands for "Real-time"...

On Mon, Jun 21, 2021 at 03:09:46PM +1000, sisyphus wrote:
> Hi Paul, Felipe,
> Thanks for the feedback - it is much appreciated.
> 
> I made that post as a result of this post from Nicholas Clark (on June 10):
> https://www.nntp.perl.org/group/perl.perl5.porters/2021/06/msg260408.html
> 
> I think we're all agreed that I'm not making a feature request.
> 
> Because this "fix" has such a noticeable effect on perl5's behaviour, I was
> expecting that it might not be addressed until perl7.
> Though it's fine by me if it happens earlier.
> I just want to do what I can to encourage the fixing of it.
> It's something that should be fixed, and something that should, in the
> interim, be on the TODO list.

Yes, we specifically requested it as an RFC. Currently the RFC template says:

## Backwards Compatibility

If proposing syntax changes, think in terms of "can this be detected by"/"misunderstood by":

* Static tooling inspecting source code
* The Perl interpreter at compile time
* Only as a runtime error
* Subtle runtime behaviour changes that can't be warned about and break things


and the process document says

## What needs an RFC? What can just be a PR?

There's no obvious answer, because there's no clear cut off, and there never will be, even when the process is "out of beta". For now we think we should use RFCs for

1. Language changes (feature changes to the parser, tokeniser)
2. Command line options
3. Adding/removing warnings (entries in `perldiag.pod`)
4. Significant changes to when an existing warning triggers


but Rob asked, and the PSC are wondering whether that list needs to have

5. Bug fixes that have the potential for significant runtime behaviour changes that can't be warned about and break things

(note, not all bug fixes, refactorings or optimisations. But it ought to be a
question that one asks - "is fixing this bug going to expose lots of other
people's bugs?")

so the "stars aligned" and as Rob was happy to try writing an RFC, we thought
we'd find out what one would look like for "not-a-language-feature".



On Sun, Jun 20, 2021 at 10:03:03PM +1000, sisyphus wrote:
> Hi,
> Attached is my RFC-0002, as it currently stands.
> It follows the layout recommended at
> https://github.com/Perl/RFCs/blob/master/docs/template.md , though I've
> also added a "References" section at the end which can easily be removed if
> preferred.

Thanks. This is a really good start, and I think with another iteration or
two it will be good enough to be (at least) "Provisional". I think "Accepted"
would need a firmer plan about whether to bundle source code, and if so
whether inline or as a file linked to, and figuring that out probably won't
happen until someone starts to implement it...

> I'm not entirely sure how well it aligns with the envisaged requirements of
> such a document but, having spent some time on putting it together, I've
> now reached the point where I can't see myself making any meaningful
> alteration to it without first receiving some feedback.

Big picture comment:

As far as Python is concerned, "There's More Than One Way To Do It":
https://www.python.org/dev/peps/pep-0001/
(plain text or reStructuredText)

but we were keen for "one, preferably only one obvious way to do it",
where it's a structured text format that can be parsed automatically, so
we really did want Markdown.

A future PSC might decide that actually Pod is more Pythonic - if so we'd
convert all the documents' formats (probably mechanically with human fixup)
much like Tcl did when they changed TIPs. (And Python never has. Go figure.)


If you'd rather write Pod, it seems that there are CPAN tools to convert
that to Markdown - I'm fine with doing that as part of a check in, but I
don't think that I can convert what you have to either Pod or Markdown.

However, as the rest of the feedback is going to involve some edits, I think
that it's easier to do them first, and either "reformat" on the way, or
worry about it next.

(If you really don't want to do the formatting, I will, but there might be
a delay. Keep editing this in the format you prefer.)

> =========
> Rationale
> =========

> Candidates that allow for the behaviour being sought include Ryu [1][2],
> Dragon4 [3] and Grisu3 [4][5].
> Are there other candidates that should be considered ?

Structurally I don't think that questions belong in the rationale. The
rationale is "I propose we do it this way..." so that the document tries to
read in a straight line.

I think that the question of other libraries we don't yet know about belongs
in the "Open Issues" section.

Also, I think my answers to these are

1) Go with Ryu in the rationale (fastest, and portable C)
2) Reject Grisu3 as one can't use it without also including Dragon4
3) Reject Dragon4 as Ryu is faster
4) Other than Dragonbox, I'm not aware of any other future contender that is
   still in the race, and
   1) Dragonbox isn't proven yet
   2) no C implementation exists
   3) Ryu has a 128 bit implementation - I don't know about the others

This agrees with the paragraphs that I cut here. I think that your
reasoning to reject Grisu3 and Dragon4 belongs in "Rejected Ideas".

I feel that moving it there makes the document flow better.

> According to the README.md from the Ryu github repo[2], Ryu accommodates all of
> perl5's commonly supported NV types.
> However, I doubt that it will adapt readily to the very uncommon double-double
> NV type - for which I think a dragon-type implementation might be the only
> option.
> Of course, the double-double is so rarely encountered that providing a fix for
> that type of NV can, I suggest, be deemed low priority.

I think that "double double" and anything that isn't the 3 usual IEEE types
(with rounding to even?) is "Future Scope"

Don't let the perfect be the enemy of the good. This plan is worth it even
if initially we can only do 64 bit doubles.
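
As a rough feel for what each type would demand, the worst-case number of significant decimal digits needed to round-trip a binary significand of p bits is ceil(1 + p * log10(2)). The sketch below assumes that the "3 usual IEEE types" means 64-bit double, 80-bit x87 extended and 128-bit quad; that mapping, and the names used, are assumptions for illustration only.

```python
import math

# Significand widths in bits, counting the leading bit.
# Assumption: these are the three NV types meant by "the 3 usual IEEE types".
SIGNIFICAND_BITS = {
    "double (binary64)": 53,
    "x87 long double (80-bit extended)": 64,
    "quad (binary128)": 113,
}

def max_roundtrip_digits(bits):
    """Decimal digits that always suffice to round-trip a `bits`-bit significand."""
    return math.ceil(1 + bits * math.log10(2))

for name, bits in SIGNIFICAND_BITS.items():
    print(f"{name}: up to {max_roundtrip_digits(bits)} significant digits")
```

Most values need fewer digits than this bound, which is the whole point of asking for the shortest string.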

> =============
> Specification
> =============
> 
> NVs are interpolated into decimal strings such that:
> 1) the precise value of the NV can be deduced from the decimal string;
> 2) this decimal string comprise of no more significant digits than are needed
>    to make that first condition hold;
> 3) if there is more than one such string to choose from, then the one that is
>    nearest to $nv (ties to even) is the one that is used.

and maybe to be clear

4) "And no other changes" - we are only changing the decimal representation
   of NVs - not when we convert, not the SV types, not the SV flags.
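
A minimal sketch of what conditions 1 to 3 ask for, written in Python purely for illustration. It assumes a 64-bit double and models today's default stringification as C's "%.15g"; the function names are hypothetical. The brute-force loop is not Ryu, Grisu or Dragon4, and in rare cases where the rounding interval is asymmetric it may return a string one digit longer than a real shortest-output algorithm would.

```python
import math

def current_style(x):
    # Assumption: models the present default stringification of a
    # 64-bit NV as C's "%.15g".
    return "%.15g" % x

def shortest_roundtrip(x):
    """Try 1..17 significant digits and return the first correctly
    rounded string that converts back to exactly x."""
    for precision in range(1, 18):
        s = "%.*g" % (precision, x)
        if float(s) == x:
            return s
    raise ValueError("no round-tripping string found (NaN?)")

for x in (0.1, 0.1 + 0.2, math.sqrt(2)):
    print(current_style(x), shortest_roundtrip(x))
```

For sqrt(2) the 15-digit string does not convert back to the same double, whereas the 17-digit one does.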

pants, and writing that, I realise that we need to handle:

$ perl -wle 'use POSIX; use locale; setlocale(LC_NUMERIC, "de_AT.utf8"); print sqrt 2'
1,4142135623731
$ perl -wle 'use POSIX; use locale; setlocale(LC_NUMERIC, "en_GB.utf8"); print sqrt 2'
1.4142135623731

and for maximum fun:

./perl -Ilib -wle 'use utf8; use Devel::Peek; use POSIX; use locale; setlocale(LC_NUMERIC, "ps_AF.utf8"); $a = sqrt 2; Dump("$a")'
SV = PV(0x1bcad70) at 0x1d05930
  REFCNT = 1
  FLAGS = (PADTMP,POK,pPOK,UTF8)
  PV = 0x1bcf500 "1\331\2534142135623731"\0 [UTF8 "1\x{66b}4142135623731"]
  CUR = 16
  LEN = 18

At one point fa_IR locales had this, but now it seems only to be ps_AF.
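
One way to picture the requirement is as a post-processing step: generate the digits in a locale-independent form, then substitute the radix character. The sketch below is Python for illustration only; `localize_radix` is a hypothetical helper, and the radix characters are hard-coded from the outputs shown above rather than queried from the system locale.

```python
def localize_radix(ascii_number, radix):
    """Replace the single '.' produced by a locale-independent formatter."""
    return ascii_number.replace(".", radix, 1)

# Locale names are labels only; nothing here calls setlocale().
for name, radix in [("en_GB", "."), ("de_AT", ","), ("ps_AF", "\u066b")]:
    text = localize_radix("1.4142135623731", radix)
    kind = "ASCII" if text.isascii() else "non-ASCII, needs UTF-8"
    print(name, text, len(text.encode("utf-8")), "bytes,", kind)
```

The ps_AF case comes out at 16 bytes, matching CUR = 16 in the dump, because U+066B encodes to two bytes in UTF-8.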

> =======================
> Backwards Compatibility
> =======================

> For example, with List::Util on a perl whose $Config{nvsize} == 8, we would
> currently see:
> 
> $ perl -MList::Util -E 'say List::Util::uniqstr("1.4142135623731",sqrt 2);'
> 1

This was really part of the point of "do we need an RFC" - how dangerous
are the changes?

(The rest - "how do we actually structure the code" and "who", are sort of
incidental, but it's nice to think about them in a formal document)
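
To make the hazard concrete, here is a small Python sketch of the same comparison. It assumes "%.15g" as a model of the current stringification and uses Python's repr(), which produces a shortest round-tripping string, as a stand-in for the proposed behaviour.

```python
import math

literal = "1.4142135623731"
value = math.sqrt(2)

current = "%.15g" % value   # assumption: models today's stringification
proposed = repr(value)      # shortest round-trip string

# Number of distinct strings among the literal and the stringified value.
print(len({literal, current}))    # the two strings compare equal today
print(len({literal, proposed}))   # they would differ after the change
```

Any code that relies on a hand-written 15-digit literal comparing equal, as a string, to a computed value would silently change behaviour.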


> =====================
> Security Implications
> =====================
> 
> ??

Yes, I had thought exactly this too.

But I think there is one

* Currently we use the system C library for formatting conversions. Hence any
  security issues are handled for us by the OS (ie Somebody Else's Problem)
* If we bundle third-party code, we need to track it for CVEs.

Likely these are rare, particularly on *output* formatting. But there has
been at least one for decimal parsing (which is a harder problem given the
unbounded input. Output only has 2**64 possibilities, for some value of 64).

CVE-2010-4476 was for Java for 2.2250738585072012e-308 - I thought that PHP
was also subject to the same bug, but I can't find a reference.

> ========
> Examples
> ========
> 
> These examples are as run on perl-5.34.0, configured with
> $Config{nvtype} of 'double'.
> The same types of issues arise with the other $Config{nvtype}
> values, too - though the details will differ.

I think that it would be sufficient to show this as a table. Probably
just

floating point expression | current stringification | correct stringification

although I realise "reason why current is wrong" is useful - in the text I
cut you note that sometimes we generate an accurate decimal representation
(ie it round trips back to the exact binary floating point) but it's not
correct because it's not the closest, or because it's not the shortest.
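
A sketch of how such a table could be generated, in Python for illustration. The "current" column is an emulation using "%.15g" (an assumption about a build with 64-bit doubles, not output captured from perl), and the "shortest" column uses Python's repr(). The last column records whether the current string converts back to the same double.

```python
import math

examples = [
    ("sqrt(2)", math.sqrt(2)),
    ("0.1 + 0.2", 0.1 + 0.2),
    ("1 / 3", 1 / 3),
]

# (expression, emulated current string, shortest round-trip string)
rows = [(expr, "%.15g" % x, repr(x)) for expr, x in examples]

print("expression | current (%.15g) | shortest round-trip | current round-trips?")
for (expr, cur, short), (_, x) in zip(rows, examples):
    print(f"{expr} | {cur} | {short} | {float(cur) == x}")
```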

> ============
> Future Scope
> ============
> 
> Cover all NV types, assuming that this is not achieved to begin with.

Yes, agree.

> ==============
> Rejected Ideas
> ==============
> 
> See the "Rationale" section above.
> At this stage I'm rejecting only sprintf().
> I see Grisu3 as unlikely to be the best candidate because of its
> deficiency in coverage.
> I think Ryu will prove to be the best candidate - but let's see what
> others think.

As suggested above, I think this should be inverted - move the rejected and
maybe-rejected ideas here.

> 
> ===========
> Open Issues
> ===========
> 
> Issue 1:
> -------
> I should point out that I doubt my ability to implant this proposed change
> in perl's behaviour into the the perl CORE.
> I guess this means that, if this RFC proposal is accepted, one fairly
> obvious "Open Issue" is:
> Who is going to implement it ?

I don't think that that's viewed as an open issue. The intent is that the
RFC process is capable of having RFCs be in status "Provisional (Deferred)"
or "Accepted (Deferred)" if there is consensus that something is worth
doing, but it's not yet clear who might do it.

> Issue 2:
> -------
> Do we need to consider the possibility that a perl5 build might
> use a rounding mode other than "round to nearest, ties to even" ?
> Ryu claims to be able to handle all of the usual rounding modes, anyway.
> The dragon types can also handle the other rounding modes.

I don't think so. At worst it's "future scope".

At best we take a policy decision that "we want a consistent cross platform
behaviour for Perl, and that's 'round to nearest, ties to even'". If your
particular code needs something else, please code it yourself.

> There's also the issue of how to format our interpolated decimal strings.
> AIUI, Ryu, Grisu and Dragon all create their results as an integer string
> and exponent pair - from which we can create our chosen formatting, be it
> 1501, or 1501.0,or 1.501e3, for example.
> I suppose we can just follow perl5's existing formatting rules ... or
> change them, if we so desire.

I think keep Perl's current formatting.

And do it as post-processing - try to change the upstream code as little as
possible.
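
A minimal sketch of that post-processing step, in Python for illustration. It assumes the upstream routine hands back the significant digits and a decimal exponent for a non-negative value (value = d.ddd... x 10**exp10), and it assumes the existing rules behave like C's "%g" with a precision of 15 when choosing between fixed and scientific notation. `render` and `sci_threshold` are hypothetical names, and whether that threshold should stay at 15 once strings can carry 17 digits is not something this sketch decides.

```python
def render(digits, exp10, sci_threshold=15):
    """Format a (digits, exponent) pair in "%g"-like style.

    digits: significant decimal digits, no leading or trailing zeros.
    exp10:  decimal exponent of the first digit.
    """
    if -4 <= exp10 < sci_threshold:
        if exp10 >= 0:
            int_len = exp10 + 1
            if len(digits) <= int_len:
                # Pure integer: pad with zeros up to the units place.
                return digits + "0" * (int_len - len(digits))
            return digits[:int_len] + "." + digits[int_len:]
        # Small fraction: leading zeros after the radix point.
        return "0." + "0" * (-exp10 - 1) + digits
    # Scientific notation with an exponent of at least two digits.
    mantissa = digits[0] + ("." + digits[1:] if len(digits) > 1 else "")
    sign = "-" if exp10 < 0 else "+"
    return f"{mantissa}e{sign}{abs(exp10):02d}"

print(render("1501", 3))               # integer form
print(render("14142135623730951", 0))  # fixed form
print(render("1", 23))                 # scientific form
```

Keeping this as a separate layer means the bundled conversion code only ever has to produce digits and an exponent.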

> Issue 4:
> -------
> The double-double nvtypes can accommodate some (though not all) values
> up to a precision of 2098 bits.
> I'd therefore be surprised if Ryu is going to handle them readily.
> Dragon4 could handle this type of NV (as the Math::MPFR nvtoa function
> already does) - albeit at the cost of some beefy arbitrary precision
> integer calculations.

double double used this way seems to be nuts. I think that if anyone really
needs this (ie not just 106 bits of mantissa, or whatever you get in the
"normal" form where the two mantissas are adjacent) then they can use
Math::MPFR or something specialised.

I (and I think the PSC) feel that Perl is more about good general purpose
defaults than trying to cover every possible permutation (and not doing it
well enough to be generally useful).

> =========
> Copyright
> =========
> 
> ??

This should be you, as you wrote it.

> ==========
> References
> ==========

Using Markdown (or Pod) gets rid of the need to have a reference section, as
links are inline.


Nicholas Clark
