Re: PSC #040(?) - Perl SV Flags special
From: Nicholas Clark
October 15, 2021 08:03
Message ID: YWk1rrp2K/1QbXoX@etla.org
On Thu, Oct 14, 2021 at 11:18:42AM +0100, Paul "LeoNerd" Evans wrote:
> What is far less clear is what the public flags are meant to mean. We
After the call, I think I came to the conclusion that (pragmatically) the
public flags really are more "implementation detail" than "public".
I don't know if it's even written down anywhere, but the basic "data
structure with cached values and flags" design paradigm seems to be:
* have a data structure with space to cache conversions and flags
* have macros which inline small code which can quickly test the flags
  for common cases, and where possible use cached values
* else call into a function for the slow path, which can hopefully
  cache values and set flags to permit fast paths in the future
* arrange all the various SV data structures so that the cache slots
  are at the same offsets from (stored) pointers, so that the reading
  code in the macros doesn't need to branch
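
The caching described above can be watched from Perl space. This is only a
rough sketch: it uses the core B module to read the public flags, and the
helper name public_flags is made up for this illustration. It assumes that
reading an integer-looking string in numeric context caches the integer and
turns on the public integer flag.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: names of the public flags currently set on a scalar.
sub public_flags {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @names;
    push @names, 'IOK' if $flags & B::SVf_IOK();
    push @names, 'NOK' if $flags & B::SVf_NOK();
    push @names, 'POK' if $flags & B::SVf_POK();
    return join ',', @names;
}

my $s      = "42";
my $before = public_flags($s);   # only the string slot is valid so far
my $sum    = $s + 1;             # slow path converts, then caches the integer
my $after  = public_flags($s);   # the cached integer is now flagged as usable

print "before: $before\n";
print "after:  $after\n";
```

The second numeric read of $s can then take the fast path, because the flag
test in the macro succeeds and the cached value is used directly.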
The second problem we're juggling is that Perl doesn't have a type
system (old news!) but in this context it's that it doesn't have a
*numeric* type system.
$c = $a + $b;
is "obviously" numeric addition.
But what should be the result of?
18014398509481984 + 2e16;
1e16 + 2e16;
(The point/problem being that all these values *can* be stored as 64 bit
integers, but effectively printing them out *as* 64 bit integers creates
floating point. But we're faking things internally to also offer 64 bit
integer maths on 64 bit platforms. And we don't have a type system...)
and related to this is that this is valid and doesn't warn:
$ perl -wle 'print " 18014398509481984" + "18014398509481984 "'
(consider scripts that process data coming in from text files, where those
files are a bit sloppy with their whitespace.)
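
A small sketch of that behaviour, using short numbers rather than the large
ones above. The helper name add_collecting_warnings is invented for this
example; it traps warnings so they can be counted, and compares padded input
against a string with trailing junk.

```perl
use strict;
use warnings;

# Hypothetical helper: add two values numerically and return the sum
# together with any warnings that the addition raised.
sub add_collecting_warnings {
    my ($x, $y) = @_;
    my @warnings;
    local $SIG{__WARN__} = sub { push @warnings, $_[0] };
    my $sum = $x + $y;
    return ($sum, \@warnings);
}

# Leading and trailing whitespace: accepted quietly.
my ($padded_sum, $padded_warn) = add_collecting_warnings(" 42", "42 ");

# Trailing non-numeric text: still converted, but with a warning.
my ($junk_sum, $junk_warn) = add_collecting_warnings("42abc", "1");

printf "padded: sum=%s warnings=%d\n", $padded_sum, scalar @$padded_warn;
printf "junk:   sum=%s warnings=%d\n", $junk_sum,   scalar @$junk_warn;
```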
We have all these useful behaviours currently. It's a juggling act to keep
them all working whilst also changing things/adding more "things we test for".
> (I'd love to say "just follow that link to be surprised" but you'd all
> think that it was a Rickroll, so I'll observe that it's "perl 5.0
> alpha 4")
This, on the other hand...
> forgotten if they were originally strings or numbers. Nick's PR
> (https://github.com/Perl/perl5/pull/18958) will certainly help this
> situation but that isn't sufficient.
I think that it's pretty close to sufficient for distinguishing "numeric"
vs "string". In that:
> The suggested next steps here involve creating a long list of "test
> cases"; situations involving performing various kinds of operations on
> values/variables, and specifying what are the properties of results,
> and side-effects on variables within it. Likely many of these
> properties will take the form of "appears to be a string" or "appears
> to be a number" or similar.
Yves presented some test cases that were new to me, which I can see cause
code paths to be taken deep in the conversion routines that don't set flags
consistently with how that PR assumes they should be. But for all that I
skimmed, it seemed fixable (within the limits of "string" vs "number")
Specifically, I think that the testing regime should be combinatorial over:
1) I create a value (an integer literal, a floating point literal, or a string
   containing something that is, or is *close* to either)
2) Maybe I copy it
3) I read it (or the copy) as (in an integer expression, in a floating point
   expression, in a string context)
   (internally a conversion might be cached, and flags might change as a result)
4) Maybe I copy it again
5) Can the new API still report correctly what step (1) was?
6) If it's used in addition or other "maybe IV/maybe NV" arithmetic, does it
   behave the same way as if used immediately after step 1?
   (same choice of IV vs NV? Same warnings?)
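
Steps 1 to 5 could be driven by something like the sketch below. It is not
the test suite anyone has proposed. The classifier appears_numeric is a
hypothetical stand-in for "the new API" (it just tests public flags via the
core B module), and the value and context tables are examples I picked.
Step 6 is left out.

```perl
use strict;
use warnings;
use B ();

# Hypothetical stand-in for the new API (NOT the interface from the PR):
# numeric means a public numeric flag and no public string flag.
sub appears_numeric {
    my $flags   = B::svref_2object(\$_[0])->FLAGS;
    my $numeric = $flags & (B::SVf_IOK() | B::SVf_NOK());
    my $string  = $flags & B::SVf_POK();
    return ($numeric && !$string) ? 1 : 0;
}

# Step 1: ways to create a value, and whether it started as a number.
my @create = (
    [ 'integer literal', 1, sub { 42 } ],
    [ 'float literal',   1, sub { 1.5 } ],
    [ 'integer string',  0, sub { "42" } ],
    [ 'float string',    0, sub { "1.5" } ],
    [ 'padded string',   0, sub { " 42 " } ],
    [ 'nearly a number', 0, sub { "42abc" } ],
);

# Step 3: ways to read a value; each may cache a conversion on its argument.
my @read = (
    [ 'not read',        sub { return } ],
    [ 'integer context', sub { no warnings 'numeric'; my $x = $_[0] % 7;   return } ],
    [ 'float context',   sub { no warnings 'numeric'; my $x = $_[0] * 0.5; return } ],
    [ 'string context',  sub { my $x = "$_[0]"; return } ],
);

my @report;
for my $made (@create) {
    my ($kind, $was_numeric, $maker) = @$made;
    for my $reading (@read) {
        my ($context, $reader) = @$reading;
        for my $use_copy (0, 1) {
            for my $copy_again (0, 1) {
                my $orig   = $maker->();                   # step 1
                my $copy   = $orig;                        # step 2
                my $target = $use_copy ? \$copy : \$orig;
                $reader->($$target);                       # step 3 (aliased)
                my $subject = $target;
                if ($copy_again) {                         # step 4
                    my $final = $$target;
                    $subject = \$final;
                }
                my $ok = appears_numeric($$subject) == $was_numeric ? 1 : 0;  # step 5
                push @report, {
                    kind => $kind, context => $context,
                    use_copy => $use_copy, copy_again => $copy_again, ok => $ok,
                };
            }
        }
    }
}

printf "%-16s %-16s copy=%d recopy=%d %s\n",
    @{$_}{qw(kind context use_copy copy_again)}, $_->{ok} ? 'ok' : 'MISMATCH'
    for @report;
```

Which rows come out as MISMATCH is expected to depend on the perl version and
on how the classifier is written, which is rather the point of enumerating
the combinations.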
I think that this is viable for "string" vs "numeric"
(vs undef vs boolean vs reference vs "you're on your own here because it's a
I'm not sure that we can push this to distinguishing between "started as an
integer literal" vs "started as a floating point", *and* I'm not sure if we
need to. The big problem we're trying to solve here is correctly generating
formats such as JSON and YAML that *are* sensitive to strings vs numbers,
and I didn't think that they (or their other-language consumers) were
sensitive to "what sort of a number is it?"
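
As a concrete case of the strings vs numbers sensitivity, here is a minimal
sketch with the core JSON::PP module (my choice for illustration, not
something from the call). It encodes one value created as a number and one
created as a string, neither of which has been read in any other context.

```perl
use strict;
use warnings;
use JSON::PP ();

my $number = 42;     # created as an integer literal
my $string = "42";   # created as a string

# The encoder has to decide, per value, whether to emit quotes.
my $json = JSON::PP->new->encode([ $number, $string ]);
print "$json\n";     # prints [42,"42"]
```

What the encoder emits once either value has been read in the "other"
context is exactly the part that depends on the flags.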