develooper Front page | perl.perl5.porters | Postings from March 2022

Re: Pre-RFC: builtin:: functions for detecting numbers vs strings

From: demerphq
Date: March 9, 2022 14:30
Subject: Re: Pre-RFC: builtin:: functions for detecting numbers vs strings
Message ID: CANgJU+V2U0dZ_moJQW4YbDqA3zaEshojMWpnUmgfO453Ydq0dw@mail.gmail.com

On Wed, 9 Mar 2022 at 12:40, Salvador Fandiño <sfandino@gmail.com> wrote:
> On 9/3/22 12:21, demerphq wrote:
> > On Wed, 9 Mar 2022 at 10:10, Salvador Fandiño <sfandino@gmail.com> wrote:
> >> On 8/3/22 17:25, Karl Williamson wrote:
> >>> On 3/8/22 07:38, Graham Knop wrote:
> >>>> On Fri, Mar 4, 2022 at 4:29 PM Paul "LeoNerd" Evans
> >>>> <leonerd@leonerd.org.uk> wrote:
> >>>>> After further discussion with PSC, we'd like to keep moving this
> >>>>> forward. There's still time to add new functions to builtin:: in time
> >>>>> for 5.36, and it would be nice to get these in.
> >>>>>
> >>>>> We agree that they should not be named "isnumber" and "isstring",
> >>>>> mostly because of your concerns about leading people to think they do
> >>>>> something that they don't.
> >>>>>
> >>>>> I'd like to suggest you write up an RFC on this request, perhaps
> >>>>> beginning with the names
> >>>>>
> >>>>>     builtin::was_originally_number
> >>>>>     builtin::was_originally_string
> >>>>>
> >>>>> They're sufficiently long and unwieldy as to mildly discourage people
> >>>>> from using them except when absolutely necessary (read: on JSON
> >>>>> serialisers and similar), and the name itself doesn't suggest it tells
> >>>>> you current information about the actual type of a value, merely tells
> >>>>> you the history on how it started.
> >>>> I've created an RFC PR: https://github.com/Perl/RFCs/pull/13
> >>>>
> >>>> In the RFC, I'm using the names builtin::created_as_number and
> >>>> builtin::created_as_string. Justification is included in the RFC, but
> >>>> I think these better match what we want their return values to
> >>>> represent. And personally, was_originally_number feels really
> >>>> awkward as a function name. The names could still be changed of
> >>>> course; I'm not overly attached to what I've chosen.
> >>> I think "created_as..." are better than previous suggestions
> >> It has just occurred to me that we may be missing the point by focusing
> >> too much on that it-was-created-as-a-whatever thing.
> > No, that is very much the point. Serializers, especially JSON ones, need
> > to know this or they simply do the wrong thing.
> >
> >> If a scalar was created as, say, a number, it keeps being a number. That
> >> perl is able to transparently convert that value into something else
> >> when needed and cache the result, doesn't change the fact that it is
> >> still a number.
> > Right. But previously we couldn't tell if the number started off as a
> > string or a number.
>
>
> IMO, the problem here is that we are still keeping the old mindset where
> Perl's scalars were transformed from one type to the other and we
> couldn't tell which one was the former one.
>
> The thing is that now we know the former type, so, I think we should
> stop thinking about the-scalar-that-was-created-as-a-number and instead
> start considering it the-scalar-that-is-a-number.
>
>
> The fact that Perl can internally keep other representations of the
> scalar (for instance, a string) doesn't change the fact that the scalar
> *is* a number.

I think you are missing the point though. "created as a number" and
"is a number" aren't the same thing.

Perl defines "7" and 7 as equivalent in almost every regard. In the
few places where they might not be, the dev is expected to
disambiguate. E.g. the bitwise operators ^ | & care, and nothing else
in perl does; in those cases the dev should specify which behavior
they want by writing 0+$x | 0+$y or "$x" | "$y". To perl, both "are"
the number seven and both "are" the string "7".

But "7" is created as a string, and 7 is created as a number.

We don't want people to think that somehow "7" is less of a number
than 7, because to Perl they should be the same thing.
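
A minimal sketch of both halves of that claim: ordinary operators treat
"7" and 7 alike, while the bitwise operators need to be told which
behavior is wanted. It assumes perl 5.22 or newer and switches the
'bitwise' feature off explicitly, so that | decides from its operands.
The variable names are only for illustration.

```perl
use strict;
use warnings;
# Assumption: perl 5.22 or newer. With the 'bitwise' feature off, | picks
# numeric or string behaviour from its operands, as described above.
no feature 'bitwise';

my ($x, $y) = ("7", "8");    # both created as strings

# Ordinary operators impose their own context, so "7" and 7 agree.
my $same_number = ($x == 7) ? 1 : 0;
my $same_string = ($x eq 7) ? 1 : 0;

# Bitwise operators look at their operands, so say which behaviour you want.
my $string_or  = "$x" | "$y";        # ORs the bytes of "7" and "8"
my $numeric_or = (0+$x) | (0+$y);    # ORs the integers 7 and 8

print "same as number: $same_number, same as string: $same_string\n";
print "string or: $string_or, numeric or: $numeric_or\n";
```

The string form gives "?" (0x37 | 0x38 is 0x3F) and the numeric form
gives 15.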

However when we interoperate with other systems /they/ make a
distinction between 7 and "7". And using the wrong one can confuse
them. So we want a way where serialization layers can know that a
given $var should be represented as an unquoted number or as a string
when talking to those external systems. The reason the name is chosen
to be the cumbersome "created_as_number" is that we don't /want/ code
to be written that checks these things very often. Uses of these
functions should be extremely rare.  That is also why we want to have
a looks_like_number() as well, because that side-steps the "is a"
question and changes it to a "can be used as a" question.  It's almost
at the level that if you use one of these functions and you are not
writing a serialization module then you are basically doing something
wrong.
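
As a rough sketch of the one legitimate use, this is how a
serialization layer might pick the quoted or unquoted form. It assumes
a perl that provides builtin::created_as_number under the name from
the RFC PR (5.36 or newer, where it is marked experimental).
encode_scalar is a hypothetical helper, not taken from any real JSON
module, and its escaping is deliberately minimal.

```perl
use strict;
use warnings;
# Assumption: perl 5.36 or newer, where builtin::created_as_number exists
# and is still flagged experimental.
no warnings 'experimental::builtin';
use builtin qw(created_as_number);

# Hypothetical helper: encode one scalar for a JSON-like format.
sub encode_scalar {
    my ($value) = @_;
    return 'null' unless defined $value;
    return "$value" if created_as_number($value);    # emit unquoted
    (my $escaped = $value) =~ s/(["\\])/\\$1/g;      # quotes and backslashes only
    return qq{"$escaped"};
}

print encode_scalar(7),   "\n";    # prints 7
print encode_scalar("7"), "\n";    # prints "7"
```

A real serializer would also have to handle control characters,
infinities and NaN, which this sketch ignores.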

Maybe it would even be better to call these functions
"serialize_as_number" instead of "created_as_number" because that
would make clear the specialist use case.

Almost every other case should either be saying  "looks_like_number"
or it should simply be forcing the variable to be a number with a 0+
or forcing the variable to be a string with "$x".
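
For ordinary code that could look something like the following sketch,
using looks_like_number from Scalar::Util. The functions increment and
label are made-up names for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical example: accept anything that can be used as a number.
sub increment {
    my ($value) = @_;
    die "not usable as a number\n" unless looks_like_number($value);
    return (0 + $value) + 1;    # force numeric context explicitly
}

# Hypothetical example: treat the value as a string, whatever it was.
sub label {
    my ($value) = @_;
    return "id-" . "$value";    # force string context explicitly
}

print increment("7"), "\n";    # prints 8
print increment(7),   "\n";    # prints 8
print label(7),       "\n";    # prints id-7
```

Neither function needs to know how its argument was created.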

> >>   From a functional point of view, I think that what is needed is a set of
> >> functions to check the type of a scalar and another set of functions (a
> >> la looks_like_number) to check whether it can be converted into
> >> something else:
> >>
> >>     builtin::isa_number
> >>     builtin::looks_like_number
> >>     etc.
> > I don't really follow what "isa_number" is compared to
> > "looks_like_number". Is "isa_number" meant to be the same as
> > "created_as_number" from this proposal?
>
> Yes!

Ok.

>
> >
> >> One important point here is that neither "isa_number" nor
> >> "looks_like_number" is influenced by the private type flags (or the
> >> scalar history, which is an uninteresting thing, right?):
> >>
> >>     $a = "7";
> >>     say isa_number($a), looks_like_number($a);
> >>     $b = $a+1;
> >>     say isa_number($a), looks_like_number($a); # same results
> > I don't get what you want here. The standard definition of
> > looks_like_number() would return the same thing for both lines. I am
> > not sure what isa_number is supposed to do, but if it's the same as
> > created_as_number() then it would too.  E.g., say would output FALSE,
> > TRUE both times. (For some printed definition of FALSE and TRUE).
>
> Yes, and that is the point.
>
>  From a functional point of view you are interested in two things:
>
> 1) is this scalar a number? (equivalent to was this scalar created as a
> number?!)

No. From a functional point of view, almost no Perl code should want
to ask this, as there is by design no difference.  Basically *just*
serialization modules care.

From a functional point of view the only question of interest should
be "looks_like_number".

>
> 2) can this scalar be used as a number?

That would be "looks_like_number".

> And something that is completely uninteresting is:
>
> 3) has this scalar been used as a number?

No perl code should ever be asking this. XS code for optimization
reasons maybe, and even then it should be hidden behind XS macros or
functions.
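
The $a = "7" snippet quoted earlier can be checked along these lines.
This is a sketch assuming perl 5.36 or newer for
builtin::created_as_number; $v stands in for $a. Using the string
numerically in between should not change either answer.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);
# Assumption: perl 5.36 or newer, for builtin::created_as_number.
no warnings 'experimental::builtin';
use builtin qw(created_as_number);

my $v = "7";    # created as a string

my @before = (created_as_number($v) ? 1 : 0, looks_like_number($v) ? 1 : 0);
my $sum    = $v + 1;    # numeric use; perl may cache an integer form in $v
my @after  = (created_as_number($v) ? 1 : 0, looks_like_number($v) ? 1 : 0);

print "before: @before\n";    # prints before: 0 1
print "after:  @after\n";     # prints after:  0 1
```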

cheers,
Yves

-- 
perl -Mre=debug -e "/just|another|perl|hacker/"
