
Re: Pre-RFC: builtin:: functions for detecting numbers vs strings

Paul "LeoNerd" Evans
February 24, 2022 14:04
On Wed, 23 Feb 2022 23:03:14 +0100
Graham Knop wrote:

> After the merge of, it is
> possible to reliably distinguish between values that started as
> numbers vs values that started as strings. Essentially, if the POK
> flag is set, the value started as a string. If the POK flag is not set
> and IOK or NOK is set, it started as a number. These flags can be
> checked in XS, or via the B module, but cannot currently be checked in
> pure perl. For serializers, being able to detect the original type of
> a value is essential, so it would help to provide functions to provide
> this information. For example, providing builtin::isnumber and
> builtin::isstring, to go along with builtin::isbool.

I definitely think that builtin:: should be providing some sort of
inspection API, but I'm not sure this is quite the right name or shape.
See below...

> This does potentially have a huge impact on the language as a whole.
> While this is needed for serializers, providing easy access to this
> type data will pretty much inevitably result in people using the
> functions within perl for things like parameter validation. The values
> "2" and 2 are meant to be fully interchangeable inside perl, and new
> code distinguishing them will break many expectations.

This too is very much my worry here...

> Even with caveats, I do think this is something that the language
> should provide. I am uncertain on the naming of the functions.
> builtin::isnumber and builtin::isstring are perhaps the most
> obvious, but they also imply that distinguishing these types is a
> normal thing to do. We may want the function names to be more
> opinionated, implying more strongly that they are meant to be used for
> things like serialization, not internal type checking. That feels like
> a losing battle though once any function of this type is available.

The naming of these things will be the critical point.

Currently, the predicate test-like functions in `builtin` are named
"isTHING" with no underscore, because they're just copied from
Scalar::Util. I think we may want to rethink that and include an
underscore; `builtin::is_bool`, `builtin::is_weak` and so on...

Secondly though, it's a shame that we're using English here. In
English, we use the same verb ("is") both to compare two nouns ("five
is a number") and to attach an adjective to a noun ("five is numeric").
For example, Spanish has two different verbs for these two cases. It's
very much the latter case we want.

The trouble is we're really dealing with multiple different kinds of
properties. We should keep this in mind.

  * There are the *categorical* properties - facts that classify a
    scalar at the moment of its creation into exactly one
    classification, and remain permanently unchanging throughout its
    lifetime. Observing the fact that as of 5.35.x, boolean values are
    a real true distinct thing, I think there are 4 substantial
    categories of scalar:

       undef
       boolean
       defined but nonreferential
       reference

    Any scalar value is and always remains immutably within *one of*
    these four categories, regardless of any operation that is
    performed on it.

    Of these four categories, we already have tests to distinguish them

       undef              === !defined $x
       boolean            === builtin::isbool $x
       defined-but-nonref === defined $x and !ref $x
       reference          === ref $x

  * There are the *transient* properties - facts that might apply to
    some value at some point in time, or not at others. So far only one
    of these currently comes to mind:

       weakened reference === builtin::isweak $x

  * There are the *capability* properties - operations that might be
    performed on a value:

       truthiness  -- all scalar values can be evaluated for truth

       stringiness -- currently, all scalars can be queried for a
                      numerical or stringy value, though I imagine a
                      day when this can be disabled for references;
                      these could still be true for objects that
                      overload the appropriate conversions

       objectiness -- this is builtin::blessed

  * Finally, there are the trickiest properties to classify. I don't
    even have a name for these. These are the ones you're referring to

       originally_stringy -- has SvPOK()
       originally_numbery -- has (SvIOK()|SvNOK()) without SvPOK()
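As a rough sketch of what those last two tests look like from pure Perl
today, the flags can be read through the B module. The function names
here are only the placeholder names from the list above, not a proposed
API, and the answers are only stable across later use of the value on a
perl that includes the flag change discussed in this thread; on older
perls, using a number as a string can turn POK on.

```perl
use strict;
use warnings;
use B ();

# Placeholder names taken from the list above; not a real builtin:: API.

# True if the public POK flag is set on the caller's scalar.
sub originally_stringy {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return ($flags & B::SVf_POK()) ? 1 : 0;
}

# True if IOK or NOK is set and POK is not.
sub originally_numbery {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    return 0 if $flags & B::SVf_POK();
    return ($flags & (B::SVf_IOK() | B::SVf_NOK())) ? 1 : 0;
}

# Freshly created values, not yet used in any other context.
my $str   = "2";
my $int   = 2;
my $float = 2.5;

for my $value ($str, $int, $float) {
    printf "%-4s stringy=%d numbery=%d\n",
        $value, originally_stringy($value), originally_numbery($value);
}
```

Taking a reference to `$_[0]` inspects the caller's own scalar rather
than a copy, which is the closest pure-Perl equivalent of calling
SvPOK() and friends on it directly.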

I wonder if we can find some sort of "has"-like wording, that fits
these? While I don't think these are good names as such, I think if the
names had been something like


Then it would be less surprising that something like the $! dualvar has
both numberness and stringiness than it would be to find out that it
*is* both a number and a string.
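To make the dualvar case concrete, here is a small sketch using
Scalar::Util::dualvar alongside $!. The variable names are just for
illustration, and the exact message text for $! depends on the platform
and locale, so it is not shown.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# A hand-made dualvar: one scalar carrying unrelated number and string values.
my $d = dualvar(5, "five");
my $d_num = $d + 0;    # numeric slot
my $d_str = "$d";      # string slot

# $! behaves the same way: errno number in numeric context,
# the matching system message in string context.
$! = 2;
my $errno   = $! + 0;
my $message = "$!";

print "dualvar: $d_num / $d_str\n";
print "errno:   $errno / $message\n";
```

Neither value is sensibly described as "a number" or "a string"; each
simply has both a numeric and a string aspect at once.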

It would also then be less surprising if we were to start splitting
down into sub-categories like has_integrality vs. has_floatiness. It
would potentially also be less surprising to find out that some object
reference has stringiness, if its object class overloads the '""'
operator for example.
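A minimal sketch of that last idea, assuming a hypothetical stricter
world in which plain references do not count as having stringiness.
Both has_stringiness and the Local::* classes are invented for this
illustration; overload::Method and Scalar::Util::blessed are the real
functions doing the work.

```perl
use strict;
use warnings;

package Local::Temperature;    # hypothetical class that overloads '""'
use overload '""' => sub { $_[0]{degrees} . " C" };
sub new { my ($class, $degrees) = @_; return bless { degrees => $degrees }, $class }

package Local::Plain;          # hypothetical class with no overloading
sub new { return bless {}, shift }

package main;
use overload ();
use Scalar::Util qw(blessed);

# Hypothetical predicate: does this value have a meaningful string form?
sub has_stringiness {
    my ($value) = @_;
    return 0 unless defined $value;
    return 1 unless ref $value;    # ordinary non-reference scalars
    # References only qualify if their class overloads stringification.
    return (blessed($value) && overload::Method($value, '""')) ? 1 : 0;
}

my $temp = Local::Temperature->new(21);
printf "%s has stringiness: %d\n", "Local::Temperature", has_stringiness($temp);
printf "%s has stringiness: %d\n", "Local::Plain", has_stringiness(Local::Plain->new);
printf "%s has stringiness: %d\n", "plain hashref", has_stringiness({});
```

Under the "has" wording the overloaded object reads naturally: it is a
reference, and it also has stringiness.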

If it was limited to just some predicate test functions living in
`builtin` this might not be too hard a problem, but there's a whole new
layer lurking below, in the form of type assertions on
lexicals, subroutine signature arguments, object fields; as well as
other syntax like match/case, `multi sub`, typed catch, ... This is a
far-reaching question with lots more complication to it than initially
meets the eye...

((I'll probably follow up that thought in a separate email))

Paul "LeoNerd" Evans
