Re: Pre-RFC: `unknown` versus `undef`
From:
Darren Duncan
Date:
December 19, 2021 00:45
Subject:
Re: Pre-RFC: `unknown` versus `undef`
Message ID:
b19e1180-a262-b74d-2cfb-657347a90168@darrenduncan.net
I want to try and clarify my position here.
The main thing is I feel that it is important in this discussion to treat some
matters separately that I consider very distinct:
1. Plain and focused 3-valued logic, which seems to be what Ovid is actually
advocating for: we have exactly 1 special value, Unknown, and we deal with how
its appearance in any situation, in place of any other value, affects the behavior.
2. The idea of N-valued logic, referred to by H.Merijn Brand, where you have an
arbitrary number of special values where each adds a different dimension to the
logic.
My position is that it is N-valued logic and its arbitrary count of logical
dimensions that is the largest problem here. In general its complexity is
exponential, doubling with each new dimension, as each one needs to specifically
define its interactions with the others. For example, what does "unknown value"
plus "unauthorized" return?
So I would hope that one thing we can all agree on is that the idea of N-valued
logic is excluded from the Pre-RFC, and it sticks to pure 3-valued logic.
I can find acceptable plain 3-valued logic with clearly defined rules such that
every possible scenario involving one is completely deterministic with fully
defined behavior.
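To make "clearly defined rules" concrete, here is a minimal sketch, assuming
Kleene's strong 3-valued truth tables (the same ones SQL uses). The package
name Unknown3VL and the helpers unknown(), is_unknown(), not3(), and3(), or3()
and show() are hypothetical names for illustration only, not part of any
proposal:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the special value: one blessed singleton, so it
# cannot be confused with undef or with any regular value.
package Unknown3VL {
    my $singleton = bless {}, __PACKAGE__;
    sub value { return $singleton }
}

sub unknown    { return Unknown3VL->value }
sub is_unknown { return ref($_[0]) eq 'Unknown3VL' ? 1 : 0 }

# NOT: unknown stays unknown, otherwise ordinary negation.
sub not3 {
    my ($x) = @_;
    return unknown() if is_unknown($x);
    return $x ? 0 : 1;
}

# AND: a definite false on either side decides the result,
# even when the other side is unknown.
sub and3 {
    my ($x, $y) = @_;
    return 0 if (!is_unknown($x) && !$x) || (!is_unknown($y) && !$y);
    return unknown() if is_unknown($x) || is_unknown($y);
    return 1;
}

# OR: a definite true on either side decides the result.
sub or3 {
    my ($x, $y) = @_;
    return 1 if (!is_unknown($x) && $x) || (!is_unknown($y) && $y);
    return unknown() if is_unknown($x) || is_unknown($y);
    return 0;
}

# Render a 3VL value for display.
sub show {
    my ($v) = @_;
    return is_unknown($v) ? 'unknown' : ($v ? 'true' : 'false');
}

# Print the full AND table: nine rows, each with exactly one defined answer.
for my $x (1, 0, unknown()) {
    for my $y (1, 0, unknown()) {
        printf "%-7s AND %-7s = %s\n", show($x), show($y), show(and3($x, $y));
    }
}
```

With a single special value, each binary operator has only nine input
combinations to define, which is what keeps this deterministic.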
In fact I will pivot and semi-endorse Ovid's proposal for some kind of support
of 3-valued-logic in principle, and that it is mainly down to the details to
discuss.
I say semi-endorse because I recognize that when done well it can be very
helpful as Ovid has illustrated, but at the same time I don't consider it a
silver bullet and I feel that having it can introduce new problems when you
consider all of its implications. For one thing, automatic refactoring or
optimizations for performance could be a lot more complicated because some
assumptions about what is safe under 2VL may not hold under 3VL.
The most important thing is that this new feature is fully backwards-compatible,
meaning that all of the 3-valued logic behavior is strictly tied to instances of
the new special unknown value, and that all Perl code will behave identically to
before in a Perl supporting the special unknown value where the special unknown
value isn't explicitly made present. So there are NO changes at all to how Perl
undef is treated in the absence of the special unknown value.
Since the presence of the special unknown value would fundamentally alter how a
lot of existing operators/builtins would behave, we would need to add some more
builtins to provide functionality that would then be missing. For example, if
we want to test whether something is or is not the special unknown value, we
would need new builtins that return true or false on that question, rather than
returning unknown. It must be possible to reason about anything involving the
special unknown value where the result is not that unknown value.
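As a rough sketch of why such a predicate is needed, assume a comparison that
propagates the special value and a test that never does. Everything named here
(the Unknown3VL package, unknown(), is_unknown(), gt3()) is a hypothetical
stand-in, not an existing or proposed builtin:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the special value: one blessed singleton.
package Unknown3VL {
    my $singleton = bless {}, __PACKAGE__;
    sub value { return $singleton }
}

sub unknown { return Unknown3VL->value }

# The predicate always answers with a plain true/false, never with unknown.
sub is_unknown { return ref($_[0]) eq 'Unknown3VL' ? 1 : 0 }

# A comparison that propagates unknown instead of guessing.
sub gt3 {
    my ($x, $y) = @_;
    return unknown() if is_unknown($x) || is_unknown($y);
    return $x > $y ? 1 : 0;
}

my @values = (10, unknown(), -5, 3);
my $limit  = 0;

# grep ultimately needs a yes/no answer per element, so the 3VL result has
# to be collapsed to 2VL somewhere; that is the job of is_unknown().
my $total = grep {
    my $r = gt3($_, $limit);
    !is_unknown($r) && $r;
} @values;

print "matched: $total\n";    # prints "matched: 2"
```

Without a predicate that is itself exempt from 3VL, there would be no way to
write the last step of that grep block.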
Given how the presence of "unknown" could make some workflows worse, or
alternately that in some workflows it should be encouraged, I feel that Perl
should provide mechanisms to flag when it is or isn't being used, similar to use
strict or use warnings. For example, have something that warns if unknown is
used as the input to any operation, similarly to what we have for undef.
Regarding full implications, what do we expect to happen if "unknown" is
interpolated into a string, does the whole string become "unknown"? What happens
if one says "print unknown;"? What if one asks, does this array or hash contain
any unknown elements?
I also suggest that we consider some new term or keyword to refer to the special
value rather than "unknown", something that is easy to search for and not get
lost in a haystack of other uses of that word in English. Ideally not one that
matches one of many possible reasons we might not have a regular value.
Although just using "unknown" isn't terrible.
==========
Now going off on a tangent, I feel that something we may want to look at is
something analogous to Raku's Failure concept.
The way I see it, there is a generalized concept of "regular value is missing
with a specific reason" that N-valued logic might try to address, and there is
the concept of an exception in a typical language, which gives a reason for the
problem.
So what I propose would be useful, and this can be implemented with either
2-valued logic or 3-valued logic, is that any Perl code which would otherwise
produce undef or unknown to represent that it is not giving a normal answer
instead returns a value of a dedicated type that represents a declaration that
we don't have a regular result and here is the specific reason why.
Providing that would be of similar complexity to providing a formal
exception class hierarchy, including that Perl would have some built in and
users could, and frequently would, define their own.
I will refer to this concept for the moment with "Excuse".
So we have a distinct core data type called Excuse which is like the singleton
Unknown concept in some ways, except that Excuse is basically an object-like
collection type whose contents specify the reason.
So rather than treating a singleton unknown as special, the 3-valued logic would
treat any instance of Excuse as special, but each reason does NOT add a logic
dimension, rather it is just further information "if you want to know".
Having this, we can both support logic like Ovid's demonstrations where we don't
want to have to care about a reason a regular value is missing and just do the
right thing when a regular value is missing, and we can support logic where we
also do want to know WHY the regular value is missing, we can ask that question,
and be able to distinguish say "nothing matched the search query" from "there
was a match but you don't have permission to see it" and so on.
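A minimal sketch of how that might look in today's Perl, with an ordinary
class standing in for the core type. The Excuse class layout, the reason
strings, is_excuse(), find_salary() and the sample data are all made up for
illustration:

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical Excuse type: an object whose contents say why there is no
# regular value. The reason is extra information, not a new logic dimension.
package Excuse {
    sub new {
        my ($class, %args) = @_;
        return bless { reason => $args{reason} }, $class;
    }
    sub reason { return $_[0]{reason} }
}

# One yes/no test covers every Excuse, whatever its reason.
sub is_excuse {
    my ($v) = @_;
    return (blessed($v) && $v->isa('Excuse')) ? 1 : 0;
}

# Made-up sample data for the sketch.
my %salary     = (alice => 50_000, bob => 62_000);
my %restricted = (bob => 1);

sub find_salary {
    my ($name) = @_;
    return Excuse->new(reason => 'no_match')     unless exists $salary{$name};
    return Excuse->new(reason => 'unauthorized') if $restricted{$name};
    return $salary{$name};
}

for my $name (qw(alice bob carol)) {
    my $result = find_salary($name);
    if (is_excuse($result)) {
        # Callers that care can ask why; callers that don't can stop at is_excuse().
        printf "%s: no regular value (%s)\n", $name, $result->reason;
    }
    else {
        printf "%s: %d\n", $name, $result;
    }
}
```

Code that only cares whether a regular value is present checks is_excuse()
and nothing else; code that wants to know why can look at the reason.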
-- Darren Duncan
On 2021-12-18 3:09 a.m., Ovid via perl5-porters wrote:
> Yes, SQL NULL is broken in fundamental ways that CJ Date shows here: https://www.oreilly.com/library/view/sql-and-relational/9781449319724/ch04s04.html
>
> And yes, I've been bitten by that bug in SQL in real-world code. Once. In over two decades. And I write lots of SQL. *Most* of the time, however, the 3VL NULL is what we need. Can you imagine if NULL followed "undef" behavior?
>
> SELECT count(*) FROM things WHERE value > ?;
>
> That would be a disaster and it's easily replicable in Perl:
>
> my $total = grep { $_->value > $limit } @things;
>
> I, for one, am tired of writing code like this:
>
> my $total = grep { defined $_->value ? $_->value > $limit : 0 } @things;
>
> Note: the following is *not* equivalent to the above:
>
> my $total = grep { ( $_->value // 0 ) > $limit } @things;
>
> I mean, it *looks* correct, but what if the value can be a negative number and the limit can be negative? You probably then want this:
>
> my $total = grep { ( $_->value // ( $limit - 1 ) ) > $limit } @things;
>
> Which arguably might be more confusing than using defined. With 3VL, we have this:
>
> my $total = grep { $_->resolution < $limit } @things;
>
> Worse, I'm tired of tracking down bugs caused by this.
>
> 2VL logic on undef/null values has been broken for a long time and forces developers to remember to always write special case code to handle this.
>
> However, while we could correct the underlying issue, going further into 4VL or 5VL adds complications that I doubt most developers are going to understand. In other words, SIMPLICITY IS YOUR FRIEND.
>
> We don't need "perfect" because making something that covers all possible cases is simply going to be a mess and might even be counter-productive. For example, if you're unauthorized to get a value but you see that it's a "known defined value", that's an information leak. Also, given Merijn's original list:
>
> 1. Known defined value
> 2. Known undefined value
> 3. Unknown value
> 4. Unauthorized to get the value
> 5. Value is defined but unauthorized to get it
>
> I don't see how 4+1 is different from 5. So we can bikeshed this to death, or fix the major underlying problem: $salary += 1000. Congrats. You've just given a raise to an unpaid volunteer.