Re: Pre-RFC: `unknown` versus `undef`

From: Darren Duncan
Date: December 19, 2021 00:45
Subject: Re: Pre-RFC: `unknown` versus `undef`
Message ID: b19e1180-a262-b74d-2cfb-657347a90168@darrenduncan.net

I want to try and clarify my position here.

The main thing is I feel that it is important in this discussion to treat some 
matters separately that I consider very distinct:

1.  Plain and focused 3-valued logic, which seems to be what Ovid is actually 
advocating for: we have exactly 1 special value, Unknown, and we define how its 
appearance in place of any other value affects the behavior in every situation.

2.  The idea of N-valued logic, referred to by H.Merijn Brand, where you have an 
arbitrary number of special values where each adds a different dimension to the 
logic.

My position is that it is N-valued logic and its arbitrary count of logical 
dimensions that is the largest problem here.  In general its complexity is 
exponential, doubling with each new dimension, as each one needs to specifically 
define its interactions with the others.  For example, what does "unknown value" 
plus "unauthorized" return?

So I would hope that one thing we can all agree on is that the idea of N-valued 
logic is excluded from the Pre-RFC, and it sticks to pure 3-valued logic.

I can accept plain 3-valued logic with clearly defined rules, such that every 
possible scenario involving the unknown value is completely deterministic with 
fully defined behavior.
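
As a minimal sketch of what "clearly defined rules" could look like, the code 
below spells out one possible rule set, Kleene's strong three-valued logic (the 
rules SQL uses for AND/OR/NOT).  Choosing that rule set is an assumption, not 
something settled in this thread, and the names unknown, is_unknown, not3, and3, 
or3 and gt3 are hypothetical helpers, not existing or proposed Perl builtins.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proposed special value: one blessed singleton.
my $UNKNOWN = bless {}, 'Sketch::Unknown';
sub unknown    { $UNKNOWN }
sub is_unknown { ref($_[0]) eq 'Sketch::Unknown' ? 1 : 0 }

# NOT: unknown stays unknown, otherwise ordinary negation.
sub not3 {
    my ($x) = @_;
    return is_unknown($x) ? unknown() : ($x ? 0 : 1);
}

# AND: a definite false decides the result even if the other side is unknown.
sub and3 {
    my ($x, $y) = @_;
    return 0 if grep { !is_unknown($_) && !$_ } $x, $y;
    return unknown() if is_unknown($x) || is_unknown($y);
    return 1;
}

# OR: a definite true decides the result even if the other side is unknown.
sub or3 {
    my ($x, $y) = @_;
    return 1 if grep { !is_unknown($_) && $_ } $x, $y;
    return unknown() if is_unknown($x) || is_unknown($y);
    return 0;
}

# Comparison: unknown on either side gives unknown.
sub gt3 {
    my ($x, $y) = @_;
    return unknown() if is_unknown($x) || is_unknown($y);
    return $x > $y ? 1 : 0;
}
```

Under these rules and3(0, unknown()) is a definite false, while 
and3(1, unknown()) and gt3(unknown(), 5) are both unknown, so every combination 
of inputs has exactly one defined result.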

In fact I will pivot and semi-endorse Ovid's proposal for some kind of support 
of 3-valued logic in principle; what remains to discuss is mainly the details.

I say semi-endorse because I recognize that when done well it can be very 
helpful as Ovid has illustrated, but at the same time I don't consider it a 
silver bullet and I feel that having it can introduce new problems when you 
consider all of its implications.  For one thing, automatic refactoring or 
optimizations for performance could be a lot more complicated, because some 
assumptions that are safe under 2VL may not be safe under 3VL.

The most important thing is that this new feature is fully backwards-compatible, 
meaning that all of the 3-valued logic behavior is strictly tied to instances of 
the new special unknown value, and that all Perl code will behave identically to 
before in a Perl supporting the special unknown value where the special unknown 
value isn't explicitly made present.  So there are NO changes at all to how Perl 
undef is treated in the absence of the special unknown value.

Since the presence of the special unknown value would fundamentally alter how a 
lot of existing operators/builtins would behave, we would need to add some more 
builtins to provide functionality that would then be missing.  For example, if 
we want to test whether something is or is not the special unknown value, we 
would need new builtins that return true or false on that question, rather than 
returning unknown.  It must be possible to reason about anything involving the 
special unknown value where the result is not that unknown value.
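
To illustrate why such predicates are needed, here is a rough sketch that 
emulates the special value with the overload pragma.  The package 
Sketch::Unknown, the predicates is_unknown and is_known, and the choice that 
unknown is false in boolean context are all assumptions made for the example, 
not part of the proposal.

```perl
use strict;
use warnings;

package Sketch::Unknown {
    # Assumption: comparisons involving unknown return unknown, and unknown
    # counts as false in boolean context (so a grep skips it).
    use overload
        '=='   => sub { $_[0] },
        'eq'   => sub { $_[0] },
        'bool' => sub { 0 };
    my $singleton = bless {}, __PACKAGE__;
    sub instance { $singleton }
}

package main;

# Hypothetical predicates: always a plain true/false, never unknown.
sub is_unknown { ref($_[0]) eq 'Sketch::Unknown' ? 1 : 0 }
sub is_known   { is_unknown($_[0]) ? 0 : 1 }

my $salary = Sketch::Unknown->instance;

# Asking via equality just yields unknown again, which tells us nothing.
my $via_equality = ($salary == Sketch::Unknown->instance);

# Asking via the predicate gives a definite answer.
my $via_predicate = is_unknown($salary);
```

Here $via_equality is itself the unknown value, whereas $via_predicate is a 
plain 1, which is the distinction the paragraph above asks for.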

Given how the presence of "unknown" could make some workflows worse, or 
alternately that in some workflows it should be encouraged, I feel that Perl 
should provide mechanisms to flag when it is or isn't being used, similar to use 
strict or use warnings.  For example, have something that warns if unknown is 
used as the input to any operation, similarly to what we have for undef.

Regarding full implications, what do we expect to happen if "unknown" is 
interpolated into a string: does the whole string become "unknown"?  What happens 
if one says "print unknown;"?  What if one asks, does this array or hash contain 
any unknown elements?
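
One possible set of answers to these questions can be sketched with the overload 
pragma.  This shows just one option, where concatenation is "contagious"; the 
package Sketch::Contagious, the '<unknown>' placeholder text and the is_unknown 
helper are hypothetical, and a different design could answer each question 
differently.

```perl
use strict;
use warnings;

package Sketch::Contagious {
    use overload
        '.'  => sub { $_[0] },          # concatenating with unknown gives unknown
        '""' => sub { '<unknown>' };    # placeholder used only when stringified
    sub new { bless {}, shift }
}

package main;

sub is_unknown { ref($_[0]) eq 'Sketch::Contagious' ? 1 : 0 }

my $u = Sketch::Contagious->new;

# The whole result is the unknown value, not a string with a hole in it.
my $label = 'salary: ' . $u;

# Forcing a string (as print would) falls back to the placeholder.
my $printed = "$label";

# "Does this array contain any unknown elements?" gets a definite count.
my @row           = (10, $u, 30);
my $unknown_count = grep { is_unknown($_) } @row;
```

In this variant the containment question is answered by a predicate with an 
ordinary 2-valued result, so it never returns unknown itself.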

I also suggest that we consider some new term or keyword to refer to the special 
value rather than "unknown", something that is easy to search for and not get 
lost in a haystack of other uses of that word in English.  Ideally not one that 
matches one of many possible reasons we might not have a regular value. 
Although just using "unknown" isn't terrible.

==========

Now going off on a tangent, I feel that something we may want to look at is 
something analogous to Raku's Failure concept.

The way I see it, there is a generalized concept of "regular value is missing 
with a specific reason", which N-valued logic might try to address, and there is 
the concept of an exception in a typical language, which gives a reason for the 
problem.

So what I propose would be useful, and this can be implemented with either 
2-valued logic or 3-valued logic, is that any Perl code which would otherwise 
produce undef or unknown to represent that it is not giving a normal answer 
instead returns a value of a dedicated type, one that declares that we don't 
have a regular result and gives the specific reason why.

Providing that would be of similar complexity to providing a formal exception 
class hierarchy, in that Perl would have some built in and users could, and 
frequently would, define their own.

I will refer to this concept for the moment with "Excuse".

So we have a distinct core data type called Excuse, which is like the singleton 
Unknown concept in some ways, except that an Excuse is basically an object-like 
collection type whose contents specify the reason.

So rather than treating a singleton unknown as special, the 3-valued logic would 
treat any instance of Excuse as special, but each reason does NOT add a logic 
dimension, rather it is just further information "if you want to know".

Having this, we can both support logic like Ovid's demonstrations where we don't 
want to have to care about a reason a regular value is missing and just do the 
right thing when a regular value is missing, and we can support logic where we 
also do want to know WHY the regular value is missing, we can ask that question, 
and be able to distinguish say "nothing matched the search query" from "there 
was a match but you don't have permission to see it" and so on.
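
A rough sketch of how that could look in today's Perl follows.  The class 
Sketch::Excuse, its reason accessor, the is_excuse helper, the reason strings 
and the find_salary lookup with its data are all invented for illustration; a 
real core type would presumably be wired into the operators rather than checked 
by hand.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

package Sketch::Excuse {
    sub new {
        my ($class, %args) = @_;
        return bless { reason => $args{reason} }, $class;
    }
    sub reason { $_[0]{reason} }
}

package main;

# Every Excuse is "special" in the same way, whatever its reason.
sub is_excuse {
    my ($v) = @_;
    return (blessed($v) && $v->isa('Sketch::Excuse')) ? 1 : 0;
}

# Hypothetical lookup that explains why it has no regular answer.
my %salaries   = (alice => 50_000, bob => 42_000);
my %restricted = (bob => 1);

sub find_salary {
    my ($name) = @_;
    return Sketch::Excuse->new(reason => 'no_match')
        unless exists $salaries{$name};
    return Sketch::Excuse->new(reason => 'unauthorized')
        if $restricted{$name};
    return $salaries{$name};
}

my @results = map { find_salary($_) } qw(alice bob carol);

# Code that does not care why a value is missing treats all Excuses alike.
my $above = grep { !is_excuse($_) && $_ > 45_000 } @results;

# Code that does care can ask for the reason.
my @reasons = map { $_->reason } grep { is_excuse($_) } @results;
```

The reason is carried along as extra information only; the filtering logic sees 
a single "is it an Excuse?" question, so no reason adds a logic dimension.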

-- Darren Duncan

On 2021-12-18 3:09 a.m., Ovid via perl5-porters wrote:
> Yes, SQL NULL is broken in fundamental ways that CJ Date shows here: https://www.oreilly.com/library/view/sql-and-relational/9781449319724/ch04s04.html
> 
> And yes, I've been bitten by that bug in SQL in real-world code. Once. In over two decades. And I write lots of SQL. *Most* of the time, however, the 3VL NULL is what we need. Can you imagine if NULL followed "undef" behavior?
> 
>      SELECT count(*) FROM things WHERE value > ?;
> 
> That would be a disaster and it's easily replicable in Perl:
> 
>      my $total = grep { $_->value > $limit } @things;
> 
> I, for one, am tired of writing code like this:
> 
>      my $total = grep { defined $_->value ? $_->value > $limit : 0 } @things;
> 
> Note: the following is *not* equivalent to the above:
> 
>      my $total = grep { ( $_->value // 0 )  > $limit } @things;
> 
> I mean, it *looks* correct, but what if the value can be a negative number and the limit can be negative? You probably then want this:
> 
>      my $total = grep { ( $_->value // ( $limit - 1 ) )  > $limit } @things;
> 
> Which arguably might be more confusing than using defined. With 3VL, we have this:
> 
>      my $total = grep { $_->value > $limit } @things;
> 
> Worse, I'm tired of tracking down bugs caused by this.
> 
> 2VL logic on undef/null values has been broken for a long time and forces developers to remember to always write special case code to handle this.
> 
> However, while we could correct the underlying issue, going further into 4VL or 5VL adds complications that I doubt most developers are going to understand. In other words, SIMPLICITY IS YOUR FRIEND.
> 
> We don't need "perfect" because making something that covers all possible cases is simply going to be a mess and might even be counter-productive. For example, if you're unauthorized to get a value but you see that it's a "known defined value", that's an information leak. Also, given Merijn's original list:
> 
> 1. Known defined value
> 2. Known undefined value
> 3. Unknown value
> 4. Unauthorized to get the value
> 5. Value is defined but unauthorized to get it
> 
> I don't see how 4+1 is different from 5. So we can bikeshed this to death, or fix the major underlying problem: $salary += 1000. Congrats. You've just given a raise to an unpaid volunteer.
