Re: Pre-RFC: `unknown` versus `undef`
From: Oodler 577 via perl5-porters
Date: December 18, 2021 12:20
Subject: Re: Pre-RFC: `unknown` versus `undef`
Message ID: Yb3SEx6FoU0Lm8VP@odin.sdf-eu.org
Top posting a few points:
1. in DBI:: modules, NULL is equivalent to undef
2. in real world tables, "employee" usually has a "type" field (or external way to derive this attribute); volunteers (interns?), salaried, and hourlies are not differentiated by the value of the "salary" field
3. seems like the polymorphic OOP solution is to just have:
* a way to "get_employee_type" on the employee instance, and
* throw an exception if $emp->get_salary is called on $emp,
C<if ($emp->get_employee_type == VOLUNTOLD)>.
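A minimal sketch of point 3, for illustration only. The class name C<Employee>, the constants, and the constructor arguments are all hypothetical; only C<get_employee_type>, C<get_salary>, and C<VOLUNTOLD> come from the text above.

```perl
use strict;
use warnings;

# Hypothetical class for illustration; not taken from any real module.
package Employee {
    use constant { SALARIED => 1, HOURLY => 2, VOLUNTOLD => 3 };

    sub new {
        my ($class, %args) = @_;
        return bless {%args}, $class;
    }

    sub get_employee_type { $_[0]{type} }

    sub get_salary {
        my ($self) = @_;
        # A volunteer has no salary at all, so asking for one is an error
        # rather than an undef or a 0 that could leak into arithmetic.
        die "volunteers have no salary\n"
            if $self->get_employee_type == VOLUNTOLD;
        return $self->{salary};
    }
}

package main;

my $paid      = Employee->new(type => Employee::SALARIED(), salary => 50_000);
my $volunteer = Employee->new(type => Employee::VOLUNTOLD());

# The raise works for a salaried employee ...
my $raise_ok = eval { $paid->get_salary + 1000 };

# ... and dies for a volunteer instead of silently producing 1000.
my $raise_bad = eval { $volunteer->get_salary + 1000 };
my $error     = $@;
```

With this shape, the "raise for an unpaid volunteer" case fails loudly at the call site, and the caller decides what to do by checking the employee type first.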
I can add that the nuances between undef and q{} have caused me confusion
in the past; but adding another special value to mean a type of "nothing"
could be problematic. For example, to point #1 above, how is a DBI call to
know what you mean when it already treats undef as NULL when a) replacing
placeholders (used almost universally), or b) turning NULLs into undef when
returning results from a C<selectrow_*> or C<selectall_*> call?
With C<use warnings>,
TRUE:
(q{} == 0)
(undef == 0)
FALSE:
(q{} eq 0)
(undef eq 0)
What makes matters somewhat more confusing is that C<int> returns 0 for
both, even with C<use warnings>:
int undef
int q{}
Though, this is implied in the TRUE/FALSE examples provided above. In
addition to this, it seems like this would mess with C<defined>; would
this add a possible answer to that? I'm just now able to remember that
C<defined undef> is rightly different than C<defined 0> or C<defined q{}>.
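The comparisons above can be checked with a short script. This is only a sketch: the variable names and the C<%result> hash are made up for illustration, and the warnings are collected through C<$SIG{__WARN__}> so the script can show that the results hold even though C<use warnings> complains along the way.

```perl
use strict;
use warnings;

# Collect warnings instead of printing them to STDERR.
my @warnings;
$SIG{__WARN__} = sub { push @warnings, $_[0] };

my $undef = undef;
my $empty = q{};

my %result = (
    'empty == 0'    => ($empty == 0)    ? 1 : 0,   # numeric comparison
    'undef == 0'    => ($undef == 0)    ? 1 : 0,
    'empty eq 0'    => ($empty eq 0)    ? 1 : 0,   # string comparison
    'undef eq 0'    => ($undef eq 0)    ? 1 : 0,
    'int empty'     => int($empty),
    'int undef'     => int($undef),
    'defined undef' => defined($undef)  ? 1 : 0,
    'defined 0'     => defined(0)       ? 1 : 0,
    'defined empty' => defined($empty)  ? 1 : 0,
);

printf "%-14s => %s\n", $_, $result{$_} for sort keys %result;
printf "%d warning(s) were emitted along the way\n", scalar @warnings;
```

Only C<defined> tells undef apart from q{} and 0 here; the numeric operators and C<int> collapse all of them to 0.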
Anyway, I can't say I've ever wanted more out of a numerical field
other than its value. I'd never use undef or NULL to indicate anything
beyond that this field was not set with an actual value. And if I wanted
to know if $employee was a volunteer, I'd consult the "employee_type"
column.
Seems like this is best left to a module that overrides operators; but
the question for me remains - how would you represent, in a traditional
database, that this value is not to be taken like the rest of the numbers,
other than by storing that fact either in a separate column or in some sort
of composite value (e.g., "0;volunteer")?
Cheers,
Brett
* Ovid via perl5-porters <perl5-porters@perl.org> [2021-12-18 11:09:02 +0000]:
> Yes, SQL NULL is broken in fundamental ways that CJ Date shows here: https://www.oreilly.com/library/view/sql-and-relational/9781449319724/ch04s04.html
>
> And yes, I've been bitten by that bug in SQL in real-world code. Once. In over two decades. And I write lots of SQL. *Most* of the time, however, the 3VL NULL is what we need. Can you imagine if NULL followed "undef" behavior?
>
> SELECT count(*) FROM things WHERE value > ?;
>
> That would be a disaster and it's easily replicable in Perl:
>
> my $total = grep { $_->value > $limit } @things;
>
> I, for one, am tired of writing code like this:
>
> my $total = grep { defined $_->value ? $_->value > $limit : 0 } @things;
>
> Note: the following is *not* equivalent to the above:
>
> my $total = grep { ( $_->value // 0 ) > $limit } @things;
>
> I mean, it *looks* correct, but what if the value can be a negative number and the limit can be negative? You probably then want this:
>
> my $total = grep { ( $_->value // ( $limit - 1 ) ) > $limit } @things;
>
> Which arguably might be more confusing than using defined. With 3VL, we have this:
>
> my $total = grep { $_->value > $limit } @things;
>
> Worse, I'm tired of tracking down bugs caused by this.
>
> 2VL logic on undef/null values has been broken for a long time and forces developers to remember to always write special case code to handle this.
>
> However, while we could correct the underlying issue, going further into 4VL or 5VL adds complications that I doubt most developers are going to understand. In other words, SIMPLICITY IS YOUR FRIEND.
>
> We don't need "perfect" because making something that covers all possible cases is simply going to be a mess and might even be counter-productive. For example, if you're unauthorized to get a value but you see that it's a "known defined value", that's an information leak. Also, given Merijn's original list:
>
> 1. Known defined value
> 2. Known undefined value
> 3. Unknown value
> 4. Unauthorized to get the value
> 5. Value is defined but unauthorized to get it
>
> I don't see how 4+1 is different from 5. So we can bikeshed this to death, or fix the major underlying problem: $salary += 1000. Congrats. You've just given a raise to an unpaid volunteer.
>
> Best,
> Ovid
> --
> IT consulting, training, specializing in Perl, databases, and agile development
> http://www.allaroundtheworld.fr/.
>
> Buy my book! - http://bit.ly/beginning_perl
>
> On Saturday, 18 December 2021, 11:20:06 CET, Darren Duncan <darren@darrenduncan.net> wrote:
>
> For the record, which I partially discussed on a related Twitter thread a few
> days ago, I feel that using anything other than 2VL in a fundamental capacity is
> a serious mistake, and if anyone is considering 3VL, 4VL, etc then they probably
> have a design flaw that should be corrected in some other way.
>
> All the regular types and operators should operate with pure 2VL, including all
> the regular equality or comparison or sorting operators. The behavior of
> regular operators should not be overloaded or overridden, lexically or
> otherwise, so that they behave in a 3+VL manner. This would be a huge source of
> bugs where people look at code expecting certain behavior and getting something
> else.
>
> How something like a special Unknown value should work is that it provides a set
> of operators/subs with DIFFERENT NAMES that provide the 3VL etc logic, and so
> for example one writes:
>
> eq_3vl($x,$y)
> lt_3vl($x,$y)
> grep_3vl(...)
>
> Or for simple 3VL you don't even need the special Unknown value, instead these
> operators can treat the standard undef that way. The Unknown value is more
> useful if you want to override the behavior of existing operators.
>
> As for 4VL, 5VL, etc, once you even start thinking about that, there's an even
> stronger case that what you really should be using is 2VL with a bunch of
> singleton types, where each singleton represents a specific reason a normal
> value is missing, such as:
>
> Unknown
> Not Applicable
> Permission Denied
> Record Not Found
> etc
>
> The idea of changing the behavior of undef even lexically with a feature is
> problematic. What if someone sees code in such a file and copies it into
> another file, or in reverse, where the other doesn't have that feature declared,
> then code which is exactly the same has changed behavior.
>
> As a compromise, I would find either of these 2 things acceptable:
>
> 1. Have an Unknown::Values or whatever singleton class which overrides built-in
> operators/subs but its effects are tightly bound to instances of that class.
>
> 2. Declare new operators/subs with new names that provide 3VL with standard undefs.
>
> Those provide 3VL explicitly and on an opt-in basis if users want it, and it's
> relatively easy for users reading the code later to know it's using 3VL.
>
> But behavior of built-in operators or undefs should never change as the result
> of a feature pragma or such.
>
> Also, SQL NULLs are not actually 3VL, they are much more complicated than that,
> and we don't want to try and imitate SQL if we want to provide 3VL.
>
> -- Darren Duncan
>
> On 2021-12-18 1:43 a.m., H.Merijn Brand wrote:
> > On Sat, 18 Dec 2021 08:57:05 +0000 (UTC), Ovid via perl5-porters <perl5-porters@perl.org> wrote:
> >
> >> Hi there,
> >>
> >> As most of you know, "undef" values often cause all sorts of interesting bugs in Perl. I wrote https://metacpan.org/pod/Unknown::Values to address this. Instead of the 2VL that undef uses, it uses Kleene's traditional 3VL (three-value logic) akin to SQL's NULL.
> >
> > 1. Known defined value
> > 2. Known undefined value
> > 3. Unknown value
> > 4. Unauthorized to get the value
> > 5. Value is defined but unauthorized to get it
> >
> > When doing 3VL, number 4 is essential
> >
> >> Basic usage looks like this:
> >>
> >> use Unknown::Values;
> >>
> >> my $value = unknown;
> >> my @array = ( 1, 2, 3, $value, 4, 5 );
> >> my @less = grep { $_ < 4 } @array; # (1,2,3)
> >> my @greater = grep { $_ > 3 } @array; # (4,5)
> >>
> >> my @underpaid;
> >> foreach my $employee (@employees) {
> >>
> >> # this will never return true if salary is "unknown"
> >> if ($employee->salary < $threshold ) {
> >> push @underpaid => $employee;
> >> }
> >> }
> >>
> >> I've also written about this here: http://blogs.perl.org/users/ovid/2013/02/three-value-logic-in-perl.html
> >>
> >> I've always thought this belongs directly in a programming language, but never suggested this because I assumed there would be no interest
> >>
> >> To my surprise, brian d foy suggested it be in the core (https://twitter.com/briandfoy_perl/status/1471684211602042880)
> >>
> >> He wrote: "Unknown::Value from @OvidPerl looks very interesting. These objects can't compare, do math, or most of the other default behavior that undef allows. This would be awesome in core."
> >>
> >> Would there be interest?
> >
> > Yes, when 4VL (or 5VL)
> >
> > Thinking about it, there might be a hook to add 5, 6, 7, 8, 9 etc :)
> >
> >> Ovid
> >
>
>
--
oodler@cpan.org
oodler577@sdf-eu.org
SDF-EU Public Access UNIX System - http://sdfeu.org
irc.perl.org #openmp #pdl #native