
Re: A summary of Chip patches and potential impact

September 5, 2011 03:56
Reverend Chip wrote:
>2. The second biggest change is the change to cached string conversions
>so they're POKp but not POK, which enables us to distinguish strings
>from numbers.
>3. Then there's the propagation of the boolean-ness of copies of sv_true
>and sv_false.  I can't imagine that would break *anything*;

These changes would be pretty much invisible if applied only to
already-existing Perl code.  That's the basis on which you're saying
they can't break anything.  But they'd also be pointless if they were
only applied to already-existing code.  The point of them is:

>4. The least problematic change, structurally, is the making visible of
>the previous patches in C as SvIsBOOLEAN, SvIsNUMBER, and SvIsSTRING. 

The intention is that new code will use this type information
semantically.  Then you run into trouble when such new code interacts
with existing code that knows that there is no such type distinction.

Presumably you intend that (for example) string concatenation will
continue to operate on any scalar, coercing booleans[0] and numbers along
with undef and references.  (It would be insane to break that, because
it would break approximately every Perl program ever written.)  So this
isn't a *strong* type system that you're adding, it's an additional type
hint on an object that is otherwise of the existing polymorphic kind.
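
As a minimal illustration of that existing polymorphic behaviour (stock
Perl only, nothing from the proposed patches; the variable names are
made up for the example), concatenation accepts each kind of defined
scalar and coerces it:

```perl
use strict;
use warnings;

# Concatenation takes any defined scalar and coerces it to a string.
my $aref = [1, 2];
my %joined = (
    number    => "n=" . 42,      # integer   -> "42"
    fraction  => "n=" . 3.5,     # float     -> "3.5"
    true      => "t=" . !!1,     # truth     -> "1"
    false     => "f=" . !!0,     # falsehood -> "" (empty string)
    reference => "r=" . $aref,   # reference -> "ARRAY(0x...)"
);

print "$_: $joined{$_}\n" for sort keys %joined;
```

Concatenating undef also works, though with a warning under "use
warnings", which is why it is left out of the sketch.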

One way of looking at this is that you're trying to attach new intensional
type information beyond the extensional type.  This sort of thing has been
discussed repeatedly before, in the context of wanting to distinguish
octet strings from character strings.  The inevitable conclusion is
that this doesn't work.  An object can change from one intensional type
to another without undergoing any language-visible operation.  (E.g.,
transforming a string of digits into the number that it represents
in decimal.)  Any attempt to maintain new intensional type information
will give incorrect results on at least some existing code.  In all the
cases mentioned here, quite a lot of existing code.
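
A rough sketch of the digits-to-number case, using the core B module
to peek at a scalar's public flags.  The helper public_flags is
hypothetical, written only for this illustration; the exact flags
reported may differ between perl versions, so no output is claimed for
the second call.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: list the public "OK" flags currently set on the
# caller's scalar (via the @_ alias, so no copy is inspected).
sub public_flags {
    my $flags = B::svref_2object(\$_[0])->FLAGS;
    my @set;
    push @set, "IOK" if $flags & B::SVf_IOK();
    push @set, "NOK" if $flags & B::SVf_NOK();
    push @set, "POK" if $flags & B::SVf_POK();
    return join ",", @set;
}

my $digits = "42";                     # starts life as a plain string
my $before = public_flags($digits);
my $sum    = $digits + 1;              # read numerically, never assigned to
my $after  = public_flags($digits);

print "before: $before\n";
print "after:  $after\n";
print "value is still '$digits', sum is $sum\n";
```

At the language level nothing has happened to $digits: it is eq "42"
and == 42 both before and after.  Any type hint derived from how the
scalar was created or last used has to cope with code like this.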

Another, more specific, concern: there's lots of existing code whose
job is to store a value and return it later.  Currently, if it finds
it has a plain scalar then it can store its string and numeric parts,
in any form, and later accurately reconstitute the scalar from them.
Given your new type distinctions, however, especially the boolean one,
the existing code will not accurately reconstitute the new type flags,
and so will not be accurately storing the scalar for the purposes of
code in which the new type information is semantically significant.
This is the converse of the problem discussed in the previous paragraph:
the new type flags will fail to characterise existing code, and existing
code will fail to preserve the new type flags.
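
To make the storage concern concrete, here is a sketch of a
hypothetical store that keeps only the string form of each value, as
much existing serialisation code effectively does.  The names
store_value, fetch_value, and is_integer_flagged are invented for the
example, and today's IOK flag merely stands in for whatever new type
flag would be introduced.

```perl
use strict;
use warnings;
use B ();

# Hypothetical store that keeps only the string form of a value.
my %store;
sub store_value { my ($key, $value) = @_; $store{$key} = "$value"; return }
sub fetch_value { my ($key) = @_; return $store{$key} }

# Hypothetical probe: does the caller's scalar carry the public IOK flag?
sub is_integer_flagged {
    return (B::svref_2object(\$_[0])->FLAGS & B::SVf_IOK()) ? 1 : 0;
}

my $original = 42;
store_value(answer => $original);
my $restored = fetch_value("answer");

# Inspect the flags first: the numeric comparison below would itself
# cause $restored to be read as a number.
my $orig_flag  = is_integer_flagged($original);
my $rest_flag  = is_integer_flagged($restored);
my $same_value = ($restored eq $original && $restored == $original) ? 1 : 0;

print "original flagged: $orig_flag, restored flagged: $rest_flag, ",
      "same value: $same_value\n";
```

By every test existing code applies, the round trip is exact, yet the
flag did not survive it.  Code that treated the flag as semantically
significant would see a different value come back.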

>                   Scalars already have several fundamentally distinct types
>of content (undef, reference, glob, string+number);

References have been visibly distinguished from strings/numbers as long
as they've existed (since 5.000).  (Globs and undef have been distinct
at least as long; references are the clearest case.)  They were a new
extensional type that didn't exist before.  This is completely different
from what you're proposing, where you want to change existing objects.
It would be an accurate analogy if you were introducing new number and
boolean types with a status similar to the recent first-class regexps.
I think integrating such a new type would be more difficult than dealing
with first-class regexps, because we can get away with almost always
dealing with the regexps indirectly.

>And it seems like an obvious step toward running Perl
>on systems where strings and numbers are very different.

I don't see what you mean here.  Perl *is* the system here, and determines
for itself how distinct strings and numbers are.


[0] My inner pedant demands to protest this usage of the word
"boolean"[1].  Boolean logic, as codified by George Boole, is *not* merely
two-value truth logic.  It refers to any algebra with operators that
are related to each other in the same way as AND and OR.  In particular,
set algebra is Boolean algebra, with the intersection and union operators
taking the role of AND and OR.  It follows that the number of distinct
values available in any Boolean algebra is a power of two, and the
whole algebra can always be modelled as bit vectors undergoing bitwise
AND and OR operations.  For this reason, when two-value logic is meant,
I prefer to say "truth value" rather than "Boolean value"[1].

[1] I'm also in two minds as to whether the word ought to be capitalised,
since it's derived from a proper name.
