perl.perl6.language | Postings from August 2006

=== and array-refs

From: David Green
Date: August 15, 2006 13:51
Subject: === and array-refs
Message ID: a06230931c107c6005426@[172.27.1.7]

On 8/14/06, Smylers wrote:
>David Green writes:
>>  I guess my problem is that [1,2] *feels* like it should === [1,2].
>>  You can explain that there's this mutable object stuff going on, and I
>>  can follow that (sort of...), but it seems like an implementation
>>  detail leaking out.
>
>The currently defined behaviour seems intuitive to me, from a 
>starting point of Perl 5.

But is Perl 5 the best place to start?  It's something many of us are 
used to, but that doesn't mean it's the best solution conceptually, 
even if it was the most reasonable way to implement it in P5.

The reason I think it's an implementation wart is that an array -- 
thought of as a single, self-contained lump -- is different from a 
reference or pointer to some other variable.  Old versions of Perl 
always eagerly exploded arrays, so there was no way to refer to an 
array as a whole; put two arrays together and P5 (or P4, etc.) thinks 
it's just one big array or list.
Then when references were introduced, "array-refs" provided a way to 
encapsulate arrays so we could work with them as single lumps.  It's 
not the most elegant solution, but being able to nest data structures 
at all was a tremendous benefit, and it was backwards-compatible.
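
To make the flattening concrete, here is a minimal Perl 5 sketch. The variable names are illustrative only and do not come from the thread; it just contrasts putting two arrays together directly with putting references to them together.

```perl
use strict;
use warnings;

my @a = (1, 2);
my @b = (3, 4);

# Lists flatten: the two arrays merge into one four-element array.
my @flat = (@a, @b);

# References keep each array as a single lump inside the outer array.
my @nested = (\@a, \@b);

printf "flat has %d elements\n",   scalar @flat;
printf "nested has %d elements\n", scalar @nested;
```

The second form is what makes nested data structures possible in P5, at the price of explicit referencing and dereferencing.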

P6 doesn't have to be that backwards-compatible -- it already isn't. 
P6 more naturally treats arrays as lumps; this may or may not be 
*implemented* using references as in P5, but it doesn't have to -- or 
at least, it doesn't have to *look* as though that's how it's doing 
it.  Conceptually, an array consisting only of constant literals, 
like (1,2,3), isn't referring to anything, so it doesn't need to 
behave that way.

>The difference between:
>   my $new = \@orig;
>and:
>   my $new = [@orig];
>
>is that the second one is a copy; square brackets always create a 
>new anonymous array rather than merely referring to an existing one, 
>and that's the same thing that's happening here.  Think of square 
>brackets as meaning something like Array->new and each one is 
>obviously distinct.

I agree that \@orig should be distinct from [@orig] -- in the former 
case, we're deliberately taking a reference to the @orig variable. 
What I don't like is that [@orig] is distinct from [@orig] -- sure, 
I'm doing something similar to Array->new(1,2) followed by another 
Array->new(1,2), but I still want them to be the same, just as I want 
Str->new("foo") to be the same as Str->new("foo").  They're just 
constants, they should compare equally regardless of how I created 
them.  (And arrays should work a lot like strings, because at some 
conceptual level, a string is an array [of characters].)
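
A rough Perl 5 sketch of the distinction being argued over, under the assumption that "the same" means "same contents". The helper `same_elems` is hypothetical, written only for this illustration, and handles only flat arrays of plain scalars; it is not an existing P5 or P6 operator.

```perl
use strict;
use warnings;

my @orig = (1, 2);

my $alias = \@orig;     # refers to @orig itself
my $copy1 = [@orig];    # new anonymous array holding a copy
my $copy2 = [@orig];    # another new anonymous array

# In numeric context a reference yields its address, so == tests identity.
my $alias_is_orig   = ($alias == \@orig) ? 1 : 0;
my $copies_are_same = ($copy1 == $copy2) ? 1 : 0;

# Hypothetical helper: compares two flat arrays element by element.
sub same_elems {
    my ($x, $y) = @_;
    return 0 unless @$x == @$y;
    for my $i (0 .. $#$x) {
        return 0 unless $x->[$i] eq $y->[$i];
    }
    return 1;
}

my $copies_same_value = same_elems($copy1, $copy2);

print "alias is \@orig:        $alias_is_orig\n";
print "copies are identical:  $copies_are_same\n";
print "copies hold same data: $copies_same_value\n";
```

The two copies fail the identity test but pass the contents test, which is the behaviour the paragraph above would like [@orig] === [@orig] to reflect.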

>>  And I feel this way because [1,2] looks like it should be platonically
>>  unique.
>
>I'd say that C< (1, 2) > looks like that.  But C< [1, 2] > looks 
>like it's its own thing that won't be equal to another one.

Except [1,2] can look like (1,2) in P6 because it automatically 
refs/derefs stuff so that things Just Work.  That's good, because you 
shouldn't have to be referencing arrays yourself (hence my point 
above about an array conceptually being a single lump).  But if we're 
going to hide the [implementational] distinction in some places, we 
should hide it everywhere.

Actually, my point isn't even about arrays per se; that's just the 
implementation/practical side of it.  You can refer to a scalar 
constant too:
	perl -e 'print \1, \1'
	SCALAR(0x8104980)SCALAR(0x810806c)

They're different because the *references* are different, but I don't 
care about that.  A reference to a constant value is kind of 
pointless, because the value isn't going to change.  References to 
*variables* are useful, because you never know what value that 
variable might have, and refs give you a pointer to the current value 
of the variable at any time.
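
A small Perl 5 sketch along the same lines as the one-liner above; the variable names are made up for the example. It prints the two references, then checks that the values behind them are equal and that a literal's value cannot be changed through its reference.

```perl
use strict;
use warnings;

my $r1 = \1;
my $r2 = \1;

# Stringifying a reference shows its type and address, e.g. SCALAR(0x...).
print "$r1 $r2\n";

# The referents hold the same value, whatever the addresses are.
my $same_value = ($$r1 == $$r2) ? 1 : 0;

# A literal's referent is read-only, so writing through the reference dies.
my $modified = eval { ${$r1}++; 1 } ? 1 : 0;
my $error    = $@;

print "same value: $same_value, modified: $modified\n";
```

Since the referent can never change, the only thing left to tell the two references apart is the address.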

The fact that it's even possible to take a reference to a literal is 
kind of funny, really; but since in P5 you had to be explicit about 
(de)referencing, it didn't hurt, and you could maybe even find some 
cute ways to take advantage of it (such as an easy way to get unique 
IDs out of the str/numification of a ref?).  P6 just lets you gloss 
over certain ref/deref distinctions that in a perfect world wouldn't 
have existed in the first place.

Leibniz's "identity of indiscernibles" is a perfectly practical 
principle to pursue in programming.  Now \@orig may be discernible 
from [@orig] or [1, @orig] from [1, @other], but \1 is completely the 
same as \1 in all ways -- all ways except for being able to get a 
representation of its memory location.  And that's not anything about 
"1", that's a bit of metadata about the reference itself -- something 
that definitely is based on the implementation.

(I can imagine some other implementation where in a ridiculous 
attempt to optimise for minimal memory footprint, everything with a 
value of 1 points to the same address.  When I say "$a=1; $a++", $a 
first points to 0x1234567, and when I increment it, I don't change 
the bits in that location, instead $a changes to point to address 
0x3456789, where my unique 2 value is stored.  Then the only way to 
differentiate \1 from \1 is to generate some arbitrary unique ID. 
Which would be silly.)
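
That thought experiment can be sketched in Perl 5 as a toy intern table. Everything here (`%pool`, `intern`, the variable names) is hypothetical and only models the idea; it says nothing about how any real Perl implementation stores values.

```perl
use strict;
use warnings;

my %pool;    # hypothetical intern table: value => the one reference for it

sub intern {
    my ($value) = @_;
    unless (exists $pool{$value}) {
        my $cell = $value;          # fresh storage, created once per value
        $pool{$value} = \$cell;
    }
    return $pool{$value};
}

my $x = intern(1);
my $y = intern(1);
my $shared_before = ($x == $y) ? 1 : 0;    # both point at the single 1

# "Incrementing" repoints $x at the cell for 2; the 1 cell is untouched.
$x = intern($$x + 1);
my $shared_after = ($x == $y) ? 1 : 0;

print "shared before: $shared_before, shared after: $shared_after\n";
print "x is now $$x, y is still $$y\n";
```

Under this model every reference to 1 has the same address, so address alone could no longer distinguish one \1 from another.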

Anyway, I hope I'm making sense about why \1 !=== \1, etc. seems a 
bit unnatural to me.


-David
