develooper Front page | perl.perl6.language | Postings from February 2005

Junctions, Sets, and Threading.

Thread Next
From:
Rod Adams
Date:
February 22, 2005 01:25
Subject:
Junctions, Sets, and Threading.
Message ID:
421AFA83.5030401@rodadams.net
Definitions
===========

set (n) : A data container that can hold many elements simultaneously.
The order of elements in a set is meaningless. No two elements in a set
may hold the same value.

junction (n) : The combination of several independent values and an
explicit boolean predicate, to form an object that can behave as a
single entity, until evaluation, when threading will occur.

thread (v) : To take a single statement or expression, and execute it
one or more times, with each iteration using a different value or
element from a junction or set that is participating in that expression.
The results of each execution will be recombined in some fashion.

boolean expression (n) : An expression  evaluated in a boolean context.
Some notable boolean expressions:
 1. Any of the comparison operators taken with their operands.
 2. The test expression of an C< if > or C< while > statement.
 3. The distant connection between a C< given > and it's associated
    C< when >s.
 4. Type checking.

implicit threading (n) : Threading that is done on the user's behalf
with no special syntax in the code declaring when and where it is
happening. It is entirely at the implementation's judgment of the given
circumstances when threading will take place.

explicit threading (n) : Threading done for the user's, on his request.
The form of this request may vary, but there is some syntax that tells
the implementation to thread a given section of code.

function (n) : Any subroutine in capable of returning a value. In Perl
6, this includes: operators, subs, methods, multi-methods, and more.



Junctions
=========

The purpose of a junction is to allow for performing several tests at a
given time, with the testing code needing no knowledge of that junctions
are present. While a junction can represent several values at the same
time, such notions as "hold" and "contain" should be avoided, and they
can be very misleading, they betray the fact that a junction is a single
entity. As such, junctions are best thought of as Scalar. (Like other
scalars, it may reference to something much more complex, but that's
beside the point). Therefore the result of a junction evaluation should
be Scalar. (We will see later that this scalar value is in fact
boolean).

But what is a "junction evaluation"? Simply put, it is the results of
threading all values of a junction through an expression and the
recombination of these results back into a scalar, via the logic of the
associated boolean predicate.

Note the fact that it's a _boolean_ predicate. That means that the end
result of any junction evaluation is a single boolean value. Therefore,
the only place that it makes sense to evaluate a junction is in a place
where a boolean value is expected. This brings us to our "Prime Junction
Rule":

Junctions can only be evaluated as part of a boolean expression.
Attempting to evaluate a junction anywhere else is an error.


Some comments about what this does for us.

- Threading in this case is purely implicit. This allows a user to drop
a junction into any place a test parameter is called for. Asking for
explicit threading here makes the concept useless.

- Nesting of junctions is not an issue, since each threaded evaluation
is yet another boolean expression.

- Things like C< for $j .. 10 {...} >, are illegal. They didn't make
much sense in the first place.

- In fact, most code written without junctions in mind will execute
cleanly and as expected, since the result of the threading operation
will be exactly what is expected: boolean.

- As such there should be no limitation as to where a junction can be
stored and passed. Functions will have the ability to explicitly reject
junctions as arguments by typing their parameters with something like
C< all(any(Int, Str), none(Junction)) >. Yes, that's using junctions to
reject junctions.

- There is still a limited case of "Nasty Side Effect Syndrome" that can
be caused by having an assignment, output, or other undesirable type of
function/operator as part of a boolean expression. There will therefore
need to be a "is unautothreadable" trait to be had for functions. All
builtin I/O and assignment functions should have this on. This
protection is shallow. If the call is not in the text of the evaluated
expression, it is not guaranteed to be tested.

- If you wanted to thread a junction in a non-boolean way, you probably
didn't want a junction. You likely either wanted an array or a set, and
some combination of hyper operators. See below.



Boolean Expressions
===================

Some common boolean expressions in context, with valid junction usage:

    if $x == any(1,2,3,4) {...}

    given $string {
      when any(/a .* q/, 'santa')  {...}
      when all(&is_all_uppercase(), /^ \D+ $/) {...}
    }

    if is_prime(sqrt(any(25,14,32))) {...}
    
Note that in the last case, it's the entire boolean expression that's
threaded. The breakdown would look like:

    is_prime(sqrt(any(25,14,32))) --->
     any(is_prime(sqrt(25)),
         is_prime(sqrt(14)),
         is_prime(sqrt(32))
        )
    
The rule for what to thread is to take the smallest enclosing boolean
expression around the junction as possible. This expansion cannot extend
past a single statement.



Sets
====

Despite several surface similarities between a set and a junction, they
are quite different. Sets actually contain other things. Junctions hold
several values at once, and can even hold the same value more than once
at once. Sets are inherently plural in thought process. Junctions are
inherently singular. The operations one wishes to perform on a set are
radically different from those performed on a junction.

Sets are also rather different from arrays. Arrays allow any given value
to be duplicated, imply a certain order that remain, and other such
usage issues. Not to say that an array can't _hold_ a set. In fact I'll
be assuming that an array is used internally a bit later on.

Sets and hashes are an odd pair. Hashes are not a good basis for Sets
due to the restriction that the keys are strings. Even if this is not
the case, Sets do not need the related data value that hashes give you.
But more on hashes and sets later.

So we need a new Set class. We'll give it a constructor called
C< set() > (creative, huh?) which takes a list as input. And we'll
redefine several operators around it:

    my $x = set(1..3);
    my $y = set(1,3,5,7,9);
    my $n = 2;

    $x + $y      # set(1,2,3,5,7,9)
    $x + $n      # set(1,2,3,5,7,9)
    $x * $y      # set(1,3)
    $x - $y      # set(2)
    $x < $y      # $x is a proper subset of $y
    $x <= $y     # $x is a subset of $y
    $x == $y     # $x is the set $y
    $x ~~ $y     # $x is the set $y
    $n < $y      # $n is an element of $y
    $n ~~ $y     # $n is an element of $y
    set()        # empty set
    $x.powerset  # power set of $x
    $x.elems     # count of elements in set $x

(If this list looks familiar, it's because I stole off Damian, and made
several edits.)
In addition, a set evaluated in a list context returns it's members.


The proposed implementation is that of an internal sorted array. Yes,
sorted. This makes the processing time of virtually every set operation
dramatically faster. Membership is O(lg n) with a binary search; unions,
intersections, set difference and others are all O(n).

However, since a Set can contain elements, we need to consider the
sorting function. I don't care how it works, as long as the following
holds:

Objects of the same class are grouped together.
Objects of the same class are ordered by their traditional sort, if
existent. Numbers increasing, Strings alphabetically, etc.
It's recreatable. The same code running on any platform will sort any
combination of the same data the same way.
If the object in question supports a C< .key > method, that is used for
the sort.


In addition to C< ~~ > for membership, we will borrow the {}
subscripters from hashes (not sure about the <> or «» variants). In this
form, a C< $set{$key} > reference will look for the element that matches
the place in the sort identified by $key (in other words, the element
with the matching .key() or value). If something is there, it then
returns C< .value||$_ >, assuming $_ is the element at that place.

This seemingly bizarre set of semantics gives us something very
powerful. One can make a Set of Pairs, which behaves much like a hash,
only with the ability to use any data type as the key.

We can now completely eliminate Hashes and replace them with a Set of
(String => Any) pairs. But we won't. Hashes are a special case. They do
lookups in roughly linear time, not logarithmic. Not to mention they're
just too useful and common as they are to mess with them much.

Note that we can also create a "sparse array" in a similar manner.


Sets will borrow at least the following methods from Arrays:
   grep, map, elems
and these from hashes:
   keys, values, each, delete, exists

Since the set is internally sorted, any iteration of the set is sorted
(by .key||$_) as well.



HyperQuotes
===========

Just to round out any corner cases that the existing hyper operators do
not cover, and to add a clearer way to do some of them, we will allow
any list surrounded with » «, called hyperquotes, to thread the
immediate function, treating each element of the list as a scalar, and
have the expression return a list of the results. Some samples:

    @s = 'item' _ »@x«;

    $x = 'abcdabc';
    @x = split »/a/, /c/, /d/«, $x;
    # (['','bcd','bc'],['ab','dab',''],['abc','abc'])

    @x = func($a, »@y«);

With these and all the other hyper operator, one should be able to easy
perform the equivalent threading operation that they lost with the Prime
Junction Rule.

Since all the »'s and «'s floating around make this explicit threading,
there are no nasty surprises laying around. You never have a scalar
suddenly explode into a list, causing no end of havoc in unsuspecting
code.

These are also infinitely more useful, since you can be guaranteed that
they won't short circuit, and they are in exactly the order you
specified.




So that's how I feel about things. The Prime Junction Rule was the big
revelation I had the other night. The rest of this was the logical
consequence spawned off that single thought to make it a complete idea.

I think that overall in this process I've done the following:
- Kept the real power of Junctions intact.
- Provided fairly strong protection for newbies, without sacrificing
  power.
- Kept Nasty surprises to a minimum.
- Got rid of the need for "half on" features.
- Provided back any power that the Prime Rule removed through sets and
  expanded hyper ops.

-- Rod Adams


Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About