 Front page | perl.golf | Postings from July 2002

## Re: My solution annotated

From:
Chris Dolan
Date:
July 9, 2002 12:05
Subject:
Re: My solution annotated
Message ID:
3D2B3355.4000109@clotho.com
```My favorite solution was by Jukka Suomela at 97.47.  No disrespect
intended toward Ton et al., but I think Jukka's approach is the most
clever and interesting in the competition.  My explanation is below.

-alp
(\$_,my\$y)=map{bless\$x{\$_}||=[\$_]}@F;\$_-\$y&&push@\$_,\$y}DESTROY{\$d?dump:print\$_}{\$d=%x=q

The principle behind this solution is that Perl already has a
topological sorter for graphs built in: the garbage collector.  The GC
takes heirarchical data structures and deletes them by findingthe parts
that are referred to by no other parts and deleting them, and then
recursing (literally or virtually, I have no idea) to other parts of the
data structure.  The GC must destroy the object in the proper order to
preserve the integrity of the data structure at each stage.  Happily,
this order is the topological order.

Jukka's solution creates a nested data structure and forces the GC to
take it apart, and print output along the way.  There are three parts to
the solution:

1) Construction

-alp (\$_,my\$y)=map{bless\$x{\$_}||=[\$_]}@F;\$_-\$y&&push@\$_,\$y}

Loop over the input lines, using the autosplit to create an array of the
leading node (\$F) and the trailing node (\$F) for each edge.  Every
node becomes an array reference containing (1) the name of the node
itself and (2) a list of child nodes.  The former is added as soon as a
the new node is discovered, while the latter are pushed onto the array
as they are encountered.  The "\$_-\$y" compares the nodes to make sure
they are different.  If they are the same, the push() does not happen.
This removes the isolated node problem.

The bless() call is crucial for the deconstruction phase described below.

The "my" is needed to make \$y be a new scalar every time (otherwise the
scalar is reused and \$_-\$y doesn't work right).  \$_ is already local in
scope, so there is no need to use a "my" with that node.

At the end of the input loop, %x refers to all of the nodes, with all of
their interconnections implied by the nested references.  In the example
of test #1
a b
c d
b c
we get (via Data::Dumper)
%x = (
'a' => bless( [
'a',
bless( [
'b',
bless( [
'c',
bless( [
'd'
], 'main' )
], 'main' )
], 'main' )
], 'main' ),
'b' => \$VAR1->{'a'},
'c' => \$VAR1->{'a'},
'd' => \$VAR1->{'a'}
);

In this data structure, one can already see that the nodes are sorted.

The -p will not actually be used to print, but it is crucial to shorten
the solution a bit, as described below.

2) Resolution

{\$d=%x=q

First, the "q" is a clever hack to shorten the entry by one character.
All that is needed is to set %x to any non-false scalar, so something
like \$. could be used.  But that's two strokes.  To shave one stroke,
Jukka used the q() operator, with the newline as the enclosing
character.  Thus, deparsed with the -p option, this looks like:

{ \$d = %x = '}continue{print or die qq(-p destination: \$!\\n)'; }

which is, of course, a non-false scalar.  Thus, the expansion of -p is
an arbitrary string.

%x=q causes the %x hash to be destroyed and garbage collected.  All
collectable nodes will be destroyed.  Any circular references will NOT
be destroyed.  Upon completion of the garbage collection process, the
previously-undefined variable \$d is set.  This variable is a flag to
indicate whether or not the garbage collection SHOULD have completed or
not.  If the graph is acyclic, all the nodes should be collected before
\$d is set.  If there are any cycles, the cyclic nodes will be garbage
collected (upon termination of the program, I think) after \$d gets set.

3) Deconstruction

DESTROY{\$d?dump:print\$_}

Because the nodes are blessed, each time one is garbage collected, the
DESTROY function in the current namespace is called.  Jukka overrides
the default no-op DESTROY function with one that does useful work for
this problem.  The node to be destroyed is passes in via @_, so that
node is \$_.  The first step is to check if we are in the resolution
phase (garbage collecting acyclic nodes) or in the program termination
phase (garbage collecting cyclic nodes).  As described above, \$d is the
flag that indicates the phase.  If we are in the latter phase, we call
dump() which immediately terminates the program and dumps core.  We must
use dump() since it is too late to call die() or other constructs while
the program is in the process of terminating.

If we are still in the acyclic phase, then print the first element of
the array that is contained in this node.  Since the node is \$_, the
first array element is \$_, which is the name of the node.

Thus, the de-obfuscated solution might look like:

%allnodes = ();

while (\$line = <ARGV>) {
chomp \$line;
@labels = split(/\s+/, \$line);

foreach my \$label (@labels) {
if (!exists \$allnodes{\$label}) {
\$allnodes{\$label} = bless [\$label];
}
}
my \$node1 = \$allnodes{\$labels};
my \$node2 = \$allnodes{\$labels};

if (\$node1 != \$node2) {
push @\$node1, \$node2;
}
}

%allnodes = 'any old non-false value';
\$end_of_program = 1;

sub DESTROY {
if (\$end_of_program) {
dump;
} else {
my (\$node) = @_;
print \$node->, "\n";
}
}

Chris

```