develooper Front page | perl.perl6.language | Postings from February 2004

The Sort Problem: a definitive ruling

Thread Previous | Thread Next
From:
Damian Conway
Date:
February 19, 2004 17:56
Subject:
The Sort Problem: a definitive ruling
Message ID:
403562DF.9020904@conway.org
The design team discussed "The Sort Problem" during yesterday's 
teleconference. Here is Larry's decision: final, definitive, and unalterable 
(well...for this week at least ;-)

-----cut---------cut---------cut---------cut---------cut---------cut----

C<sort> in Perl6 is a global multisub:

     multi sub *sort(Criterion @by: *@data) {...}
     multi sub *sort(Criterion $by: *@data) {...}
     multi sub *sort(             : *@data) {...}

where:

     type KeyExtractor ::= Code(Any) returns Any;

     type Comparator   ::= Code(Any, Any) returns Int;

     type Criterion    ::= KeyExtractor
                         | Comparator
                         | Pair(KeyExtractor, Comparator)
                         ;

That means that we can call C<sort> without a block (to sort stringifically 
ascending with C<cmp>):

     # Stringifically ascending...
     @sorted = sort @unsorted;

or with a single two-argument block/closure (to sort by whatever the specified 
comparator is):

     # Numerically ascending...
     @sorted = sort {$^a <=> $^b} @unsorted;

     # Namewise stringifically descending case-insensitive...
     @sorted = sort {lc $^b.name cmp lc $^a.name}
                    @unsorted;
     # or...
     @sorted = sort {$^b.name cmp $^a.name} is insensitive
                    @unsorted;
     # or...
     @sorted = sort {$^a.name cmp $^b.name} is descending is insensitive
                    @unsorted;

     # Modtimewise numerically ascending...
     @sorted = sort {-M $^a <=> -M $^b} @unsorted;

     # Fuzz-ifically...
     sub fuzzy_cmp($x, $y) returns Int;
     @sorted = sort &fuzzy_cmp, @unsorted;


or with a single one-argument block/closure (to sort according whatever the 
specified key extractor returns):

     # Numerically ascending...
     @sorted = sort {+ $^elem} @unsorted;
     @sorted = sort {+ $_} @unsorted;

     # Namewise stringifically descending case-insensitive...
     @sorted = sort {~ $^elem.name} is descending is insensitive @unsorted;
     @sorted = sort {lc $^elem.name} is descending @unsorted;
     @sorted = sort {lc .name} is descending @unsorted;

     # Modtimewise numerically ascending...
     @sorted = sort {-M} @unsorted;

     # Key-ifically...
     sub get_key($elem) {...}
     @sorted = sort &get_key, @unsorted;

or with a single extractor/comparator pair (to sort according to the extracted 
key, using the specified comparator):

     # Modtimewise stringifically descending...
     @sorted = sort {-M}=>{$^b cmp $^a} @unsorted;

     # Namewise fuzz-ifically...
     @sorted = sort {.name}=>&fuzzy_cmp @unsorted;

or with an array of comparators and/or key extractors and/or 
extractor-comparator pairs (to sort according to a cascading list of criteria):

     # Numerically ascending
     # or else namewise stringifically descending case-insensitive
     # or else modtimewise numerically ascending
     # or else namewise fuzz-ifically
     # or else fuzz-ifically...
     @sorted = sort [ {+ $^elem},
                      {$^b.name cmp $^a.name} is insensitive,
                      {-M},
                      {.name}=>&fuzzy_cmp,
                      &fuzzy_cmp,
                    ],
                    @unsorted;



If a key-extractor block returns number, then C<< <=> >> is used to compare 
those keys. Otherwise C<cmp> is used. In either case, the keys extracted by 
the block are cached within the call to C<sort>, to optimize subsequent 
comparisons against the same element. That is, a key-extractor block is only 
ever called once for each element being sorted.

If a key-extractor/comparator pair is specified, the key-extractor is the key 
of the pair and the comparator the value. The extractor is used to retreive 
keys, which are then passed to the comparator.

The C<is descending> and C<is insensitive> traits on a key extractor or a 
comparator are detected within the call to C<sort> (or possibly by the 
compiler) and used to modify the case-sensitivity and "direction" of any 
comparison operators used for the corresponding key or in the corresponding 
comparator.


Note that ambiguous cases like:

     @sorted = sort {-M}, {-M}, {-M};
     @sorted = sort {$^a <=> $^b}, {$^a <=> $^b}, {$^a <=> $^b};
     @sorted = sort [...], [...], [...];
     # etc.

will be dispatched according to the normal multiple dispatch semantics
(which will mean that they will mean):

     @sorted = sort {-M}          <== {-M}, {-M};
     @sorted = sort {$^a <=> $^b} <== {$^a <=> $^b}, {$^a <=> $^b};
     @sorted = sort [...]         <== [...], [...];
     # etc.

and so one would need to write:

     @sorted = sort <== {-M}, {-M}, {-M};
     @sorted = sort <== {$^a <=> $^b}, {$^a <=> $^b}, {$^a <=> $^b};
     @sorted = sort <== [...], [...], [...];
     # etc.

to get C<cmp> comparison on all the arguments.

-----cut---------cut---------cut---------cut---------cut---------cut----


Thanks to everyone who contributed to this discussion (especially Uri). As you 
see, the result is sort facility that is simultaneously much more powerful, 
much easier-to-use in the simple cases, has the potential to produce more 
maintainable code, and which has much better scope for internal optimization.

"Once again the Iron Designer rises to the supreme challenge of the 
Mailinglist Stadium and expresses the true spirit of Perl 6!!!"

Zenzai!

Damian

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About