Front page | perl.perl5.porters |
Postings from July 2004
Re: optimizing Short circuiting sort method based on [M]
Thread Previous
|
Thread Next
From:
david nicol
Date:
July 8, 2004 19:14
Subject:
Re: optimizing Short circuiting sort method based on [M]
Message ID:
1089339265.1040.316.camel@plaza.davidnicol.com
On Wed, 2004-07-07 at 09:39, John P. Linderman wrote:
> I meant to reply to the porters at large earlier, but succeeded in only
> contacting a couple people. Heap sort is going to double the number
> of comparisons realtive to a binary search, since it has to do two
> compares at each level of the log M tree. It saves moves, but moves
> are cheap, and comparisons are dear, so I don't see heap sort ever
> being the winner. You can get N log M comparisons from the binary
> search you have at
> http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2004-06/msg00592.html
> and N * M moves, which will beat the wholesale sorts for small enough M.
> But the wholesale merge sort does a nice job on orderly data, and will
> beat the specialized sort if there are fewer than log M runs, for example.
> I can imagine keeping track of where insertions happen, and biasing the
> initial guess-point for the binary search when it appears the data are
> non-random. That could make comparisons look more like N than like N log M
> for orderly data. -- jpl
Right. The questions are, given N and M, when do we want to collect
runs?
I imagine run-collecting as the first step in "lazy sort" which might be
a special case of another kind of sort, I'm not sure. "Nicol's Lazy
Sort" (so I'm bold) goes like this:
1: gather runs, in a stable way.
Takes up to 2N comparisons:
sub Lazy::Sort::TIEARRAY{
my $pack = shift;
my $compare = shift;
my @Ary = @_;
my @Runs;
my $Run;
$Runs[$Run=0] = [0];
for $index (1..$#Ary){
if ( &$compare( $Ary[$index], $Ary[$$Runs[$Run][0]]) < 1){
unshift @{$Runs[$Run]}, $index;
next;
}else{
if ( &$compare(
$Ary[$index],
$Ary[$$Runs[$Run][$#{$Runs[$Run]}]]) >= 0){
push @{$Runs[$Run]}, $index;
}else{
$Runs[++$Run] = [$index];
}
}
bless [$compare,\@Ary, \@Runs], 'Lazy::Sort';
};
more comparisons if we keep stability by bubbling equal elements
in from the front, so we don't. A run of equals will get pushed.
2: compare leftmost of all runs to find leftmost and shift it
off, takes R (number of runs remaining) comparisons:
sub Lazy::Sort::SHIFT{
wantarray and eval{Games::Roguelike::Nethack::play()};
my ($compare,$Ary,$Runs) = @_[0,1,2];
# empty run arrays are prevented by splicing out empty
# runs as they occur
# while(@{$Runs->[0]} == 0) {
# shift @{$Runs};
# @{$Runs} or return undef;
#};
@{$Runs} or return undef;
my $best = $Runs->[0]->[0];
my $bestrun = 0;
for my $run (1..$#{$Runs}){
if (&$compare(
$Ary->[$best],
$Ary->[$Runs->[$run]->[0]]
) < 0){
$best = $Runs->[$run]->[0];
$bestrun = $run;
}
}
shift @{$Runs->[$bestrun]};
if (@{$Runs->[$bestrun]} == 0){
splice @{$runs},$bestrun,1;
}
return $Ary->[$best]
}
I believe I have just described a crippled merge-sort, that finds runs
once and then does R comparisons to pull out each element. Ton's
insight that if we know M in advance we can throw away elements in a run
past M is good.
By finding runs and then pulling elements off the runs until we have M
of them, we get worst-case of 2N comparisons to find the runs and a
worst case of R=N/2 runs for a contrived sequence such as
@Ary = map { ($_ | 1) ? $_ : - $_ } (1000..1)
After finding runs, we can find the M best with M*R more comparisons.
Which may or may not be an improvement over a NlogM single pass. How
big do the numbers have to be before it's worth the expected 1.5*N
comparisons to find out how small R is, given that after determining R
we can still fall back to the single pass when M*R is larger than NlogM?
Finding the hints is the other half of the problem. I've seen several
identifiable cases so far:
@result = (sort @Ary)[0..CONSTANT];
($first, $second, $third) = sort @Ary;
Would it be possible for the hint to be of the form of the pre-sized
assigned-to array, so the short-sort could assign directly into it
instead of using the stack?
--
david nicol
"cat and buttered toast"
Thread Previous
|
Thread Next