develooper Front page | perl.perl5.porters | Postings from June 2004

Re: optimizing Short circuiting sort method based on [M]

From:
John P. Linderman
Date:
June 30, 2004 11:34
Subject:
Re: optimizing Short circuiting sort method based on [M]
Message ID:
200406301834.OAA87163@raptor.research.att.com
Paul Fenwick, from down under, noted:

> I haven't thought about any other sorting algorithms yet.  I agree that
> heapsort is definitely worth examining as an efficient way to gathering
> only the first sorted M elements, as at each stage the next element is
> known (the root), and there are only two choices for the next element
> (the immediate children of the root).

I don't hold out much hope for heap.  It's still a log M deep tree,
and now there are *two* comparisons at each level to sift elements
up and down.  Better, I believe, to stick with a simple insertion
sort (which is stable, too).  With binary search, you're never
stuck with more than N log M comparisons, which is where the real
action is... the N * M moves can be tolerated more readily.

If you want to be adventurous, you can keep track of where the new
elements are inserted...  if they tend to be inserted near the end
(or not inserted at all), you can risk a comparison with the last
element (or two) before doing the binary search.  For mostly
ordered data, you'd get O(N) behavior on both comparisons and moves,
which as good as you can do.  For data in reverse order, you'd
settle in on the binary search and stay there, and for random data,
I suspect you'd start out doing binary, then maybe switch to linear
towards the end, as the maximal elements find their way into the
list.  No doubt depends on how you keep track of how many recent
instertions go where.  -- jpl



nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About