develooper Front page | perl.perl5.porters | Postings from June 2004

Re: optimizing Short circuiting sort method based on [M]

From:
Glenn Linderman
Date:
June 30, 2004 14:08
Subject:
Re: optimizing Short circuiting sort method based on [M]
Message ID:
40E32BD2.3060201@nevcal.com
On approximately 6/30/2004 11:58 AM, came the following characters from
the keyboard of Nicholas Clark:

> On Wed, Jun 30, 2004 at 11:42:44AM -0700, Glenn Linderman wrote:
> 
>>Thanks for describing how one would do this with a modified merge sort. 
>> I agree that there would be some benefits to using merge sort, because 
>>of perl already using mergesort, but I'm uncertain that any of the code 
>>could be reused, without slowing down the "full sort", which wouldn't be 
>>a brilliant thing to do.
> 
> Indeed not. Slowing down the general case to benefit a special case isn't
> going to get in. And I've see no evidence to counter my intuition that this
> case of top M is anything but rare.

I've certainly written plenty of "selection" loops to determine the 
smallest or largest item in a group.  That's "top 1".

Having it in C code instead of perl code would certainly be faster. 
"Top M", if done correctly, could produce "top 1" in an efficient manner.

There are certainly lots of "Top 10" lists that float around the 
internet, but I don't think they are done by sorting LOL  But programs 
like "top", and a variety of statistical analyses will report the "top 
M" regions by sales, the "top M" salesmen, etc.  Perhaps the full 
ranking is desired, but often not.

So if I can think of a couple uses off the top of my head, it could be 
that it is not as rare as you think.  Or maybe it is.

I can also imagine code like

   while ( sort @array )
   {
      ...
      last  if ... ;
      ...
   }

for large @array, I speculate that it might often be more efficient to 
use an incremental sort than to do a full sort, and only use a fraction 
of the results.  Of course, it depends on the likelihood of using a 
small fraction, so probably in cases like this a user hint should 
determine whether to use a full sort, or an incremental sort.

And perhaps "incrementalsort" would be a good explicit name to give to 
it... it would be the explicit user hint to use bubble or heapsort or 
whatever might be best (based on the size of the array).

And how does this fit into the "top M" discussion?  Well, if 
incrementalsort were available, the user could explicitly select it when 
want desiring 1 or M elements from an array, and code something like:

$smallest = incrementalsort @array;
$largest = reverse incrementalsort @array;
( $smallest, $nextsmallest ) = incrementalsort @array;
@smallest_seven[ 0 .. 6 ] = incrementalsort @array;
@top_ton[ 0 .. 9 ] = reverse incrementalsort @array;


> Does the peephole optimiser currently have any way to optimise
> 
>   @a = reverse sort @b
> 
> into
> 
>   @a = sort {$b cmp $a} $b;
> 
> but without the explicit sort comparison?

Dave says it shouldn't be too difficult, and it sounds useful.  Similar 
optimizations should be done for "reverse incrementalsort" also :)  (per 
the above examples)

Here's a question for you: does perl's optimizer use bubble sort (or a 
variant) for sorting arrays of sufficiently small size, or does it use 
mergesort for everything?  For small N, bubble sort is faster than 
mergesort...  A related question is, what is the underlying sort 
technique used by perl's mergesort for sorting a single run?

-- 
Glenn -- http://nevcal.com/
===========================
The best part about procrastination is that you are never bored,
because you have all kinds of things that you should be doing.




nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About