On approximately 6/30/2004 11:58 AM, came the following characters from the keyboard of Nicholas Clark: > On Wed, Jun 30, 2004 at 11:42:44AM -0700, Glenn Linderman wrote: > >>Thanks for describing how one would do this with a modified merge sort. >> I agree that there would be some benefits to using merge sort, because >>of perl already using mergesort, but I'm uncertain that any of the code >>could be reused, without slowing down the "full sort", which wouldn't be >>a brilliant thing to do. > > Indeed not. Slowing down the general case to benefit a special case isn't > going to get in. And I've see no evidence to counter my intuition that this > case of top M is anything but rare. I've certainly written plenty of "selection" loops to determine the smallest or largest item in a group. That's "top 1". Having it in C code instead of perl code would certainly be faster. "Top M", if done correctly, could produce "top 1" in an efficient manner. There are certainly lots of "Top 10" lists that float around the internet, but I don't think they are done by sorting LOL But programs like "top", and a variety of statistical analyses will report the "top M" regions by sales, the "top M" salesmen, etc. Perhaps the full ranking is desired, but often not. So if I can think of a couple uses off the top of my head, it could be that it is not as rare as you think. Or maybe it is. I can also imagine code like while ( sort @array ) { ... last if ... ; ... } for large @array, I speculate that it might often be more efficient to use an incremental sort than to do a full sort, and only use a fraction of the results. Of course, it depends on the likelihood of using a small fraction, so probably in cases like this a user hint should determine whether to use a full sort, or an incremental sort. And perhaps "incrementalsort" would be a good explicit name to give to it... it would be the explicit user hint to use bubble or heapsort or whatever might be best (based on the size of the array). And how does this fit into the "top M" discussion? Well, if incrementalsort were available, the user could explicitly select it when want desiring 1 or M elements from an array, and code something like: $smallest = incrementalsort @array; $largest = reverse incrementalsort @array; ( $smallest, $nextsmallest ) = incrementalsort @array; @smallest_seven[ 0 .. 6 ] = incrementalsort @array; @top_ton[ 0 .. 9 ] = reverse incrementalsort @array; > Does the peephole optimiser currently have any way to optimise > > @a = reverse sort @b > > into > > @a = sort {$b cmp $a} $b; > > but without the explicit sort comparison? Dave says it shouldn't be too difficult, and it sounds useful. Similar optimizations should be done for "reverse incrementalsort" also :) (per the above examples) Here's a question for you: does perl's optimizer use bubble sort (or a variant) for sorting arrays of sufficiently small size, or does it use mergesort for everything? For small N, bubble sort is faster than mergesort... A related question is, what is the underlying sort technique used by perl's mergesort for sorting a single run? -- Glenn -- http://nevcal.com/ =========================== The best part about procrastination is that you are never bored, because you have all kinds of things that you should be doing.