Front page | perl.perl5.porters | Postings from July 2004

## Re: optimizing Short circuiting sort method based on [M]

From:
Glenn Linderman
Date:
July 9, 2004 11:46
Subject:
Re: optimizing Short circuiting sort method based on [M]
Message ID:
40EEE7F4.3060403@nevcal.com
```On approximately 7/9/2004 7:19 AM, came the following characters from
the keyboard of Ton Hospel:
> In article <1089335013.1041.182.camel@plaza.davidnicol.com>,
> 	david nicol <davidnicol@pay2send.com> writes:
>
>>Does not the combining involve more comparisons?
>
>
> Sure, but I accounted for them. N/M combines each collecting M elements
> from 2 runs of M => M compares per combine => O(N) compares, which then
> gets subsumed in the O(N log M) we already had from the first step

I appreciated as interesting the idea that a "top N sort" might
appropriately be different than the "incremental sort of unknown result
elements".

To get a feeling for how different, it would be interesting to compare
the costs of obtaining each element in turn for the incremental sort,
vs. the costs of obtaining the top N.

And we've only seen a discussion of discarding elements (and
corresponding costs) applied to merge sort, not to other possible
sorts... perhaps the others don't lend themselves to discarding elements
as well as merge sort does.

But then unless the total cost of obtaining the top N elements in
significantly less for a top N sort than for the incremental sort,
perhaps only the incremental sort need be implemented.

I look forward to David's Sort::Lazy module; perhaps it would be a
suitable framework for investigating some of these sort characteristics
empirically?  And also providing a framework for implementing and
measuring both the theoretical costs O(N,M) and actual costs (for
particular data sets).

--
Glenn -- http://nevcal.com/
===========================
The best part about procrastination is that you are never bored,
because you have all kinds of things that you should be doing.

```

nntp.perl.org: Perl Programming lists via nntp and http.