On approximately 6/29/2004 3:10 PM, came the following characters from the keyboard of david nicol: > On Fri, 2004-06-25 at 10:55, Glenn Linderman wrote: > >>One trivial way to take advantage of the hint is to recall that the >>lowly N-squared bubble sort (and variations) produces one high (or low) >>value per pass, and that for small M and large N, that M*N is smaller >>than N lg N. > > With regard to yesterday's post in which I'm thinking aloud, > > Considering bubbling or inserting the new, smaller element, based > on expected average-case number of comparisons required, I think that > bubbling will be better for M less than or equal to five. > > The reason is, binary search _always_ has to do lg(M-1) comparisons > while bubbling can do anywhere from one to M-1 comparisons because > bubbling stops when the new element has done like a mature executive and > risen to the point where it can't rise any further, an expected number > of (M-1)/2 comparisons. > > (M-1)/2 is equal to lg2(M-1) at M=5. I'm not sure what you are calculating here? The cost of inserting the data into the short list? I'm not sure that is the most interesting part of the task. Since M is small, whether entries are placed by bubble or insertion sort is somewhat ho-hum. In fact, depending on how you generate the M elements, it may be that there is no work to sort them... they might get handed to you sorted. On the other hand, maybe I'm missing a key unstated assumption that would make me understand what you are talking about. So I find the task of obtaining the correct M elements from the N elements to be the more interesting task. Depending on where the M elements are stored.... if already in a "temporary array", then they could be sorted "in place", and after M passes of the bubble sort, the first (or last, depending on how it is coded) M entries are sorted. This takes (N-1) + ... + (N-M) comparison/swap operations. If you sort the whole array, M approaches N, and the complete cost of sorting is O(N**2/2) IIRC. But after M passes, the short array is just a slice of the partially sorted large array. If it is not already in a temporary array, there might be a copy cost to put it in a temporary array, unless the result is to be stored replacing the original array, in which case we can do the same in-place sort. With something like a heap sort, the highest/lowest elements can also be generated in order. It being 25 years or more since I last analyzed the costs of heap sort, I'm not going to attempt to regurgitate them in this email. IIRC the heap sort is a better sort choice than quicksort if you only want a few of the top/bottom few entries.... IIRC both are o(N lg N), but for complete sorting, quicksort has a lower coefficient than heapsort. I'm not sure offhand how many result elements are at the crossover between where bubble sort is best and heapsort becomes best, but I believe both of those will generate entries in sorted order faster than sorting the whole array to throw away most of the results. As an alternative to the copy cost above, I posit an algorithm I'll call scan-select (maybe someone has a name for it) where the original data isn't modified, but on the first pass you go through all N entries and look for the highest/lowest number, and keep track of where it was, and select it. Then on each subsequent scan you go through all the data that hasn't already been selected, and look for the highest/lowest number, and select it, until you have accumulated M entries. This would have a similar cost to the bubble sort, but higher overhead to avoid previously selected entries. So that higher overhead has to be traded off against the cost of a copy + partial bubble sort or a copy + partial heap sort. -- Glenn -- http://nevcal.com/ =========================== The best part about procrastination is that you are never bored, because you have all kinds of things that you should be doing.