From:

Date:

September 11, 2013 16:01Subject:

unstable mergesort?Message ID:

CAC0cEp_nXAUqcu3SsCgMWeF=jCjqmDDW1kREcv=AKGcwv30vjA@mail.gmail.comWriting my opinion piece about timsort See Re: [perl #119635] deprecate and remove qsort got me thinking about the cost of stability. Consider the "stuttering countdown sequence" 100 99 99 98 98 97 97 ... 3 3 2 2 1 It's decreasing, so if we were free to ignore stability, we could knock it off in N-1 comparisons. But stability obliges us to consider the first 99 as strictly less than the second 99, and so on. We get 99 runs of length 2 (and, for those who waded through my analysis, we get a kind of worst-case initial setup for our mergesort. Every PTHRESH (8) pairs, we say, wow, 8 pairs in a row in descending order, this sure looks like it could be a long descending run. So we risk a compare to see if we can bridge the last two pairs, and find we cannot. So we start over, and find 8 more pairs with the same sense, and risk another compare, and get disappointed again. Every 16 elements, we make a comparison that doesn't pay off. 1/16th worse than if we hadn't tried to find long runs. But that's not so awful.) Merging those runs of length 2 takes us more like NlgN comparisons rather than N-1. And if were really just sorting integers or floats, that would be rather sad, because there's no way of looking at the final result and knowing which 99 was which (more about this later). As I mentioned in the analysis, when we compare two equal elements, a and b, using cmp(a,b) then we can regard a as low if a preceded b in the input sequence, but, if we want stability, we must regard a as greater than b if a came later in the sequence than b. In the merge phase, we'll be comparing the same a against assorted b's, all of which either came before it, or all of which come later. So we do something like if ( a comes before all the b's ) sense = 0; else sense = -1; and then do our compares as if (cmp(a,b) <= sense) treat a as low else treat b as low This "does the right thing" for all a and b, whether they are equal or not. But it occurs to me that we could "ignore stability" by doing if (stable) sensestab = -1; else sensestab = 0; at the start of the sort, and then replace the "else sense = -1;" above with else sense = sensestab; Our comparison logic would then treat a as low to equal b, regardless of which came first. It looks like things will be a little less trivial in the identification of initial runs phase, but probably not too bad. After which no stable; would allow us to ignore stability, potentially to considerable gain. In fact, one might be tempted to make it the default if the comparison routine were { $a <=> $b }. Alas, we cannot, because given input like qw( 2b 2a 1b 1a ), numerical comparison will work on the leading numerical part, but carry along the string form as well, so you *can* tell which 2 is which. But we can change the documentation (which we must anyway) to emphasize that most numerical sorts can benefit from not enforcing stability. If this gets enough +1s, then if someone points me at the documentation for creating a git branch, I'll take a (leisurely) whack at making stability optional.Thread Next

**unstable mergesort?**by John P. Linderman- Re: unstable mergesort? by Aristotle Pagaltzis
- Re: unstable mergesort? by Father Chrysostomos
- Re: unstable mergesort? by Steffen Mueller
- Re: unstable mergesort? by Tim Jenness
- Re: unstable mergesort? by Aaron Crane
- Re: unstable mergesort? by Nicholas Clark
- Re: unstable mergesort? by John P. Linderman
- Re: unstable mergesort? by Steffen Mueller

nntp.perl.org: Perl Programming lists via nntp and http.

Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About