develooper Front page | perl.perl5.porters | Postings from September 2013

Re: unstable mergesort?

Thread Previous | Thread Next
From:
Nicholas Clark
Date:
September 12, 2013 14:01
Subject:
Re: unstable mergesort?
Message ID:
20130912140141.GC66035@plum.flirble.org
On Wed, Sep 11, 2013 at 12:01:39PM -0400, John P. Linderman wrote:

> This "does the right thing" for all a and b, whether they are equal or
> not.  But it occurs to me that we could "ignore stability" by doing
> 
> if (stable) sensestab = -1;
> else sensestab = 0;
> 
> at the start of the sort, and then replace the "else sense = -1;" above with
> 
> else sense = sensestab;
> 
> Our comparison logic would then treat a as low to equal b, regardless of
> which came first.  It looks like things will be a little less trivial in
> the identification of initial runs phase, but probably not too bad.  After
> which

If I understand your description correctly, it would be possible to do this
within the body of S_mergesortsv() and support functions, so that the same
source code is used for both the stable and unstable versions?

> no stable;

I think you meant to type no sort "stable";

> If this gets enough +1s, then if someone points me at the documentation for
> creating a git branch, I'll take a (leisurely) whack at making stability
> optional.

If you think that this would be something you'd find fun, I'd love to see
the results.

On Wed, Sep 11, 2013 at 11:10:10PM +0100, Aaron Crane wrote:
> Father Chrysostomos <sprout@cpan.org> wrote:
> > We already document that stability is not guaranteed.  If this will
> > speed things up, I suggest making it the default.
> >
> > We also already have 'use sort "stable"', which I thought actually did
> > something, but what you have written suggests it does not.

It does affect qsort:

$ perl -le 'use sort "_qsort"; use sort "stable"; print foreach sort {$a <=> $b} map { "2$_" } aa..jz' | tail
2jq
2jr
2js
2jt
2ju
2jv
2jw
2jx
2jy
2jz
$ perl -le 'use sort "_qsort"; print foreach sort {$a <=> $b} map { "2$_" } aa..jz' | tail
2ic
2ew
2ak
2gb
2df
2cu
2ha
2ee
2fn
2ex


I haven't fully untangled the history between 5.6.0 and 5.8.0, but I think
that when the sort pragma was added by commit 84d4ea48280f6b54, qsort
(unstable) was believed to be a bit faster than mergesort (stable), so the
assumption was that you might want to say C<no sort "stable"> to get a bit
more speed if you were confident that you weren't relying on stability.

But by the time 5.8.0 escaped to CPAN, John P. Linderman had improved the
mergesort sufficiently that it was as fast as qsort pretty much all the time:

http://www.xray.mpe.mpg.de/mailing-lists/perl5-porters/2001-12/msg00931.html

At which point there was very little gain for any program(mer) to actually
specific "unstable".

> As it happens, I was thinking about this a few days ago, before this
> discussion kicked off.  I'm somewhat of the view that stability is a
> better "don't make me think" default, in the sense that if you don't
> need it but get it anyway, your code works correctly; whereas if you
> do need it but don't get it, your code doesn't.  (This may be a
> problem with the idea of changing the default; a CPAN smoke would
> presumably give us an idea of the size of that problem.)  For that
> reason, I was on the verge of (but sadly didn't get round to)
> proposing a doc patch saying that we guarantee a stable sort except
> when you explicitly ask for an unstable one.
> 
> Even if the consensus is in favour of switching the default to
> (faster) instability, I think we should document what the default is.
> Currently we have the situation that everyone gets a stable sort
> (unless they ask for an unstable one) but the default might change out
> from under you, so if you need stability, you have to ask for it
> anyway.  It would be annoying to replicate that problem in the other
> direction.

In the short term, if the incremental cost of stability is small, I don't
think that it's worth giving up.

For the bigger picture, we've been defaulting to stability for about a
decade. In that decade, I'm not aware of anyone coming up with a radically
faster algorithm that can't guarantee stability, so I don't think it likely
that anyone will in the next ten years. Plus we can always turn an unstable
sort stable by using an array of pointers, like we currently do for qsort,
for a cost that is small.

So I think we should document sort as being stable by default, and that
we're not going to change this.

Nicholas Clark

Thread Previous | Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About