develooper Front page | perl.perl5.porters | Postings from September 2009

dual pivot quicksort

Thread Next
From:
Nicholas Clark
Date:
September 25, 2009 07:58
Subject:
dual pivot quicksort
Message ID:
20090925145827.GG60303@plum.flirble.org
John said to forward his comments about dual pivot quicksort to the list, if
I thought they were interesting. *I* thought that they were interesting :-)

Nicholas Clark

----- Forwarded message from "John P. Linderman" <jpl@research.att.com> -----

Envelope-to: nick@ccl4.org
Delivery-date: Fri, 25 Sep 2009 14:56:30 +0100
X-Mailer: exmh version 2.7.2 01/07/2005 (debian 1:2.7.2-16) with nmh-1.2
Subject: Re: dual pivot quicksort 
In-reply-to: <20090925125200.GE60303@plum.flirble.org> 
Comments: In-reply-to Nicholas Clark <nick@ccl4.org>
   message dated "Fri, 25 Sep 2009 13:52:00 +0100."
Date: Fri, 25 Sep 2009 09:56:04 -0400
From: "John P. Linderman" <jpl@research.att.com>


> I'm a bit confused whom we thank for Perl's current sort implementation.

You can thank Tom for good parts of the quicksort implementation,
Peter McIlroy for the good parts of the optimistic merge sort
implementation, and me for any damage done to either.

> However, as dual pivot quicksort appears to be the new black, I wondered if
> either of you would consider it fun and rewarding to try to beat the current
> implementation.

Thanks!  I hadn't been keeping abreast of sorting theory.

If I read 1) http://gdtoolbox.com/DualPivotQuicksort.pdf right,
the improvements are solely in swaps, not in comparisons.
20% isn't chump change, but I'll wager that in perl, comparisons
totally dominate the sort time (doesn't take long to swap
a couple pointers relative to the time it takes to compare
whatever is at them).  That may not be true for the timings
is see in places like near the bottom of 2)

http://www.reddit.com/r/programming/comments/9jj5z/vladimir_yaroslavskiys_dualpivot_quicksort/

where native integer comparison may be going on.
(It's hard to determine what's really being timed).

With modern architectures, locality and pipelining
can matter a bunch on real performance.  Somebody 
in 2) observed that comparison is read-only, so it
doesn't invalidate caches.  But it can break pipelining.
And, if you are comparing *and* swapping native integers,
you get huge locality wins, because everything you need
is in the partition you are sorting.  If you are sorting
pointers to things, not things, you don't get that huge
locality win.

If I had to guess, I'd bet the dual-pivot quicksort might
show some improvement over Tom's, but nothing like 20%,
because swaps don't matter as much as comparisons in perl.
And I bet the results would vary with architecture, with
improvements in some corresponding to slowdowns in others
(like the timings in 2).

Plus, of course, the dual-pivot is not stable.  We've
promised a stable implementation.  If you (or at least I)
try to stabilize quicksort, you eliminate the appearance
of any duplicated items, which, according to the timings
at the bottom of 2), are where quicksort gets the big
edge over random data.

So I won't be rushing to modify the quicksort implementation,
but if anybody else thinks that would be fun or rewarding,
I would certainly encourage it.  Should at least be worth a
paper.  -- jpl


----- End forwarded message -----

Thread Next


nntp.perl.org: Perl Programming lists via nntp and http.
Comments to Ask Bjørn Hansen at ask@perl.org | Group listing | About