This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
quicksort
- From: Carsten Hey <carsten at debian dot org>
- To: libc-alpha at sourceware dot org
- Date: Tue, 29 Nov 2011 01:00:48 +0100
- Subject: quicksort
Hi,
I created some patches [0][1] against qsort.c, i.e., _quicksort():
68ea995 qsort.c: correctly place the pivot element after partitioning
8aec01b qsort.c: simplify partitioning code
8e77619 qsort.c: minimize number of swaps in median of three pivot selection
dfe08ac qsort.c: add pseudo-median of nine pivot selection
6810136 qsort.c: move equal elements on the left side
a0d2eb3 qsort.c: sort multisets in O(N*k)
3fa9ccc qsort.c: swap long-wise instead of char-wise if possible
b644ca6 qsort.c: update documentation (HEAD, github/master, master)
[0] https://github.com/heyc/glibc/commits/master
[1] git://github.com/heyc/glibc.git
Everything in these patches that would be copyrightable by me (if there
is anything at all) is licensed under the terms of the WTFPL 2.0 [3].
[3] http://sam.zoy.org/wtfpl/
I created a simple program [4] to show that _quicksort() fails to sort
{2,0,0,1,2,0,0}; to reproduce this, you just need to make qsort.c print
the value of the pivot element and return after partitioning. The fix
in the first patch actually belongs to the partitioning code introduced
in the following commit, but it seems to work with the current
partitioning code too.
[4] http://paste.debian.net/downloadh/e7d21104
Instead of the 'split end' partitioning used in "Engineering a Sort
Function", I use a scheme based on the work of Justin Peel [5]. He only
published a much simplified version of his way of efficiently sorting
multisets; that version is very fast with very high probability. My
variant guarantees a worst case of O(N*k) (where k is the number of
distinct elements) and works in the following way:
* Check whether the chosen pivot equals a previous one by comparing the
element right of the subfile's rightmost element to the new pivot.
This is done while selecting the median. If a median of three is
used, it does not require an additional comparison on distinct
elements, since they can only be equal if the pivot also equals the
largest element of the sample (I considered this optimization to be
of only little value for pseudo-median of nine selection and thus did
not apply it there).
* If the pivot equals a previously selected pivot, partition with
equal elements on the right side; otherwise partition with equal
elements on the left side.
* After partitioning with equal elements on the right side, all
elements in the right subfile have reached their final positions (they
are left of the earlier chosen pivot and right of the current pivot)
and can thus be skipped in further sorting.
[5] http://stackoverflow.com/questions/2105737/has-anyone-seen-this-improvement-to-quicksort-before
The main advantage of using this instead of 'split end' partitioning
is, in my opinion, that the removed overhead makes it easier to
implement possible future enhancements efficiently:
* Quicksort with multiple pivots (which is patented [6][7] despite
prior art [8]) to reduce the number of element exchanges needed.
* Handle mostly sorted data specially.
[6] http://www.google.com/patents/about?id=RwCkAAAAEBAJ
[7] http://www.faqs.org/patents/app/20100121848
[8] http://www.angelfire.com/space2/m_kaykobad/publications/J13.pdf
Reducing overhead in the inner loops also improves performance
slightly, but in my tests this overhead cost only one or two percent on
pseudo-random data if compiled as separate objects. A drawback is that
sequences containing every element twice are sorted more slowly without
'split end' partitioning.
I wondered whether there is a reason why you are using a hand-coded
stack besides "Sedgewick said so". Recursing into the smallest subfile
first and a hand-coded tail-recursive call (to avoid depending on
compilers to do this) address the general memory problem too, and
saving fewer than N/4 function calls can hardly be a compelling
argument if at least lg(N!) calls to the comparison function are needed
on distinct data.
A remaining reason for bloating the code with a stack could be keeping
the memory usage at an absolute minimum, if this is desired.
Regards
Carsten