This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug libc/19305] qsort() should return early if (nmemb <= 1)
- From: "cvs-commit at gcc dot gnu.org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sourceware dot org
- Date: Wed, 17 Jan 2018 19:31:29 +0000
- Subject: [Bug libc/19305] qsort() should return early if (nmemb <= 1)
- Auto-submitted: auto-generated
- References: <bug-19305-131@http.sourceware.org/bugzilla/>
https://sourceware.org/bugzilla/show_bug.cgi?id=19305
--- Comment #1 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, azanella/qsort-refactor has been created
at 9241132d1668a61513c4b0856a51024d425940fd (commit)
- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9241132d1668a61513c4b0856a51024d425940fd
commit 9241132d1668a61513c4b0856a51024d425940fd
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Jan 16 14:24:53 2018 -0200
stdlib: Remove undefined behavior from qsort implementation
Internally qsort is implemented on top of __qsort_r by casting the
function pointer to another type (__compar_fn_t tp __compar_d_fn_t)
and passing a NULL extra argument. Casting function pointer with
different types for subsequent function call is undefined-behaviour
(C11 6.3.2.3):
"[8] A pointer to a function of one type may be converted to a pointer
to a function of another type and back again; the result shall compare
equal to the original pointer. If a converted pointer is used to call
a function whose type is not compatible with the referenced type,
the behavior is undefined."
Also 'compatible' in this case also does not apply according to
6.7.6.3 Function declarators (including prototypes):
"[15] For two function types to be compatible, both shall specify
compatible return types. (146) Moreover, the parameter type lists,
if both are present, shall agree in the number of parameters and
in use of the ellipsis terminator; corresponding parameters shall
have compatible types. [...]"
Although this works on all architectures glibc supports (mostly because
it casts function pointers with similar calling conventions), I think
it is worth to avoid it. This patch fixes it by adding a common
implementation (qsort_common.c) which redefines the function based
on the required types.
For x86_64 (i7-4790K, gcc 7.2.1) shows a slight better performance
for qsort:
Results for member size 4
Sorted
nmemb | base | patched | diff
32| 1304 | 1257 | -3.60
4096| 330707 | 302235 | -8.61
32768| 3300210 | 3020728 | -8.47
524288| 65673289 | 59306436 | -9.69
Repeated
nmemb | base | patched | diff
32| 1885 | 1873 | -0.64
4096| 951490 | 904864 | -4.90
32768| 9272366 | 8542801 | -7.87
524288| 183337854 | 168426795 | -8.13
MostlySorted
nmemb | base | patched | diff
32| 1836 | 1776 | -3.27
4096| 758359 | 709937 | -6.39
32768| 7199982 | 6855890 | -4.78
524288| 139242170 | 129385161 | -7.08
Unsorted
nmemb | base | patched | diff
32| 2073 | 1941 | -6.37
4096| 1058383 | 969021 | -8.44
32768| 10310116 | 9462116 | -8.22
524288| 202427388 | 186560908 | -7.84
Results for member size 8
Sorted
nmemb | base | patched | diff
32| 1224 | 1205 | -1.55
4096| 336100 | 325554 | -3.14
32768| 3539890 | 3264125 | -7.79
524288| 67268510 | 66107684 | -1.73
Repeated
nmemb | base | patched | diff
32| 2096 | 2118 | 1.05
4096| 1015585 | 979114 | -3.59
32768| 9871981 | 9028606 | -8.54
524288| 189710172 | 174903867 | -7.80
MostlySorted
nmemb | base | patched | diff
32| 2318 | 2346 | 1.21
4096| 805051 | 759158 | -5.70
32768| 8346363 | 7810444 | -6.42
524288| 143597264 | 135900146 | -5.36
Unsorted
nmemb | base | patched | diff
32| 2364 | 2301 | -2.66
4096| 1076998 | 1014018 | -5.85
32768| 10442153 | 9888078 | -5.31
524288| 206235337 | 192479957 | -6.67
Results for member size 32
Sorted
nmemb | base | patched | diff
32| 1214 | 1184 | -2.47
4096| 332449 | 325865 | -1.98
32768| 3313274 | 3331750 | 0.56
524288| 70786673 | 69067176 | -2.43
Repeated
nmemb | base | patched | diff
32| 4913 | 4813 | -2.04
4096| 1693735 | 1624137 | -4.11
32768| 17054760 | 15896739 | -6.79
524288| 332149265 | 316328778 | -4.76
MostlySorted
nmemb | base | patched | diff
32| 5490 | 5332 | -2.88
4096| 1394312 | 1312703 | -5.85
32768| 12743599 | 12360726 | -3.00
524288| 240249011 | 231603294 | -3.60
Unsorted
nmemb | base | patched | diff
32| 6251 | 6047 | -3.26
4096| 1959306 | 1695241 | -13.48
32768| 17204840 | 16430388 | -4.50
524288| 342716199 | 329496913 | -3.86
Checked on x86_64-linux-gnu.
* stdlib/qsort.c: Move common code to stdlib/qsort_common.c
and parametrize the function definition based wether to use
the '_r' variant.
* stdlib/qsort_common.c: New file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5c21b8d92bb8011a02901d61e6e16d7dd362b3ee
commit 5c21b8d92bb8011a02901d61e6e16d7dd362b3ee
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Jan 16 11:19:15 2018 -0200
stdlib: Optimization qsort{_r} swap implementation
This patchs adds a optimized swap operation on qsort based in previous
msort one. Instead of byte operation, three variants are provided:
1. Using uint32_t loads and stores.
2. Using uint64_t loads and stores.
3. Generic one with a temporary buffer and memcpy/mempcpy.
The 1. and 2. option are selected only either if architecture defines
_STRING_ARCH_unaligned or if base pointer is aligned to required type.
This is due based on data for bench-qsort, usually programs calls
qsort with array with multiple of machine word as element size.
Benchmarking shows an increase performance:
Results for member size 4
Sorted
nmemb | base | patched | diff
32| 1401 | 1958 | 39.76
4096| 351333 | 368533 | 4.90
32768| 3369386 | 3131712 | -7.05
524288| 63192972 | 59807494 | -5.36
MostlySorted
nmemb | base | patched | diff
32| 2391 | 2061 | -13.80
4096| 1124074 | 961816 | -14.43
32768| 11196607 | 9410438 | -15.95
524288| 215908169 | 185586732 | -14.04
Unsorted
nmemb | base | patched | diff
32| 4993 | 2021 | -59.52
4096| 1113860 | 963126 | -13.53
32768| 11251293 | 9518795 | -15.40
524288| 217252237 | 185072278 | -14.81
Results for member size 8
Sorted
nmemb | base | patched | diff
32| 1296 | 1267 | -2.24
4096| 359418 | 334852 | -6.83
32768| 3535229 | 3345157 | -5.38
524288| 69847251 | 67029358 | -4.03
MostlySorted
nmemb | base | patched | diff
32| 2745 | 2340 | -14.75
4096| 1222082 | 1014314 | -17.00
32768| 12244800 | 9924706 | -18.95
524288| 241557971 | 196898760 | -18.49
Unsorted
nmemb | base | patched | diff
32| 2972 | 2389 | -19.62
4096| 1314861 | 1024052 | -22.12
32768| 12397909 | 10120848 | -18.37
524288| 241789262 | 193414824 | -20.01
Results for member size 32
Sorted
nmemb | base | patched | diff
32| 1305 | 1287 | -1.38
4096| 346332 | 347979 | 0.48
32768| 3458244 | 3408058 | -1.45
524288| 72793445 | 69973719 | -3.87
MostlySorted
nmemb | base | patched | diff
32| 5435 | 4890 | -10.03
4096| 2032260 | 1688556 | -16.91
32768| 19909035 | 16419992 | -17.52
524288| 390339319 | 325921585 | -16.50
Unsorted
nmemb | base | patched | diff
32| 5833 | 5351 | -8.26
4096| 2022531 | 1724961 | -14.71
32768| 19842888 | 16588545 | -16.40
524288| 388838382 | 324102703 | -16.65
Checked on x86_64-linux-gnu.
[BZ #19305].
* stdlib/qsort.c (SWAP): Remove.
(check_alignment, swap_u32, swap_u64, swap_generic,
select_swap_func): New functions.
(__qsort_r):
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e183819e852e55c21d278c459ff2a3f49c4f8ed0
commit e183819e852e55c21d278c459ff2a3f49c4f8ed0
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Tue Jan 16 10:49:43 2018 -0200
stdlib: Remove use of mergesort on qsort
This patch removes the mergesort optimization on qsort{_r} implementation
and use the quicksort instead. The mergesort implementation has some
issues:
- It is not as-safe since it uses malloc.
- The malloc call adds arbitrary latency (might even worse if malloc
is interposed).
- The swap check rely on system information that might be virtualized
(for instance vms with overcommit memory) which leads to potentially
use of swap (if system advertise more memory than actually has).
The real memory also have the downside of issuing syscalls where
none is really expected (although only once per execution).
- The mergesort is suboptimal on already sorted array (BZ#21719).
The quicksort implementation is already optimized to use O(1) space, so
it should not show issues related to malloc call or stack overflow.
Using bench-qsort (i7-4790K, gcc 7.2.1) shows the performance difference
between mergesort (base) and quicksort (patched):
Results for member size 4
Sorted
nmemb | base | patched | diff
32| 1447 | 1401 | -3.18
4096| 315978 | 351333 | 11.19
32768| 2559093 | 3369386 | 31.66
524288| 46228488 | 63192972 | 36.70
MostlySorted
nmemb | base | patched | diff
32| 1974 | 2391 | 21.12
4096| 922332 | 1124074 | 21.87
32768| 9268671 | 11196607 | 20.80
524288| 186856297 | 215908169 | 15.55
Unsorted
nmemb | base | patched | diff
32| 1978 | 4993 | 152.43
4096| 916413 | 1113860 | 21.55
32768| 9270003 | 11251293 | 21.37
524288| 187606088 | 217252237 | 15.80
Results for member size 8
Sorted
nmemb | base | patched | diff
32| 1424 | 1296 | -8.99
4096| 299105 | 359418 | 20.16
32768| 2737859 | 3535229 | 29.12
524288| 53082807 | 69847251 | 31.58
MostlySorted
nmemb | base | patched | diff
32| 2129 | 2745 | 28.93
4096| 969465 | 1222082 | 26.06
32768| 9605227 | 12244800 | 27.48
524288| 193353927 | 241557971 | 24.93
Unsorted
nmemb | base | patched | diff
32| 2194 | 2972 | 35.46
4096| 958610 | 1314861 | 37.16
32768| 9664246 | 12397909 | 28.29
524288| 193758429 | 241789262 | 24.79
Results for member size 32
Sorted
nmemb | base | patched | diff
32| 4477 | 1305 | -70.85
4096| 1109492 | 346332 | -68.78
32768| 11075976 | 3458244 | -68.78
524288| 230773658 | 72793445 | -68.46
MostlySorted
nmemb | base | patched | diff
32| 5905 | 5435 | -7.96
4096| 2568895 | 2032260 | -20.89
32768| 24936755 | 19909035 | -20.16
524288| 526442900 | 390339319 | -25.85
Unsorted
nmemb | base | patched | diff
32| 6004 | 5833 | -2.85
4096| 2437943 | 2022531 | -17.04
32768| 24789971 | 19842888 | -19.96
524288| 525898556 | 388838382 | -26.06
An increase in latency, however some performance difference is due the fact
mergesort uses a slight improved swap operation than quicksort (which
following
patch addresses). This change also renders the BZ #21719 fix unrequired
(since
it is to fix the sorted input performance degradation for mergesort).
Checked on x86_64-linux-gnu.
[BZ #21719]
* stdlib/Makefile (routines): Remove msort.
(CFLAGS-msort.c): Remove rule.
* stdlib/msort.c: Remove file.
* stdlib/qsort.c (_quicksort): Rename to __qsort_r and add weak_alias
to qsort_r.
(qsort): New symbol.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=a707334fdfcac4ce7b479be6ac8727c45893bdab
commit a707334fdfcac4ce7b479be6ac8727c45893bdab
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Mon Jan 15 09:20:30 2018 -0200
stdlib: Add more qsort{_r} coverage
This patch adds a qsort and qsort_t (which glibc current lacks
coverage). The test check with random input (created using support
random) with different internal types (uint8_t, uint16_t, uint32_t,
and uint64_t) and with different set of element numbers (from 0
to 262144).
Checked on x86_64-linux-gnu.
* stdlib/tst-qsort3.c: New file.
* stdlib/Makefile (tests): Add tst-qsort3.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=99ce2951fdd6a711334dc927f1ecfd9193fdd506
commit 99ce2951fdd6a711334dc927f1ecfd9193fdd506
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Thu Aug 31 20:24:03 2017 -0300
benchtests: Add bench-qsort
This patch adds a qsort benchmark. I tried to model the benchmark taking
in
consideration the possible input variation in both internal element size,
element numbers, and internal state for 1. real word cases and 2. possible
scenarios based on hardware characteristics.
For 1. I tracked qsort usage (using a simple preload library to dump its
usage
and a script to pos-process it) on both GCC bootstrap and Firefox. GCC 8
bootstrap build shows 51786641 call to qsort with the following
characterics:
Key: number of elements:
key=2 : 39.74
key=3 : 19.23
key=4 : 9.77
key=1 : 8.44
key=0 : 6.60
key=5 : 4.40
key=7 : 2.37
key=6 : 2.25
key=9 : 1.48
key=8 : 0.97
Key: element size in bytes:
key=8 : 91.74
key=32 : 3.62
key=4 : 2.42
key=40 : 1.20
key=16 : 0.67
key=24 : 0.30
key=48 : 0.05
key=56 : 0.00
key=1 : 0.00
Key: total size (number of elements * element size):
key=16 : 35.98
key=24 : 18.67
key=32 : 9.79
key=8 : 8.28
key=0 : 6.60
key=40 : 4.21
key=64 : 3.15
key=48 : 2.24
key=56 : 2.15
key=80 : 1.45
So for GCC:
- 80% of total qsort usage are done with 10 elements of less.
- All usages are done element size of maximum of 56 bytes.
- 90% of calls are done with array of maximum size of 80 bytes or less.
The Firefox usage was done with 2 hours of usage, with first 10 minutes
activelly
openning and closing different types of sites. It resulted in 21042 calls
with
following characteristics:
Key: number of elements:
key=7 : 24.40
key=1 : 10.44
key=3 : 6.33
key=4 : 5.81
key=2 : 5.46
key=6 : 4.80
key=17 : 4.54
key=0 : 3.07
key=5 : 3.05
key=9 : 2.51
key=12 : 2.06
Key: element size in bytes:
key=8 : 94.49
key=28 : 4.40
key=2 : 0.70
key=16 : 0.19
key=36 : 0.07
key=12 : 0.07
key=40 : 0.07
key=24 : 0.03
Key: total size (number of elements * element size):
key=56 : 24.20
key=8 : 10.27
key=24 : 6.36
key=32 : 5.86
key=16 : 5.46
key=48 : 4.80
key=476 : 3.75
key=0 : 3.07
key=40 : 3.05
key=72 : 2.50
So for Firefox:
- 72% of total qsort usage are done with 18 elements of less.
- All usages are done element size of maximum of 40 bytes.
- 70% of calls are done with array of maximum size of 476 bytes or less.
For 2. I used the idea of a machine with 3 levels of cache with sizes
32kb (L1), 256kb (L2), and 4096 (L3).
It resulted in a benchmark with following traits:
* It checks four types of input arrays: sorted, mostly sorted, unsorted,
and
repeated. For 'sorted' the array is already sorted, 'mostly sorted'
the
array will have a certain number of random elements with random values
(current ratio used is 20%), for 'unsorted' the array will contain
random
elements from full range based on used type, and for 'repeated' the
array
will have random elements with a certain number (current ratio is 20%)
of
a repeated element distributed randomly.
* Three elements sizes are checked: uint32_t, uint64_t, and an element
with
32 bytes (but using the uint64_t comparison checks). These element
sizes
are used to 1. to avoid include the comparison function itself and/or
memory copy in sort benchmark itself, and 2. because key of size_t are
the
most used for both GCC and Firefox.
* Five different element numbers: 64 (which cover mostly of used element
sizes for both GCC and Firefox), 4096/8192 (which cover 32 KB of L1 for
32 and 64 bits), 32768/65536 (L2 with 256 KB), and 24288/1048576 (L3
with
4 MB). The sizes are configurable by --nelem option.
Checked on x86_64-linux-gnu
* benchtests/Makefile (stdlib-benchset): Add qsort.
* benchtests/bench-qsort.c: New file.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=84e29aafc42d4c01aaf90f76762e6dc94579b9b0
commit 84e29aafc42d4c01aaf90f76762e6dc94579b9b0
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Mon Dec 25 10:53:13 2017 -0200
support: Add Mersenne Twister pseudo-random number generator
This patch adds support routines for pseudo-random number generation
based on Mersenne Twister. The libstc++ version is used as based and
both 32 and 64 bits are provided. It is used on following qsort tests
and benchmarks.
I decided to use a Mersenne Twister (MT) instead of random_r internal
implementation, which uses a linear feedback shift register approach
with trinomials, because it:
- is used extensivelly in other implementations (like c+11);
- has a quite larger period (2^219937-1) than the type 4 variantion
of random (2^63 - 1);
- does not the RAND_MAX limitation.
Checked on x86_64-linux-gnu.
* support/Makefile (libsupport-routines): Add support_random.
(tests): Add tst-support_random.
* support/support_random.c: New file.
* support/support_random.h: Likewise.
* support/tst-support_random.c: Likewise.
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d23a77dae549e95722f62d87d30787387fabff08
commit d23a77dae549e95722f62d87d30787387fabff08
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date: Fri Dec 15 09:49:37 2017 -0200
stdlib: Adjust tst-qsort{2} to libsupport
* stdlib/tst-qsort.c: Use libsupport.
* stdlib/tst-qsort2.c: Likewise.
-----------------------------------------------------------------------
--
You are receiving this mail because:
You are on the CC list for the bug.