This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation
On 23/01/2018 04:04, Paul Eggert wrote:
> Adhemerval Zanella wrote:
>> At the cost of larger text size and slightly more code:
>
> Yes, that's a common tradeoff for this sort of optimization. My guess is that most glibc users these days would like to spend 4 kB of text space to gain a 2%-or-so CPU speedup. (But it's just a guess. :-)
>> I still prefer my version, which generates a shorter text segment and also
>> optimizes for uint32_t.
>
> The more-inlined version could also optimize for uint32_t. Such an optimization should not change the machine code on platforms with 32-bit pointers (since uint32_t has the same size and alignment restrictions as void *, and GCC should be smart enough to figure this out) but should speed up the size-4 case on platforms with 64-bit pointers.
>
> Any thoughts on why the more-inlined version is a bit slower when input is already sorted?
Again, do we really need to over-engineer it? Profiling GCC shows 95% of total
calls issued with up to 9 elements and 92% with key size 8. Firefox is somewhat
more diverse, with 72% of calls up to 17 elements and 95% with key size 8. I
think that adding even more code complexity by parametrizing the qsort calls to
inline the swap operations won't really make much difference in the
aforementioned use cases.
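
As a rough illustration of what specializing the swap for the common key
sizes looks like, here is a minimal sketch. It is not the code from the patch:
the name swap_elems and its layout are hypothetical, and it only assumes that
4- and 8-byte keys deserve a fast path while everything else falls back to a
byte-wise loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the patch's implementation): swap two
   non-overlapping elements of SIZE bytes.  The fixed-size memcpy calls
   let the compiler emit plain loads and stores for the 8- and 4-byte
   cases without assuming anything about alignment.  */
void
swap_elems (void *a, void *b, size_t size)
{
  assert (a != NULL && b != NULL);

  if (size == sizeof (uint64_t))
    {
      uint64_t ta, tb;
      memcpy (&ta, a, sizeof ta);
      memcpy (&tb, b, sizeof tb);
      memcpy (a, &tb, sizeof tb);
      memcpy (b, &ta, sizeof ta);
    }
  else if (size == sizeof (uint32_t))
    {
      uint32_t ta, tb;
      memcpy (&ta, a, sizeof ta);
      memcpy (&tb, b, sizeof tb);
      memcpy (a, &tb, sizeof tb);
      memcpy (b, &ta, sizeof ta);
    }
  else
    {
      /* Generic fallback: swap one byte at a time.  */
      unsigned char *pa = a;
      unsigned char *pb = b;
      for (size_t i = 0; i < size; i++)
        {
          unsigned char t = pa[i];
          pa[i] = pb[i];
          pb[i] = t;
        }
    }
}
```

With key size 8 dominating both profiles, nearly every swap would take the
first branch, which is why further inlining is unlikely to change much.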
I would rather add specialized sort implementations, as the BSD family does
with heapsort and mergesort, to provide different algorithms for different
constraints (mergesort for a stable sort, heapsort/mergesort to avoid
quicksort's worst case). We might even extend it to add something like
introsort.
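
To make the heapsort option concrete, here is a minimal sketch of a heapsort
with a qsort-style signature. It is only an illustration, not BSD's or any
proposed glibc implementation; heapsort_sketch, sift_down, swap_bytes and
cmp_int are hypothetical names. It runs in O(n log n) comparisons even in the
worst case, but unlike mergesort it is not stable.

```c
#include <assert.h>
#include <stddef.h>

typedef int (*cmp_fn) (const void *, const void *);

/* Byte-wise swap of two non-overlapping elements of SIZE bytes.  */
static void
swap_bytes (unsigned char *a, unsigned char *b, size_t size)
{
  for (size_t i = 0; i < size; i++)
    {
      unsigned char t = a[i];
      a[i] = b[i];
      b[i] = t;
    }
}

/* Restore the max-heap property for the subtree rooted at ROOT,
   looking only at the first N elements of BASE.  */
static void
sift_down (unsigned char *base, size_t root, size_t n, size_t size,
           cmp_fn cmp)
{
  for (;;)
    {
      size_t child = 2 * root + 1;
      if (child >= n)
        break;
      /* Pick the larger of the two children.  */
      if (child + 1 < n
          && cmp (base + child * size, base + (child + 1) * size) < 0)
        child++;
      if (cmp (base + root * size, base + child * size) >= 0)
        break;
      swap_bytes (base + root * size, base + child * size, size);
      root = child;
    }
}

/* Hypothetical qsort-compatible heapsort: in place, not stable.  */
void
heapsort_sketch (void *base, size_t n, size_t size, cmp_fn cmp)
{
  unsigned char *b = base;

  assert (size > 0);
  if (n < 2)
    return;

  /* Build a max-heap bottom-up.  */
  for (size_t i = n / 2; i-- > 0;)
    sift_down (b, i, n, size, cmp);

  /* Repeatedly move the maximum to the end and shrink the heap.  */
  for (size_t end = n - 1; end > 0; end--)
    {
      swap_bytes (b, b + end * size, size);
      sift_down (b, 0, end, size, cmp);
    }
}

/* Example comparator for int keys.  */
int
cmp_int (const void *pa, const void *pb)
{
  int a = *(const int *) pa;
  int b = *(const int *) pb;
  return (a > b) - (a < b);
}
```

A caller would use it exactly like qsort, for example
heapsort_sketch (v, n, sizeof v[0], cmp_int) for an int array v of n elements.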