This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: Paul Eggert <eggert at cs dot ucla dot edu>, libc-alpha at sourceware dot org
Date: Tue, 23 Jan 2018 16:28:09 -0200
Subject: Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation

On 23/01/2018 04:04, Paul Eggert wrote:
> Adhemerval Zanella wrote:
>> At the cost of larger text size and slightly more code:
> 
> Yes, that's a common tradeoff for this sort of optimization. My guess is that most glibc users these days would like to spend 4 kB of text space to gain a 2%-or-so CPU speedup. (But it's just a guess. :-)
>> I still prefer my version, which generates a shorter text segment and also
>> optimizes for uint32_t.
> 
> The more-inlined version could also optimize for uint32_t. Such an optimization should not change the machine code on platforms with 32-bit pointers (since uint32_t has the same size and alignment restrictions as void *, and GCC should be smart enough to figure this out) but should speed up the size-4 case on platforms with 64-bit pointers.
> 
> Any thoughts on why the more-inlined version is a bit slower when input is already sorted?

Again, do we really need to over-engineer this? Profiling GCC shows that 95% of
all qsort calls sort at most 9 elements and 92% use a key size of 8. Firefox is
somewhat more diverse, with 72% of calls sorting at most 17 elements and 95%
using a key size of 8. I think that adding even more code complexity by
parametrizing the qsort calls to inline the swap operations won't make much
difference for these use cases.
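
For readers following the thread, the kind of size-specialized swap under
discussion can be sketched roughly as below. This is only an illustration and
not the code from the patch: the name swap_elems is hypothetical, and it
assumes that dispatching on the element size at run time is an acceptable
stand-in for the inlined variants being compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, NOT the glibc implementation: swap two elements of
   SIZE bytes, with fast paths for the common 8-byte and 4-byte key sizes.  */
static void
swap_elems (void *a, void *b, size_t size)
{
  if (size == sizeof (uint64_t))
    {
      /* memcpy avoids alignment and aliasing problems; compilers usually
         turn these fixed-size copies into plain loads and stores.  */
      uint64_t ta, tb;
      memcpy (&ta, a, sizeof ta);
      memcpy (&tb, b, sizeof tb);
      memcpy (a, &tb, sizeof tb);
      memcpy (b, &ta, sizeof ta);
    }
  else if (size == sizeof (uint32_t))
    {
      uint32_t ta, tb;
      memcpy (&ta, a, sizeof ta);
      memcpy (&tb, b, sizeof tb);
      memcpy (a, &tb, sizeof tb);
      memcpy (b, &ta, sizeof ta);
    }
  else
    {
      /* Generic fallback: swap one byte at a time.  */
      unsigned char *pa = a, *pb = b;
      while (size-- > 0)
        {
          unsigned char t = *pa;
          *pa++ = *pb;
          *pb++ = t;
        }
    }
}
```

With key size 8 dominating the profiles above, the first branch would be the
one taken almost every time, which is why the extra inlining looks like a
small win at best.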

I would rather add specialized sort implementations, as the BSD family does with
heapsort and mergesort, to provide different algorithms for different
constraints (mergesort for stable sorting, heapsort/mergesort to avoid the
worst case of quicksort). We might even extend this to add something like
introsort.
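
As a rough idea of what such an alternative could look like, here is a minimal
heapsort sketch with a qsort-like interface. It is an assumption-laden
illustration, not the BSD code and not a proposed glibc patch: the names
heapsort_sketch, sift_down, swap_bytes and cmp_int are all hypothetical, and
the swap is deliberately the naive byte-wise one.

```c
#include <stddef.h>

/* Hypothetical byte-wise swap of two SIZE-byte elements.  */
static void
swap_bytes (unsigned char *a, unsigned char *b, size_t size)
{
  while (size-- > 0)
    {
      unsigned char t = *a;
      *a++ = *b;
      *b++ = t;
    }
}

/* Restore the max-heap property for the subtree rooted at index ROOT,
   looking only at the first N elements of BASE.  */
static void
sift_down (unsigned char *base, size_t root, size_t n, size_t size,
           int (*cmp) (const void *, const void *))
{
  for (;;)
    {
      size_t child = 2 * root + 1;
      if (child >= n)
        break;
      /* Pick the larger of the two children.  */
      if (child + 1 < n
          && cmp (base + child * size, base + (child + 1) * size) < 0)
        child++;
      if (cmp (base + root * size, base + child * size) >= 0)
        break;
      swap_bytes (base + root * size, base + child * size, size);
      root = child;
    }
}

/* Hypothetical qsort-like entry point: O(n log n) comparisons even in the
   worst case, in place, but not stable.  */
static void
heapsort_sketch (void *ptr, size_t n, size_t size,
                 int (*cmp) (const void *, const void *))
{
  unsigned char *base = ptr;
  if (n < 2)
    return;
  /* Build the heap bottom-up.  */
  for (size_t i = n / 2; i-- > 0;)
    sift_down (base, i, n, size, cmp);
  /* Repeatedly move the current maximum to the end of the array.  */
  for (size_t end = n - 1; end > 0; end--)
    {
      swap_bytes (base, base + end * size, size);
      sift_down (base, 0, end, size, cmp);
    }
}

/* Example comparator for int keys.  */
static int
cmp_int (const void *a, const void *b)
{
  int x = *(const int *) a, y = *(const int *) b;
  return (x > y) - (x < y);
}
```

An introsort would start with quicksort and switch to something like this once
the recursion depth passes a limit, which bounds the worst case without giving
up quicksort's typical speed.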

Follow-Ups:
- Re: [PATCH 6/7] stdlib: Optimization qsort{_r} swap implementation
  - From: Paul Eggert

