This is the mail archive of the libc-hacker@sourceware.org mailing list for the glibc project.

Note that libc-hacker is a closed list. You may look at the archives of this list, but subscription and posting are not open.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH] qsort speedups


Hi!

On Sun, May 06, 2007 at 06:56:39AM +0100, belazougui djamel wrote:
> I am writing again about glibc qsort algorithms and its performance. I
> don't know exactly what to do to attract attention on this subject, but
> speed of GNU qsort  could greatly improved. At least in my experiments,
> i can get quite large improvements. One of the reasons that could make
> qsort so slow is the use of memcpy function in the inner loop. this is
> inefficient because memcpy is slow when copied buffers are of variable
> lengths.One of the solutions i have experimented was to use 4 different
> sort functions, each one implemented for a class of operands lengths:
> 1-sort function for operands of size 4.
> 2-1-sort function for operands of size 8.
> 3-sort function for operands of size 12,20,28,,8n+4.
> 4-sort function for operands of size 16,24,32,,8n.
> 
> for each of those functions i wrote my own memory copy function
> optimized for size class and this made the algorithm run faster. Another
> solution that made even larger speedup is to use indirect merge sorting.

Here is a patch which implements that.  For code size reasons it actually
doesn't use different functions, as that made msort.os much bigger,
just a switch.
Below are some benchmark numbers from x86_64-linux, system glibc doesn't
have that patch, while libc build tree in cwd does.

~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 2048
Strip out best and worst realtime result
minimum: 0.286614861 sec real / 0.000026731 sec CPU
maximum: 0.296511410 sec real / 0.000049066 sec CPU
average: 0.292100834 sec real / 0.000042651 sec CPU
stdev  : 0.001431638 sec real / 0.000003603 sec CPU
~/timing stdlib/tst-qsort2 100000 2048
Strip out best and worst realtime result
minimum: 2.032786138 sec real / 0.000042467 sec CPU
maximum: 2.039871554 sec real / 0.000055063 sec CPU
average: 2.037650687 sec real / 0.000046376 sec CPU
stdev  : 0.001668754 sec real / 0.000001625 sec CPU
~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 256
Strip out best and worst realtime result
minimum: 0.079751795 sec real / 0.000037594 sec CPU
maximum: 0.086948837 sec real / 0.000047276 sec CPU
average: 0.080501879 sec real / 0.000041131 sec CPU
stdev  : 0.001015890 sec real / 0.000002226 sec CPU
~/timing stdlib/tst-qsort2 100000 256
Strip out best and worst realtime result
minimum: 0.219361708 sec real / 0.000037121 sec CPU
maximum: 0.221073085 sec real / 0.000048177 sec CPU
average: 0.220079997 sec real / 0.000040878 sec CPU
stdev  : 0.000306212 sec real / 0.000002211 sec CPU
~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 32
Strip out best and worst realtime result
minimum: 0.032166783 sec real / 0.000036703 sec CPU
maximum: 0.042934657 sec real / 0.000044877 sec CPU
average: 0.040185916 sec real / 0.000041135 sec CPU
stdev  : 0.002841348 sec real / 0.000002156 sec CPU
~/timing stdlib/tst-qsort2 100000 32
Strip out best and worst realtime result
minimum: 0.036521886 sec real / 0.000027210 sec CPU
maximum: 0.049771413 sec real / 0.000050816 sec CPU
average: 0.045404896 sec real / 0.000040574 sec CPU
stdev  : 0.003352199 sec real / 0.000003306 sec CPU
~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 16
Strip out best and worst realtime result
minimum: 0.022185376 sec real / 0.000034847 sec CPU
maximum: 0.036350224 sec real / 0.000050040 sec CPU
average: 0.032392322 sec real / 0.000041894 sec CPU
stdev  : 0.003810001 sec real / 0.000002991 sec CPU
~/timing stdlib/tst-qsort2 100000 16
Strip out best and worst realtime result
minimum: 0.024453106 sec real / 0.000035883 sec CPU
maximum: 0.038619840 sec real / 0.000044240 sec CPU
average: 0.035602517 sec real / 0.000040279 sec CPU
stdev  : 0.002733362 sec real / 0.000001737 sec CPU
~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 8
Strip out best and worst realtime result
minimum: 0.021042132 sec real / 0.000021910 sec CPU
maximum: 0.024142820 sec real / 0.000038728 sec CPU
average: 0.023244031 sec real / 0.000030432 sec CPU
stdev  : 0.001035261 sec real / 0.000002934 sec CPU
~/timing stdlib/tst-qsort2 100000 8
Strip out best and worst realtime result
minimum: 0.014727083 sec real / 0.000021330 sec CPU
maximum: 0.024788724 sec real / 0.000039112 sec CPU
average: 0.023562170 sec real / 0.000030172 sec CPU
stdev  : 0.001994661 sec real / 0.000003953 sec CPU
~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 4
Strip out best and worst realtime result
minimum: 0.020438105 sec real / 0.000018788 sec CPU
maximum: 0.023082930 sec real / 0.000039773 sec CPU
average: 0.022272783 sec real / 0.000027106 sec CPU
stdev  : 0.000949993 sec real / 0.000004309 sec CPU
~/timing stdlib/tst-qsort2 100000 4
Strip out best and worst realtime result
minimum: 0.020050736 sec real / 0.000016737 sec CPU
maximum: 0.033928475 sec real / 0.000034287 sec CPU
average: 0.027609688 sec real / 0.000023064 sec CPU
stdev  : 0.004832571 sec real / 0.000004405 sec CPU
~/timing elf/ld.so --library-path . stdlib/tst-qsort2 100000 2
Strip out best and worst realtime result
minimum: 0.026925996 sec real / 0.000021625 sec CPU
maximum: 0.032363087 sec real / 0.000032854 sec CPU
average: 0.029911853 sec real / 0.000023378 sec CPU
stdev  : 0.002086839 sec real / 0.000001389 sec CPU
~/timing stdlib/tst-qsort2 100000 2
Strip out best and worst realtime result
minimum: 0.026910302 sec real / 0.000020686 sec CPU
maximum: 0.032150177 sec real / 0.000040594 sec CPU
average: 0.029878779 sec real / 0.000023112 sec CPU
stdev  : 0.002115564 sec real / 0.000001469 sec CPU

	Jakub

Attachment: Q2
Description: Text document


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]