This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: Propose C Metaprogramming-based qsort as GNU extension

From: Daniel Santos <daniel dot santos at pobox dot com>
To: OndÅej BÃlka <neleai at seznam dot cz>
Cc: libc-alpha at sourceware dot org, carlos at redhat dot com
Date: Wed, 13 May 2015 10:11:31 -0500
Subject: Re: Propose C Metaprogramming-based qsort as GNU extension
Authentication-results: sourceware.org; auth=none
References: <54E25DDB dot 8060900 at pobox dot com> <20150401152657 dot GA31359 at domone>

On 04/01/2015 10:26 AM, OndÅej BÃlka wrote:

On Mon, Feb 16, 2015 at 03:15:07PM -0600, Daniel Santos wrote:

As part of my work on a paper on C Metaprogramming with gcc, I have
implemented a sqort/insertion sort algorithm based off of the legacy
glibc implementation (when msort is not used) who's performance far
exceeds the current implementation. I have opened an enhancement bug
here https://sourceware.org/bugzilla/show_bug.cgi?id=17941 with
details about this, and was told I need to talk to you guys.

In short, a basic benchmark shows an average performance increase
83% (16k of data, using element sizes from 1 to 8192 bytes). These
are techniques that I appear to have invented in 2012; I haven't run
into anybody else doing it just yet, but I hope it will catch on.
Currently I only know this to work on GCC, but it may be possible
that other compilers have enough support (in their extensions) to
make it possible.

The basic theory of operation is that by exploiting
-findirect-inline, and attributes always_inline & flatten (in some
cases), we force the compiler to inline a larger function than would
normally be considered beneficial -- this is where instantiation of
the "pseudo-template function" (as I call it) occurs. In this way a
more efficient sort function is generated, because we know at
compile-time the exact size and alignment of data elements and we
can inline the compar function. We can even decide if we'll be doing
a direct or indirect sort at compile-time, tune stack size, etc. The
code its self needs more cleanup to be ready to integrate into
glibc, although the algorithm appears to now be quite efficient and
error-free.

The main question is really where it belongs. The Boost project was
started as a place for experimental libraries, many of which ended
up in a later C++ standard. As I see it, a similar process must take
place with C metaprograming, as it provides a powerful tool to
improve performance in programs, libraries and system-level code. I
thought that glibc might be a nice place for this because there are
already so many extensions. This is slightly different however,
because we aren't just providing new functions, but new
metafunctions.

Either way, I would very much like this work to fall under the GNU
umbrella if at all possible.

Daniel

Something like this is on my backlog. You can start with patch with
always_inline qsort template in header surrounded by

# if __GNUC_PREREQ (4, 8)

You do several basic optimizations. I wanted to do more, a hard part
would make gcc recognize different comparison functions. In most cases
radixsort is faster than quick/mergesort.

I sincerely apologize for my delayed response, I've had some things comeup that have arrested my attention for the last month or so.

Please elaborate on the difficulty of getting gcc to recognize differentcomparison functions. At current, in none of my tests have I found acomparison function not being fully inlined (via pointer indirection).It is true that __attribute__((always_inline)) isn't really guaranteedto always inline a function via pointer (at current), but if Iunderstand correctly, that is the reason that the flatten attribute wasadded -- exactly to ensure that that is occurs. Obviously, thecomparison function needs to be in the same translation unit or aninefficient function call is made -- and it's not the function call somuch that tends to be costly, but the inability to perform optimizationsacross the function boundary.

Aside from that, I would love to hear some suggestions on otheroptimizations. I think I need to finish up my benchmarking scripts tobe able to more quickly discern performance changes due to differingtechniques. As far as radix sort vs quick/merge sort, your suggestioncauses me to reconsider even the naming of the template function, beingthat there's no naming standard that needs being followed, perhaps"qmsort_template" would be more descriptive and a separate"radixsort_template" could be added.

One final note on the quick/merge sort, one of it's major drawbacks ismemory consumption and one thing I haven't addressed yet is the staticstack size the function allocates. This will eventually be controllablevia the struct qsort_def fields max_count_bits and max_stack_space,which don't do anything yet. Another optimization that isn't employed isthat if a program knows that it will always have less than 2^16 or 2^32data elements (assuming 64 bit pointers on the later), stack space canbe further reduced by using offsets rather than pointers, potentiallydecreasing data cache misses at the expense (perhaps nominal) of usingindexed addressing.


Daniel

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]