This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Old compiler optimizations in installed headers


On Thu, May 28, 2015 at 04:18:35PM +0000, Joseph Myers wrote:
> On Mon, 25 May 2015, Ondřej Bílka wrote:
> 
> > > I'd like to propose that:
> > > 
> > > (a) if an optimization could clearly be done in the compiler - if it only 
> > > depends on the standard semantics of standard functions, or fully-defined 
> > > semantics of glibc functions that it would be reasonable to encode into 
> > > GCC, rather than e.g. generating calls to a glibc-internal function - then 
> > > we should be wary of adding it to glibc's headers in the first place: in 
> > > accordance with principles of GNU projects working together, it's better 
> > > to add the optimization to GCC; and
> > >
> > I disagree for simple reason of cost. Its considerably easier to write a
> > inline function that does transformation than to write equivalent gcc
> > pass.
> 
> The first question should be to determine the right way to implement 
> something rather than the quick way.

For several simple optimizations both approaches are correct, so the
question is which is easier to implement. Current gcc shows that the
compiler optimization isn't the easier one. You can get a performance
regression with almost all functions due to underlying bugs in memcmp
and memcpy generation.

As these are outstanding bugs, let's be practical here. Joseph, how long
do you think it will take to fix them? As they have been present for
five years, I wouldn't be surprised to wait another five years.

So I would set a deadline of three months; if gcc cannot produce a patch
within that time, going the compiler optimization way takes too long.

>  And I think the right way is 
> generally compiler optimization - which (for example) allows for use in 
> kernel space, for optimization based on the function semantics (e.g. as 

For kernel space you could just surround these with #ifdef _GCC_USE_SSE
or equivalent.

> regards aliasing) rather than just the semantics of a particular 
> implementation in a header, and for different expansions depending on 
> whether the compiler thinks the code in question is hot or cold 
> (information possibly obtained from profile feedback - much code is 
> generally cold, so expansions that increase code size should only be used 
> in those bits of code determined to be hot, which is information simply 
> not available at all in the headers).
> 
You shouldn't use the pattern: Something needs to be done. X is
something. So X should be done.

Since you mentioned profiling: you would need userspace-based profiling
instead of the generic one done by gcc. You could access userspace
counters from a header.

Since you mentioned hotness/coldness: first you need to make gcc measure
the real thing, which is the number of icache misses, instead of trying
to guess hotness/coldness from frequencies. Code in a tight loop with a
high iteration count is hot no matter how rarely it's executed.

Then, when you said that an expansion that increases code size should
only be used in hot bits of code, you are wrong again.

You also need to profile the library function and skip the
transformation only when the function is cache resident. If it's not,
then the expansion would improve performance, as you need to fetch only
several bytes into the icache instead of wasting time fetching the whole
function into the cache.

Then you have the problem that optimizing for size is a misnomer. You
optimize knowing that each byte of code carries some penalty in cache
misses. So for each optimization you need to check whether it is
cost-effective, instead of making a broad generalization about hot/cold
and whether it increases size or not.

Then, as I mentioned userspace profiling, you need to collect the
correct data to get the optimization right. Keeping the focus on
strcmp/memcmp, the question is whether inlining the first byte check
helps.

#define strcmp(x, y) ({          \
  profile.iterations++;          \
  if (x[0] - y[0])               \
    (x[0] - y[0]);               \
  else                           \
    {                            \
       profile.second_byte++;    \
...

From my profiling this helps for strcmp and strncmp. However, it's a
mistake for memcmp, which is used to compare structures, so a mismatch
likely occurs much later. That also means that you discard information
by converting strcmp->memcmp unless you do profiling.


> Then, if putting an optimization (or compiler bug workaround, etc.) in 
> glibc's headers when a compiler approach would also be possible, it should 
> always be accompanied by a comment pointing to the GCC bug report 
> requesting the optimization, and the bug should have a comment pointing 
> back to that glibc header comment and saying to inform the glibc 
> developers when resolving the bug so they know to insert appropriate 
> __GNUC_PREREQ conditionals in the header.
> 
Of course I will describe the bugs.

You also have another problem with the gcc/header issue. I was asked
whether it's possible to use functions with a partially expanded header,
so you would call a new symbol like memcmp_aligned to save the cost of
checking the header again. I don't think that adding such an expansion
in gcc is sane.
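
A rough sketch of what such a partially expanded header could look like,
assuming the only precondition is 8-byte alignment of both pointers and
a length that is a multiple of 8. Both function names and the
precondition itself are hypothetical; no memcmp_aligned symbol exists in
glibc, and a static function stands in here for the new out-of-line
entry point.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical entry point.  The caller has already checked that both
   pointers are 8-byte aligned and that n is a multiple of 8, so those
   checks are not repeated here.  */
static int
memcmp_aligned_sketch (const void *a, const void *b, size_t n)
{
  const unsigned char *pa = a;
  const unsigned char *pb = b;

  for (size_t i = 0; i < n; i += 8)
    {
      uint64_t wa, wb;
      memcpy (&wa, pa + i, 8);
      memcpy (&wb, pb + i, 8);
      if (wa != wb)
        /* Let a bytewise compare of this word decide the sign.  */
        return memcmp (pa + i, pb + i, 8);
    }
  return 0;
}

/* The part that would live in the installed header: do the checks once
   inline, then pick the entry point.  */
static inline int
memcmp_checked (const void *a, const void *b, size_t n)
{
  if ((((uintptr_t) a | (uintptr_t) b | n) & 7) == 0)
    return memcmp_aligned_sketch (a, b, n);
  return memcmp (a, b, n);
}
```

When the alignment and size are known at compile time, the check in
memcmp_checked folds away and only the direct call remains.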

