This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Use floor functions not __floor functions in glibc libm


Hi Joseph,

> Similar to the changes that were made to call sqrt functions directly
> in glibc, instead of __ieee754_sqrt variants, so that the compiler
> could inline them automatically without needing special inline
> definitions in lots of math_private.h headers, this patch makes libm
> code call floor functions directly instead of __floor variants,
> removing the inlines / macros for x86_64 (SSE4.1) and powerpc
> (POWER5).

Looks great, thanks for doing this! The more general mechanism should
make it much easier to do this for the remaining functions. Yes, it
sounds like a good idea to do this for copysign too.

> Note that it's possible that in some cases an inline may be used where
> an IFUNC call was previously used - this is the case on x86_64, for
> example.  I think the direct calls to floor are still appropriate; if
> there's any significant performance cost from inline SSE2 floor
> instead of an IFUNC call ending up with SSE4.1 floor, that indicates
> that either the function should be doing something else that's faster
> than using floor at all, or it should itself have IFUNC variants, or
> that the compiler choice of inlining for generic tuning should change
> to allow for the possibility that, by not inlining, an SSE4.1 IFUNC
> might be called at runtime - but not that glibc should avoid calling
> floor internally.  (After all, all the same considerations would apply
> to any user program calling floor, where it might either be inlined or
> left as an out-of-line call allowing for a possible IFUNC.)  Any
> comments on this point?

Going via the PLT is expensive, and it would be stupid not to inline simple
functions like floor, lrint etc. I did a quick experiment on floorf:
on AArch64 a tight loop with floorf inlined is at least twice as fast as one
making a library call. On x86_64 the library call through the PLT is at least
2.5 times slower than the inlined version.

The SSE2 floor sequence is twice as slow as the SSE4.1 instruction; however,
due to the high PLT call overhead, inlining the SSE2 version is still 25%
faster than calling a floorf that uses the SSE4.1 instruction. So inlining
these functions is always better.

Wilco
