This is the mail archive of the mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize e_expf with FMA [BZ #21912]
On 16/08/2017 11:56, Szabolcs Nagy wrote:
> On 16/08/17 15:31, Arjan van de Ven wrote:
>> On 8/16/2017 7:04 AM, Carlos O'Donell wrote:
>>> On 08/16/2017 09:34 AM, H.J. Lu wrote:
>>>> FMA optimized e_expf improves performance by more than 50% on Skylake.
>>>> Any comments?
>>> Exactly how much of e_expf-fma.S do you need to achieve that 50% speedup?
>> the core "fast path"
>> (the bit after /* Main path: here if 2^(-28)<=|x|<125*log(2) */ )
>>> How does this algorithm compare to what is already implemented for e_expf?
>> I started with the SSE version of that e_expf, turned it into AVX, used FMA where possible and fixed a few
>> glass jaws in the fast path that you hit on skylake.
>> the slow path is more a direct 1:1 translation from SSE to AVX (because mixing SSE and AVX
>> is generally a bad idea)
> based on my benchmarks portable c code can
> easily beat the hand written sse asm
> (i haven't tested with avx+fma though).
> the idea is that the x86 asm has overkill
> precision (very close to 0.5 ulp error, but
> not correctly rounded), we can debate this
> later, but i think the polynomial can be
> reduced and there should not be much difference
> between asm and c performance (only the
> round/convert to int operation is tricky:
> for different targets the optimal code is
> different, but that can be a target specific
> macro hook).
> anyway i posted my code to the arm
> optimized-routines github repo, i'll start
> posting the patches to glibc soon.
> (one of the reasons posting glibc patches is
> difficult is the nonsensical target specific
> asm codes and ifunc resolvers that break when
> i update the generic code in a way that
> bypasses the wrapper function which is another
> source of improvements.)
Yes, the way the ifunc default versions include the generic implementation
could use some cleanup. However, most of these breakages, if not all, can be
caught with build-many-glibcs.py (though it would take time).
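
As an illustration of the "target specific macro hook" idea quoted above,
here is a minimal C sketch. It is not code from this thread, from glibc, or
from the optimized-routines repository. The names EXPF_ROUND_TO_INT, pow2i
and sketch_expf are hypothetical, the polynomial is a plain degree-6 Taylor
series rather than a tuned minimax fit, and only the main path is covered
(no overflow, underflow, NaN or infinity handling).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical target hook: a port could define this beforehand to use its
   cheapest round-to-integer sequence.  The portable fallback rounds
   halfway cases away from zero using only a cast.  */
#ifndef EXPF_ROUND_TO_INT
# define EXPF_ROUND_TO_INT(x) ((long) ((x) + ((x) < 0 ? -0.5 : 0.5)))
#endif

/* Build 2^k by writing the biased exponent into the bits of a double.
   Valid for -1022 <= k <= 1023.  */
static double
pow2i (long k)
{
  uint64_t bits = (uint64_t) (k + 1023) << 52;
  double d;
  memcpy (&d, &bits, sizeof d);
  return d;
}

/* Sketch of expf for the main path only (roughly |x| < 87).  */
static float
sketch_expf (float x)
{
  const double inv_ln2 = 0x1.71547652b82fep+0;  /* 1/ln(2) */
  const double ln2 = 0x1.62e42fefa39efp-1;      /* ln(2) */
  double xd = x;

  /* Range reduction: x = k*ln(2) + r with |r| <= about 0.347.  */
  long k = EXPF_ROUND_TO_INT (xd * inv_ln2);
  double r = xd - (double) k * ln2;

  /* Degree-6 Taylor polynomial for exp(r) in Horner form.  Each step is a
     multiply followed by an add.  */
  double p = 1.0 / 720;
  p = p * r + 1.0 / 120;
  p = p * r + 1.0 / 24;
  p = p * r + 1.0 / 6;
  p = p * r + 0.5;
  p = p * r + 1.0;
  p = p * r + 1.0;

  /* Reconstruct exp(x) = 2^k * exp(r).  */
  return (float) (p * pow2i (k));
}
```

A port with a cheap round-to-int instruction would define EXPF_ROUND_TO_INT
before this point and leave the rest of the generic code untouched. The
Horner steps are written as multiply-add pairs so that a compiler targeting
FMA hardware may contract them into fused operations.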