This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] BZ #14649: Add multiarch FMA support to x86-64 libm
On Tue, Oct 2, 2012 at 8:25 AM, Andreas Jaeger <aj@suse.com> wrote:
>>>>
>>>> Since functions in libm are implemented by calling each other,
>>>> all functions called from a libm function compiled for FMA must
>>>> also be compiled for FMA, with _fma as the suffix in their symbol
>>>> names. Otherwise, wrong functions may be called. One way
>>>
>>>
>>>
>>> Really?
>>>
>>> If func a calls b, then a can be fma optimized but b does not need to be.
>>> Why does a_fma need to call b_fma instead of b?
>>>
>>
>> Take e_pow for example: when we optimize it for FMA, we must also optimize
>> __slowpow for FMA, since e_pow calls __slowpow. Although __slowpow itself
>> doesn't use any FMA instructions, it calls other functions which use FMA:
>>
>> [hjl@gnu-tools-1 math]$ nm slowpow-fma4.o
>> U __add_fma4
>> U __dbl_mp_fma4
>> 0000000000000000 r eps.3048
>> U __halfulp_fma4
>> 0000000000000000 r .LC0
>> U __mp_dbl
>> U __mpexp_fma4
>> U __mplog_fma4
>> U __mul_fma4
>> 0000000000000000 T __slowpow_fma4
>> U __sub_fma4
>> [hjl@gnu-tools-1 math]$
>>
>> So even if __slowpow doesn't use FMA itself, we must compile __slowpow
>> with FMA so that it can call the FMA versions of those functions. One way
>> to fix it is to make all those internal functions IFUNC. Their references
>> will be resolved to the proper versions at run-time. Instead of calling
>> __slowpow_fma4, we just call __slowpow, which is an IFUNC function
>> optimized for SSE2 and AVX. Other internal functions can be
>> optimized for SSE2, AVX, FMA and FMA4.
>
>
> I see the advantage of doing so if it brings us speed benefits - but not the
> necessity. In other words: for me this is an optimization issue, not one of
> correctness.
IFUNC is designed for speed, not for correctness.
> slowpow could call a non-fma (generic) __mpexp function instead of an
> optimized one.
>
There should be no __slowpow_fma4, just __slowpow.
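
To illustrate the naming point, here is a minimal sketch in portable C.
It is not glibc code and it does not use the real IFUNC mechanism: a
function pointer that rebinds itself on first call stands in for the
IFUNC resolver. Every demo_* name is hypothetical, and the feature probe
assumes the GCC/Clang __builtin_cpu_supports builtin on x86. The point
is only that the caller (demo_slowpow) is built once, under one name,
while the helper it calls (demo_mul) is selected at run-time.

```c
/* Hypothetical stand-ins for two builds of one internal helper.  In a
   real multiarch build these would come from the same source compiled
   with different -m flags; here both simply compute a * b.  */
static double demo_mul_generic(double a, double b) { return a * b; }
static double demo_mul_fma(double a, double b) { return a * b; }

/* Assumed feature probe: compiler builtin on x86, "no FMA" elsewhere.  */
static int demo_cpu_has_fma(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
  __builtin_cpu_init();
  return __builtin_cpu_supports("fma") != 0;
#else
  return 0;
#endif
}

typedef double (*demo_mul_fn)(double, double);

static double demo_mul_resolve(double a, double b);

/* Starts out pointing at the resolver, much as an unresolved IFUNC
   symbol is bound by its resolver before first use.  */
static demo_mul_fn demo_mul_ptr = demo_mul_resolve;

/* Runs on the first call only: pick a variant, rebind, forward.  */
static double demo_mul_resolve(double a, double b)
{
  demo_mul_ptr = demo_cpu_has_fma() ? demo_mul_fma : demo_mul_generic;
  return demo_mul_ptr(a, b);
}

/* The single name every caller references.  */
double demo_mul(double a, double b)
{
  return demo_mul_ptr(a, b);
}

/* A caller that needs no _fma twin: it only references demo_mul, so
   whichever variant was selected is the one it reaches.  */
double demo_slowpow(double x, unsigned n)
{
  double r = 1.0;
  while (n-- > 0)
    r = demo_mul(r, x);
  return r;
}
```

With this shape, demo_slowpow(2.0, 3) goes through whichever demo_mul
variant the probe selected, and no demo_slowpow_fma symbol is needed.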
--
H.J.