This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Optimized generic expf and exp2f
Arjan van de Ven wrote:
>On 9/6/2017 7:14 AM, Szabolcs Nagy wrote:
>>> interesting; it takes 2 independent FP adds and a compare (in C) to detect nearest rounding
>>> being in effect (which in time can overlap with the float->double conversion)
>>> so if there's an option to reduce the algorithm by more than that for a fast
>>> path...
>>>
>>> (also, some CPUs (like newer Intel) support an instruction prefix encoding to force
>>> rounding modes on a FP instruction independent of the global rounding mode,
>>> which at some point maybe should be a gcc pragma or attribute or something,
>>> and then used in such C code)
>>
>
>> i don't think reducing the polynomial (from order 3 to order 2)
>> is possible without bigger lookup table, if less accuracy is
>> enough then reducing the table size is possible though:
>>
>> poly order / table len / ulp error / non-nearest ulp error (rounded)
>>     2      /    64     /  0.61     /
>>     2      /   128     /  0.51     /
>>     2      /   256     /  0.502    /
>>     3      /     8     /  0.91     / > 10
>>     3      /    16     /  0.526    / 2
>>     3      /    32     /  0.502    / 1
>>     3      /    64     /  0.5001   / 1
>>     4      /     8     /  0.54     /
>>     4      /    16     /  0.501    /
>>     4      /    32     /  0.50004  /
>>     4      /    64     /  0.5      /
>>
>> the c code uses order=3/table=32, the x86_64 asm uses order=4/table=64
>>
>
> yeah I don't think it'll work out in terms of saving cycles; on Intel at least
> FMA is 4 cycles, but an ADD is 4 cycles as well, so there's no net savings
> by doing the 2xADD+compare to save an FMA.
> (since the ADDs execute in parallel it's also not likely to be more expensive)
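For reference, the 2xADD+compare check discussed above could look roughly like the sketch below. This is only an illustration, not code from the patch: the function name and the 0x1p-26f constant are assumptions, chosen so that the addend is below half an ulp on either side of 1.0f.

```c
/* Sketch only: returns 1 if round-to-nearest appears to be in effect.
   The name and the constant are illustrative, not taken from the patch. */
static int rounding_is_nearest(void)
{
    /* volatile keeps the compiler from folding the adds at compile time */
    volatile float one = 1.0f;
    volatile float tiny = 0x1p-26f;   /* below half an ulp on both sides of 1.0f */

    volatile float up = one + tiny;   /* > 1 only when rounding upward */
    volatile float down = one - tiny; /* < 1 when rounding downward or toward zero */

    return up == down;  /* both are exactly 1.0f only in round-to-nearest */
}
```

The two adds do not depend on each other, so they can issue in parallel. Calling fesetround(FE_UPWARD) from <fenv.h> beforehand should make the function return 0; real code would probably also want something like GCC's -frounding-math rather than relying on volatile alone.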
Most of a rounding-mode test is already there, given that expf does range reduction.
So you just need to test whether the remainder is outside the [-C,C] interval and
then adjust as necessary.
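A minimal sketch of that idea, assuming the range reduction rounds to an integer by adding and subtracting a large constant. SHIFT, reduce, remainder_out_of_range and the choice C = 0.5 are assumptions for illustration and are not taken from the patch.

```c
#include <stdint.h>

#define SHIFT 0x1.8p52  /* adding this rounds to an integer in the current mode */

/* Hypothetical helper: split z into an integer k and a remainder r = z - k.
   Assumes |z| < 2^51, so that z + SHIFT has an ulp of exactly 1. */
static double reduce(double z, int64_t *k)
{
    /* volatile: stop the compiler simplifying (z + SHIFT) - SHIFT back to z */
    volatile double kd = z + SHIFT;
    double kr = kd - SHIFT;  /* exact: the rounded integer as a double */
    *k = (int64_t)kr;
    return z - kr;
}

/* In round-to-nearest |r| <= 0.5 always holds, so a remainder outside
   [-C, C] with C = 0.5 implies a directed rounding mode was in effect. */
static int remainder_out_of_range(double r)
{
    return r < -0.5 || r > 0.5;
}
```

With round-to-nearest, reduce(10.75, &k) gives k = 11 and r = -0.25, which is in range. Under a directed mode the same input can give k = 10 and r = 0.75, which is what the compare would catch. A remainder that stays in range says nothing about the mode, but in that case the reduction is as good as the nearest-mode one anyway.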
Note that adding a compare does not increase latency, as it is all off the critical path.
So I believe further latency reduction is feasible while keeping throughput similar.
It all depends on how much people care about getting near perfect results for
non-nearest rounding modes...
Wilco