This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Optimized generic expf and exp2f

From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
To: Arjan van de Ven <arjan at linux dot intel dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Joseph Myers <joseph at codesourcery dot com>
Cc: nd <nd at arm dot com>
Date: Wed, 6 Sep 2017 15:04:51 +0000
Subject: Re: [PATCH] Optimized generic expf and exp2f

Arjan van de Ven wrote:
    
>On 9/6/2017 7:14 AM, Szabolcs Nagy wrote:
>>> interesting; it takes 2 independent FP adds and a compare (in C) to detect nearest rounding
>>> being in effect (which in time can overlap with the float->double conversion)
>>> so if there's an option to reduce the algorithm by more than that for a fast
>>> path...
>>>
>>> (also, some CPUs (like newer Intel) support an instruction prefix encoding to force
>>> rounding modes on a FP instruction independent of the global rounding mode,
>>> which at some point maybe should be a gcc pragma or attribute or something,
>>> and then used in such C code)
>>
> 
>> i don't think reducing the polynomial (from order 3 to order 2)
>> is possible without bigger lookup table, if less accuracy is
>> enough then reducing the table size is possible though:
>> 
>> poly order / table len / ulp error / non-nearest ulp error (rounded)
>> 2          / 64        / 0.61      /
>> 2          / 128       / 0.51      /
>> 2          / 256       / 0.502     /
>> 3          / 8         / 0.91      / > 10
>> 3          / 16        / 0.526     / 2
>> 3          / 32        / 0.502     / 1
>> 3          / 64        / 0.5001    / 1
>> 4          / 8         / 0.54      /
>> 4          / 16        / 0.501     /
>> 4          / 32        / 0.50004   /
>> 4          / 64        / 0.5       /
>> 
>> the c code uses order=3/table=32, the x86_64 asm uses order=4/table=64
>> 
>
> yeah I don't think it'll work out in terms of saving cycles; on Intel at least
> FMA is 4 cycles, but an ADD is 4 cycles as well, so there's no net savings
> by doing the 2xADD+compare to save an FMA.
> (since the ADDs execute in parallel it's also not likely to be more expensive)

Most of a rounding-mode test is already in place, since expf does range reduction
anyway. You just need to test whether the remainder falls outside the [-C,C]
interval and then adjust as necessary.
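
A minimal sketch of that idea, not the code from the patch: it assumes the
range reduction rounds to an integer with the usual add-and-subtract "shift"
trick, and that C is 0.5 in units of the reduced argument (the value of C is
not stated above). The names reduced_t, reduce and fixup are hypothetical.

```c
/* Illustrative only: reduced_t, reduce() and fixup() are made-up names,
   and the bound 0.5 for C is an assumption.  */
typedef struct { double k; double r; } reduced_t;

/* If the remainder is outside [-0.5, 0.5], the add in reduce() was not
   rounded to nearest, so move k to the nearest integer and correct r.  */
static reduced_t fixup(reduced_t in)
{
    if (in.r > 0.5) {
        in.k += 1.0;
        in.r -= 1.0;
    } else if (in.r < -0.5) {
        in.k -= 1.0;
        in.r += 1.0;
    }
    return in;
}

/* Split z into an integer part k and a remainder r = z - k.
   Adding 1.5 * 2^52 pushes the fraction bits of z out of the double
   significand, so the add itself rounds z to an integer in whatever
   rounding mode is currently in effect.  Intended for |z| well below 2^51.  */
static reduced_t reduce(double z)
{
    const double shift = 0x1.8p52;      /* 1.5 * 2^52 */
    /* volatile keeps the add/subtract pair from being simplified away
       under aggressive floating-point compiler options.  */
    volatile double biased = z + shift;
    reduced_t out;
    out.k = biased - shift;
    out.r = z - out.k;
    return fixup(out);
}
```

In round-to-nearest the remainder already lies in [-0.5, 0.5] and fixup() does
nothing. Under a directed mode such as FE_TOWARDZERO, an input like 2.7 would
be expected to come out of the shift step as k = 2, r = 0.7, which the compare
then turns into k = 3, r = -0.3.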

Note that adding a compare does not increase latency, since it is all off the
critical path. So I believe further latency reduction is feasible while keeping
throughput similar.

It all depends on how much people care about getting near perfect results for
non-nearest rounding modes...

Wilco
