This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Optimized generic expf and exp2f

From: Arjan van de Ven <arjan at linux dot intel dot com>
To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>, Joseph Myers <joseph at codesourcery dot com>
Cc: nd <nd at arm dot com>
Date: Wed, 6 Sep 2017 06:41:18 -0700
Subject: Re: [PATCH] Optimized generic expf and exp2f

On 9/6/2017 6:16 AM, Wilco Dijkstra wrote:
> Arjan van de Ven wrote:
>
>> I'm seeing a 16% throughput increase (not 1.5x) but still impressive.
>
> Was that using the expf trace input or something else? And with wrapper?
>
>> I do see different numerical answers between the two (I had to disable
>> the code in my bench that detects differences) and sampling a few
>> it seems that the C code is a little bit less accurate in places,
>> likely a simpler polynomial.
>> (for example for 20.636783599853515625 as input)
>
> It's still way more accurate than necessary. The only reason is to
> minimize ULP error for non-nearest rounding modes. If you don't
> care about worst-case ULP for non-standard rounding modes, the
> polynomial can be further simplified within 1ULP max error in round
> to nearest.


Interesting. Detecting that round-to-nearest is in effect takes 2 independent
FP adds and a compare (in C), and that latency can overlap with the
float->double conversion. So if simplifying the polynomial saves more than
that, there is room for a round-to-nearest fast path...
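
For readers following the thread, one way such a check could look is sketched below. This is not code from the patch: the function name and the constant 0x1p-30f are assumptions chosen for illustration. It adds and subtracts a value far smaller than half an ulp of 1.0f; only round-to-nearest rounds both results back to exactly 1.0f.

```c
/* Hypothetical helper, not taken from the glibc patch.
   Returns 1 if the current rounding mode is round-to-nearest, else 0. */
int rounding_is_nearest(void)
{
    /* volatile keeps the compiler from folding the sums at compile time,
       where it would assume round-to-nearest. */
    volatile float one = 1.0f;
    volatile float tiny = 0x1p-30f;   /* far below half an ulp of 1.0f */

    /* Two adds with no dependency on each other. */
    volatile float hi = one + tiny;   /* 1.0f unless rounding upward */
    volatile float lo = one - tiny;   /* 1.0f unless rounding down or toward zero */

    /* Only round-to-nearest rounds both sums back to exactly 1.0f. */
    return hi == lo;
}
```

In the default mode this returns 1; after switching modes with fesetround() from <fenv.h> (FE_UPWARD, FE_DOWNWARD or FE_TOWARDZERO) it should return 0. The volatile accesses make this slower than a tuned version would be, so treat it as an illustration of the idea rather than a benchmark candidate.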

(Also, some CPUs (like newer Intel) support an instruction prefix encoding that
forces the rounding mode of an individual FP instruction, independent of the
global rounding mode. At some point that should maybe become a gcc pragma or
attribute or something, and then be used in such C code.)
