This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.

From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
To: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>, libc-alpha at sourceware dot org
Cc: nd at arm dot com
Date: Fri, 2 Feb 2018 14:40:30 +0000
Subject: Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
References: <1517262265-79445-1-git-send-email-patrick.mcgehearty@oracle.com>

On 29/01/18 21:44, Patrick McGehearty wrote:

New with this version:
Adds updated sparc and x86_64 libm-test-ulps files (1 ulp for
various exp tests). Rewrites the full comment to reflect the current
state of the patch.

Summary of patch rationale

These changes will be active for all platforms that don't provide
their own exp() routines. They will also be active for ieee754
versions of ccos, ccosh, cosh, csin, csinh, sinh, exp10, gamma, and
erf.

Typical performance gains are 2x on Sparc s7 and 5x on x86_64.
The former code included a slow path, intended to ensure no 1 ulp
errors, that could be 50-200 times slower than the normal path.
Informal testing suggests perhaps 1 in 200 values might invoke
the slow path.

Using the glibc_perf tests:
         sparc (nsec)      x86 (nsec)
         old     new       old     new
max    18180     936      4863     275
min      399      96        15      15
mean    5499     419      1336      24


i tested this patch on aarch64 against the current code
with the slow path removed and the latter was about 10%
faster on both my throughput and latency benchmarks.
(i also removed the rounding mode settings in both cases
as that can be avoided at least on aarch64)

so i suggest just removing the slow path first, which
should have good enough error rate and similar performance.

i did some testing and i think it's possible to do the
common case >30% faster with similar table size and around
0.501 ulp error, with a slower path for values close to
overflow/underflow (at least on aarch64, which has a
convert-to-nearest-int instruction that does not depend on
the rounding mode; i'll see if it can be done in a generic way)

Follow-Ups:
- Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Joseph Myers
- Re: [PATCH] v11 Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Patrick McGehearty