This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Sparc exp(), expf() performance improvement
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "adhemerval dot zanella at linaro dot org" <adhemerval dot zanella at linaro dot org>
- Cc: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Fri, 4 Aug 2017 18:05:02 +0000
- Subject: Re: [PATCH] Sparc exp(), expf() performance improvement
> I agree with David, we should refrain from adding even more
> platform-specific assembly optimizations where default C code could be
> as good and also improve generic performance on other platforms as well.
Absolutely, the code is already generic and shows great improvements on
other targets (I tried Patrick's expf and it works fine on AArch64, achieving
almost the performance of Szabolcs's version).
> The problem you describe is very similar to the one on POWER before POWER8,
> where a floating-point to integer transfer incurs a load-hit-store that
> increases latency. I tried to mitigate this on sin/cos by tweaking the
> internal code using hackish hooks (commit 77a2a8b4a19f0), but currently
> I am convinced that a new algorithm for single float exp, sin, cos (and
> probably others) is in fact a better solution.
We certainly need new algorithms and better implementations of existing math
functions. However, in most cases you can use the same generic code and build
it using the right options for the fp->int transfer instructions. I don't see a reason
for target-specific implementations that are actually generic. Most target-specific
features can be done via macros/inline functions in math_private.h.
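
As a rough illustration of that last point, here is a minimal sketch of a generic fp->int accessor that a target header could override. The names ASUINT32, generic_asuint32 and biased_exponent are hypothetical and chosen only for this example; they are not taken from math_private.h or from the patch under discussion.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical hook: a target-specific header could define ASUINT32
   before this point (for example with a direct register move).
   If it does not, the generic fallback below is used.  */
#ifndef ASUINT32
static inline uint32_t
generic_asuint32 (float f)
{
  uint32_t u;
  /* Well-defined way to reinterpret the bits of a float.  How this is
     lowered (register move or store/load) depends on the target and
     on the compiler options used.  */
  memcpy (&u, &f, sizeof u);
  return u;
}
# define ASUINT32(f) generic_asuint32 (f)
#endif

/* Generic code only ever uses the hook, so it stays target-independent.
   Returns the biased exponent field of an IEEE 754 binary32 value.  */
static inline uint32_t
biased_exponent (float x)
{
  return (ASUINT32 (x) >> 23) & 0xff;
}
```

With this layout the algorithm itself is written once, and only the one-line accessor differs between targets.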
Looking at your commit, it seems to me that it is all generic, and in most cases
the generic code could be updated to use floating-point comparisons. Then, if
we can show significant gains from bit manipulation, the code could add
specialized paths for the cases that benefit.
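
To make the two styles concrete, here is a small sketch of the same range check written both ways. The function names and the threshold of 88.0f are assumptions made up for this example and are not taken from the commit.

```c
#include <stdint.h>
#include <string.h>

/* Range check using floating-point comparisons only: the value never
   has to leave the FP registers.  Returns 0 for NaN, since every
   ordered comparison with NaN is false.  */
static inline int
is_large_fp (float x)
{
  return x >= 88.0f || x <= -88.0f;
}

/* The same check using bit manipulation: this needs an fp->int
   transfer, which is the expensive step on some targets.  */
static inline int
is_large_bits (float x)
{
  uint32_t u;
  memcpy (&u, &x, sizeof u);
  /* Clear the sign bit, then compare against the bit pattern of 88.0f
     (0x42b00000).  Unlike the FP version, this also returns 1 for NaN,
     because NaN bit patterns compare above every finite value.  */
  return (u & 0x7fffffff) >= 0x42b00000;
}
```

The two agree for all finite inputs and infinities and differ only for NaN, so a generic version based on comparisons would need its own NaN handling wherever the bit-based check was relied on to catch it.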