This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 1/3] Use generic vector computations in s_sincosf.h
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: 'GNU C Library' <libc-alpha at sourceware dot org>, "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: nd <nd at arm dot com>
- Date: Fri, 14 Dec 2018 19:39:47 +0000
Hi,
I finally had a chance to benchmark this on a recent core using the traces posted
here: https://www.sourceware.org/ml/libc-alpha/2018-12/msg00492.html
I used the same trace for sincosf but this requires a small wrapper to accurately
measure latency and throughput - I'll post the patch for this next week.
On Sandy Bridge I get a 10% gain in throughput with the vector FMA version of sincosf
(at the cost of ~2% extra latency), but unfortunately it's 5% slower on AArch64
and latency increases by 10% due to GCC using some unnecessary DUPs and
lane inserts. Sinf and cosf are also affected by the data layout change: their throughput
and latency are 1.7% and 1.0% worse, respectively.
So it seems the best approach would be to make the vector version conditional on
a macro defined by the target so each target can choose the fastest variant.
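To illustrate what such a target macro could look like, here is a minimal sketch.
It is not the code from the patch: the macro name SINCOSF_USE_VECTOR, the type
v2df_t and the helper poly2_pair are all made up for the example, and the
quadratic is only a stand-in for the paired sin/cos polynomial evaluation. The
vector path assumes the GCC/Clang generic vector extension.

```c
/* Hypothetical target macro (name invented for this sketch): a target
   header would define it to 1 if the vector variant benchmarks faster
   on that target, and leave it at 0 otherwise.  */
#ifndef SINCOSF_USE_VECTOR
# define SINCOSF_USE_VECTOR 0
#endif

#if SINCOSF_USE_VECTOR
/* Generic vector type holding two doubles (GCC/Clang extension).  */
typedef double v2df_t __attribute__ ((vector_size (16)));
#endif

/* Evaluate c[0] + c[1]*x + c[2]*x^2 for two coefficient sets at once.
   This stands in for the paired sin/cos polynomials; both paths
   compute the same values, only the code generation differs.  */
static inline void
poly2_pair (double x, const double cs[3], const double cc[3],
            double *s, double *c)
{
#if SINCOSF_USE_VECTOR
  /* Vector path: lane 0 carries the "sin" set, lane 1 the "cos" set.  */
  v2df_t vx = { x, x };
  v2df_t c0 = { cs[0], cc[0] };
  v2df_t c1 = { cs[1], cc[1] };
  v2df_t c2 = { cs[2], cc[2] };
  v2df_t r = (c2 * vx + c1) * vx + c0;
  *s = r[0];
  *c = r[1];
#else
  /* Scalar path: two independent Horner evaluations.  */
  *s = (cs[2] * x + cs[1]) * x + cs[0];
  *c = (cc[2] * x + cc[1]) * x + cc[0];
#endif
}
```

With something along these lines the generic code stays shared, and a target only
has to flip one define after benchmarking both variants.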
Wilco