This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH 1/3] Use generic vector computations in s_sincosf.h


Hi,

I finally had a chance to benchmark this on a recent core using the traces posted
here: https://www.sourceware.org/ml/libc-alpha/2018-12/msg00492.html
I used the same trace for sincosf, but it needs a small wrapper to measure
latency and throughput accurately. I'll post the patch for this next week.

On Sandy Bridge I get a 10% gain in throughput with the vector FMA version of
sincosf (at the cost of ~2% extra latency). Unfortunately, on AArch64 it's 5%
slower and latency increases by 10%, because GCC emits some unnecessary DUPs and
lane inserts. Sinf and cosf are also affected by the data layout change:
throughput and latency are 1.7% and 1.0% worse respectively.

So it seems the best approach would be to make the vector version conditional on
a macro defined by the target so each target can choose the fastest variant.
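
As a rough sketch of what a target-selected variant could look like (this is not
the actual patch): the macro name SINCOSF_USE_VECTOR, the function toy_sincos
and the Taylor coefficients below are all made up for illustration, and the
vector path assumes GCC's generic vector extension. A target header would define
the macro to 1 to opt in; everyone else keeps the scalar code.

```c
/* Hypothetical macro name: a target header would define this to 1 to
   select the vector variant.  The default is the scalar variant.  */
#ifndef SINCOSF_USE_VECTOR
# define SINCOSF_USE_VECTOR 0
#endif

/* Toy coefficients (plain Taylor terms), not the ones glibc uses.  */
static const double sin_c[2] = { -1.0 / 6.0, 1.0 / 120.0 };
static const double cos_c[2] = { -1.0 / 2.0, 1.0 / 24.0 };

#if SINCOSF_USE_VECTOR
/* GCC generic vector type holding two doubles.  */
typedef double v2df __attribute__ ((vector_size (16)));

/* Evaluate the sin and cos polynomials together, one per vector lane.  */
static void
toy_sincos (double x, double *sinp, double *cosp)
{
  double x2 = x * x;
  v2df vx2 = { x2, x2 };
  v2df c1 = { sin_c[0], cos_c[0] };
  v2df c2 = { sin_c[1], cos_c[1] };
  v2df one = { 1.0, 1.0 };
  v2df p = one + vx2 * (c1 + vx2 * c2);
  *sinp = x * p[0];	/* lane 0: sin polynomial, scaled by x */
  *cosp = p[1];		/* lane 1: cos polynomial */
}
#else
/* Scalar variant: the same two polynomials evaluated separately.  */
static void
toy_sincos (double x, double *sinp, double *cosp)
{
  double x2 = x * x;
  *sinp = x * (1.0 + x2 * (sin_c[0] + x2 * sin_c[1]));
  *cosp = 1.0 + x2 * (cos_c[0] + x2 * cos_c[1]);
}
#endif
```

Both variants compute the same polynomials, so the choice only affects code
generation, and each target can pick whichever benchmarks faster.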

Wilco

