This is the mail archive of the mailing list for the glibc project.
RFC: Creating a more efficient sincos interface
- From: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- To: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, "gcc at gcc dot gnu dot org" <gcc at gcc dot gnu dot org>
- Cc: nd <nd at arm dot com>
- Date: Thu, 13 Sep 2018 13:27:37 +0000
- Subject: RFC: Creating a more efficient sincos interface
The existing sincos functions use 2 pointers to return the sine and cosine results. In
most cases 4 memory accesses are necessary per call (a store and a load for each result).
This is inefficient and often significantly slower than returning values in registers. I ran
a few experiments on the new optimized sincosf implementation in GLIBC using the following interface:
__complex__ float sincosf2 (float);
This has 50% higher throughput and a 25% reduction in latency on Cortex-A72 for
random inputs in the range +-PI/4. Larger inputs take longer, so the relative gain is smaller,
but there is still a 5% gain on the (rarely used) path with full range reduction.
Given that sincos is used in various HPC applications, this can give a worthwhile speedup.
LLVM already supports something similar for OSX using a struct of 2 floats.
Using complex float is better since not all targets support returning structures in
floating-point registers, and GCC generates very inefficient code on targets that do.
What do people think? Ideally I'd like to support this in a generic way so all targets can
benefit, but it's also feasible to enable it on a per-target basis. Also, since not all libraries
will support the new interface, there would have to be a flag or configure option to switch
the new interface off if not supported (perhaps detected automatically from the math.h header).