sysdeps/x86_64/fpu/multiarch has SSE4.1 versions of the ceil, floor, rint and nearbyint functions, using the roundss and roundsd instructions. Subject of course to the usual benchmarking for any performance patch (including adding benchmarks for these functions to the benchtests included with glibc), there should be such versions of trunc and truncf as well, passing immediate operand 11 to those instructions (round toward zero, with the inexact exception suppressed) to get the desired rounding direction.
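As a rough illustration of the request above, here is a minimal C sketch of what an SSE4.1 trunc could look like when written with compiler intrinsics. It is not the glibc implementation: the names trunc_sse41, trunc_generic and my_trunc are hypothetical, the per-call __builtin_cpu_supports check is a simplified stand-in for glibc's IFUNC-based selection, and trunc_generic is a plain portable fallback written for this example only. It assumes GCC or Clang.

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <smmintrin.h>   /* SSE4.1 intrinsics, including _mm_round_sd */

/* Immediate 11 = round toward zero (3) | suppress inexact exception (8). */
_Static_assert((_MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC) == 11,
               "rounding immediate should be 11");

/* Hypothetical name. Compiles to a single roundsd with immediate 11.
   The target attribute lets this build without -msse4.1. */
__attribute__((target("sse4.1")))
static double trunc_sse41(double x)
{
    __m128d v = _mm_set_sd(x);
    v = _mm_round_sd(v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
    return _mm_cvtsd_f64(v);
}
#endif

/* Hypothetical portable fallback, written for this sketch only
   (glibc's generic C version uses bit manipulation instead). */
static double trunc_generic(double x)
{
    /* NaN, infinities and |x| >= 2^52 are returned unchanged:
       such finite values are already integral. */
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;
    if (x == 0.0)
        return x;                     /* keeps the sign of zero */
    double t = (double)(long long)x;  /* C conversion truncates toward zero */
    if (t == 0.0 && x < 0.0)
        return -0.0;                  /* e.g. trunc(-0.5) is -0.0 */
    return t;
}

/* Hypothetical dispatcher: a per-call CPU check standing in for
   the load-time selection an IFUNC resolver would perform. */
double my_trunc(double x)
{
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.1"))
        return trunc_sse41(x);
#endif
    return trunc_generic(x);
}
```

Calling my_trunc(2.7) or my_trunc(-2.7) from a small main and printing the result with printf("%g\n", ...) is enough to try it out. Whether the SSE4.1 path is actually faster on a given machine still has to be measured.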
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5 (commit)
      from  a856d4d4a8a56eaefdddb58884bfa2bfe922ee4c (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5

commit ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5
Author: Joseph Myers <joseph@codesourcery.com>
Date:   Wed Sep 20 16:54:05 2017 +0000

    Add SSE4.1 trunc, truncf (bug 20142).

    This patch adds SSE4.1 versions of trunc and truncf, using the
    roundsd / roundss instructions, similar to the versions of ceil,
    floor, rint and nearbyint functions we already have.  In my testing
    with the glibc benchtests these are about 30% faster than the C
    versions for double, 20% faster for float.

    Tested for x86_64.

    	[BZ #20142]
    	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
    	Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
    	* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
    	* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                      |   12 ++++++++++
 NEWS                                           |    2 +
 sysdeps/x86_64/fpu/multiarch/Makefile          |    6 +++-
 sysdeps/x86_64/fpu/multiarch/s_trunc-c.c       |    2 +
 sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S  |   25 ++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/s_trunc.c         |   29 ++++++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/s_truncf-c.c      |    2 +
 sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S |   25 ++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/s_truncf.c        |   29 ++++++++++++++++++++++++
 9 files changed, 130 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc-c.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf-c.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf.c
Fixed for 2.27.