Bug 20142 - [x86_64] Add SSE4.1 trunc, truncf
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: math
Version: 2.23
Importance: P2 enhancement
Target Milestone: 2.27
Assignee: Not yet assigned to anyone
Reported: 2016-05-24 21:17 UTC by Joseph Myers
Modified: 2017-09-20 16:55 UTC

Host: x86_64-*-*
fweimer: security-


Description Joseph Myers 2016-05-24 21:17:00 UTC
sysdeps/x86_64/fpu/multiarch has SSE4.1 versions of ceil, floor, rint and nearbyint functions, using the roundss and roundsd instructions.  Subject of course to the usual benchmarking for any performance patch (including adding benchmarks for these functions to the benchtests included with glibc), there should be such versions of trunc and truncf, using immediate operand 11 to those instructions to get the desired rounding direction.
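To illustrate the immediate operand mentioned above: in the roundsd / roundss encoding, 11 is 3 (round toward zero) combined with 8 (suppress the precision exception). The following is a minimal sketch using compiler intrinsics rather than the assembly that glibc uses; the function names trunc_sse41 and truncf_sse41 are hypothetical, and the target attribute assumes GCC or Clang on x86.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics: _mm_round_sd, _mm_round_ss */

/* Hypothetical helper, not the glibc source.  The target attribute lets
   this one function use SSE4.1 without building the whole file with
   -msse4.1 (GCC/Clang extension).  */
__attribute__ ((target ("sse4.1")))
static double
trunc_sse41 (double x)
{
  __m128d v = _mm_set_sd (x);
  /* _MM_FROUND_TO_ZERO (3) | _MM_FROUND_NO_EXC (8) == 11, the immediate
     named in the description.  */
  v = _mm_round_sd (v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
  return _mm_cvtsd_f64 (v);
}

/* Single-precision counterpart, using roundss.  */
__attribute__ ((target ("sse4.1")))
static float
truncf_sse41 (float x)
{
  __m128 v = _mm_set_ss (x);
  v = _mm_round_ss (v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
  return _mm_cvtss_f32 (v);
}
```

Calling these on a CPU without SSE4.1 would fault, so a real implementation needs a runtime feature check before selecting them.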
Comment 1 Sourceware Commits 2017-09-20 16:54:56 UTC
The branch, master has been updated
       via  ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5 (commit)
      from  a856d4d4a8a56eaefdddb58884bfa2bfe922ee4c (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5

commit ae8372d7e4c44f6839aa3d851d4d0cb486b81cd5
Author: Joseph Myers <joseph@codesourcery.com>
Date:   Wed Sep 20 16:54:05 2017 +0000

    Add SSE4.1 trunc, truncf (bug 20142).
    
    This patch adds SSE4.1 versions of trunc and truncf, using the roundsd
    / roundss instructions, similar to the versions of ceil, floor, rint
    and nearbyint functions we already have.  In my testing with the glibc
    benchtests these are about 30% faster than the C versions for double,
    20% faster for float.
    
    Tested for x86_64.
    
    	[BZ #20142]
    	* sysdeps/x86_64/fpu/multiarch/Makefile (libm-sysdep_routines):
    	Add s_trunc-c, s_truncf-c, s_trunc-sse4_1 and s_truncf-sse4_1.
    	* sysdeps/x86_64/fpu/multiarch/s_trunc-c.c: New file.
    	* sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_trunc.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_truncf-c.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/s_truncf.c: Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                      |   12 ++++++++++
 NEWS                                           |    2 +
 sysdeps/x86_64/fpu/multiarch/Makefile          |    6 +++-
 sysdeps/x86_64/fpu/multiarch/s_trunc-c.c       |    2 +
 sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S  |   25 ++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/s_trunc.c         |   29 ++++++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/s_truncf-c.c      |    2 +
 sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S |   25 ++++++++++++++++++++
 sysdeps/x86_64/fpu/multiarch/s_truncf.c        |   29 ++++++++++++++++++++++++
 9 files changed, 130 insertions(+), 2 deletions(-)
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc-c.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc-sse4_1.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_trunc.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf-c.c
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf-sse4_1.S
 create mode 100644 sysdeps/x86_64/fpu/multiarch/s_truncf.c
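
The file list above pairs a generic C variant (s_trunc-c.c) and an SSE4.1 variant (s_trunc-sse4_1.S) with a selector (s_trunc.c). As far as I know, glibc resolves such variants through its IFUNC machinery; the sketch below is not that code. It only illustrates the idea with a plain function pointer and the GCC/Clang __builtin_cpu_supports builtin. All function names are hypothetical, and the generic variant is a simplified stand-in that does not preserve the sign of zero.

```c
#include <smmintrin.h>  /* SSE4.1 intrinsics */

/* Simplified stand-in for the generic C variant (hypothetical).  */
static double
trunc_generic (double x)
{
  /* NaN, infinities and |x| >= 2^52 are returned unchanged; finite
     doubles that large are already integers.  */
  if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
    return x;
  /* Conversion to an integer type truncates toward zero.  */
  return (double) (long long) x;
}

/* SSE4.1 variant: roundsd with immediate 11 (hypothetical name).  */
__attribute__ ((target ("sse4.1")))
static double
trunc_sse41_variant (double x)
{
  __m128d v = _mm_set_sd (x);
  v = _mm_round_sd (v, v, _MM_FROUND_TO_ZERO | _MM_FROUND_NO_EXC);
  return _mm_cvtsd_f64 (v);
}

typedef double (*trunc_fn) (double);

/* Choose a variant once, based on what the running CPU supports.  */
static trunc_fn
select_trunc (void)
{
  __builtin_cpu_init ();
  return __builtin_cpu_supports ("sse4.1")
         ? trunc_sse41_variant : trunc_generic;
}

/* Entry point: resolves the variant on first use, then calls it.  */
static double
my_trunc (double x)
{
  static trunc_fn impl;
  if (impl == 0)
    impl = select_trunc ();
  return impl (x);
}
```

An IFUNC resolver does the equivalent selection once at symbol binding time, so later calls avoid the pointer check in my_trunc.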
Comment 2 Joseph Myers 2017-09-20 16:55:44 UTC
Fixed for 2.27.