This is the mail archive of the
glibc-cvs@sourceware.org
mailing list for the glibc project.
GNU C Library master sources branch master updated. glibc-2.21-496-ga6336cc
- From: andros at sourceware dot org
- To: glibc-cvs at sourceware dot org
- Date: 18 Jun 2015 17:39:56 -0000
- Subject: GNU C Library master sources branch master updated. glibc-2.21-496-ga6336cc
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".
The branch, master has been updated
via a6336cc446a7ed682cb9dbc47cc56ebf9f9a4229 (commit)
from c9a8c526acd185176e486bee4624039740f8c435 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://sourceware.org/git/gitweb.cgi?p=glibc.git;a=commitdiff;h=a6336cc446a7ed682cb9dbc47cc56ebf9f9a4229
commit a6336cc446a7ed682cb9dbc47cc56ebf9f9a4229
Author: Andrew Senkevich <andrew.senkevich@intel.com>
Date: Thu Jun 18 20:11:27 2015 +0300
Vector sincosf for x86_64 and tests.
Here is an implementation of vectorized sincosf with SSE, AVX,
AVX2 and AVX512 versions, following the Vector ABI
<https://groups.google.com/forum/#!topic/x86-64-abi/LmppCfN1rZ4>.
* NEWS: Mention addition of x86_64 vector sincosf.
* math/test-float-vlen16.h: Added wrapper for sincosf tests.
* math/test-float-vlen4.h: Likewise.
* math/test-float-vlen8.h: Likewise.
* sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
* sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration.
* sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
* sysdeps/x86_64/fpu/Versions: New versions added.
* sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
* sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines):
Added build of SSE, AVX2 and AVX512 IFUNC versions.
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
* sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
* sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
* sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
* sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
* sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
* sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file.
* sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file.
* sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3 argument wrappers.
* sysdeps/x86_64/fpu/test-float-vlen16.c: Vector sincosf tests.
* sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
* sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
diff --git a/ChangeLog b/ChangeLog
index d07096d..8aeb643 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,38 @@
2015-06-18 Andrew Senkevich <andrew.senkevich@intel.com>
+ * NEWS: Mention addition of x86_64 vector sincosf.
+ * math/test-float-vlen16.h: Added wrapper for sincosf tests.
+ * math/test-float-vlen4.h: Likewise.
+ * math/test-float-vlen8.h: Likewise.
+ * sysdeps/unix/sysv/linux/x86_64/libmvec.abilist: New symbols added.
+ * sysdeps/x86/fpu/bits/math-vector.h: Added sincosf SIMD declaration.
+ * sysdeps/x86_64/fpu/Makefile (libmvec-support): Added new files.
+ * sysdeps/x86_64/fpu/Versions: New versions added.
+ * sysdeps/x86_64/fpu/libm-test-ulps: Regenerated.
+ * sysdeps/x86_64/fpu/multiarch/Makefile (libmvec-sysdep_routines):
+ Added build of SSE, AVX2 and AVX512 IFUNC versions.
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
+ * sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
+ * sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
+ * sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
+ * sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
+ * sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
+ * sysdeps/x86_64/fpu/svml_s_sincosf_data.S: New file.
+ * sysdeps/x86_64/fpu/svml_s_sincosf_data.h: New file.
+ * sysdeps/x86_64/fpu/svml_s_wrapper_impl.h: Added 3 argument wrappers.
+ * sysdeps/x86_64/fpu/test-float-vlen16.c: Vector sincosf tests.
+ * sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen4.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8-avx2.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c: Likewise.
+ * sysdeps/x86_64/fpu/test-float-vlen8.c: Likewise.
+
* NEWS: Mention addition of x86_64 vector sincos.
* bits/libm-simd-decl-stubs.h: Added stubs for sincos.
* math/math.h (__MATHDECL_VEC): New macro.
diff --git a/NEWS b/NEWS
index fedbe24..050522f 100644
--- a/NEWS
+++ b/NEWS
@@ -55,8 +55,8 @@ Version 2.22
condition in some applications.
* Added vector math library named libmvec with the following vectorized x86_64
- implementations: cos, cosf, sin, sinf, sincos, log, logf, exp, expf, pow,
- powf.
+ implementations: cos, cosf, sin, sinf, sincos, sincosf, log, logf, exp, expf,
+ pow, powf.
The library can be disabled with --disable-mathvec. Use of the functions is
enabled with -fopenmp -ffast-math starting from -O1 for GCC version >= 4.9.0.
The library is linked in as needed when using -lm (no need to specify -lmvec
diff --git a/math/test-float-vlen16.h b/math/test-float-vlen16.h
index 802ae7b..b1890f3 100644
--- a/math/test-float-vlen16.h
+++ b/math/test-float-vlen16.h
@@ -44,6 +44,7 @@
#define WRAPPER_DECL(func) extern FLOAT func (FLOAT x);
#define WRAPPER_DECL_ff(func) extern FLOAT func (FLOAT x, FLOAT y);
+#define WRAPPER_DECL_fFF(function) extern void function (FLOAT, FLOAT *, FLOAT *);
// Wrapper from scalar to vector function with vector length 16.
#define VECTOR_WRAPPER(scalar_func, vector_func) \
@@ -71,3 +72,19 @@ FLOAT scalar_func (FLOAT x, FLOAT y) \
TEST_VEC_LOOP (mr, 16); \
return ((FLOAT) mr[0]); \
}
+
+// Wrapper from scalar 3 argument function to vector one.
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func) \
+extern void vector_func (VEC_TYPE, VEC_TYPE *, VEC_TYPE *); \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1) \
+{ \
+ int i; \
+ VEC_TYPE mx, mr, mr1; \
+ INIT_VEC_LOOP (mx, x, 16); \
+ vector_func (mx, &mr, &mr1); \
+ TEST_VEC_LOOP (mr, 16); \
+ TEST_VEC_LOOP (mr1, 16); \
+ *r = (FLOAT) mr[0]; \
+ *r1 = (FLOAT) mr1[0]; \
+ return; \
+}
diff --git a/math/test-float-vlen4.h b/math/test-float-vlen4.h
index f5e530b..213ae78 100644
--- a/math/test-float-vlen4.h
+++ b/math/test-float-vlen4.h
@@ -44,6 +44,7 @@
#define WRAPPER_DECL(function) extern FLOAT function (FLOAT);
#define WRAPPER_DECL_ff(function) extern FLOAT function (FLOAT, FLOAT);
+#define WRAPPER_DECL_fFF(function) extern void function (FLOAT, FLOAT *, FLOAT *);
// Wrapper from scalar to vector function with vector length 4.
#define VECTOR_WRAPPER(scalar_func, vector_func) \
@@ -71,3 +72,19 @@ FLOAT scalar_func (FLOAT x, FLOAT y) \
TEST_VEC_LOOP (mr, 4); \
return ((FLOAT) mr[0]); \
}
+
+// Wrapper from scalar 3 argument function to vector one.
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func) \
+extern void vector_func (VEC_TYPE, VEC_TYPE *, VEC_TYPE *); \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1) \
+{ \
+ int i; \
+ VEC_TYPE mx, mr, mr1; \
+ INIT_VEC_LOOP (mx, x, 4); \
+ vector_func (mx, &mr, &mr1); \
+ TEST_VEC_LOOP (mr, 4); \
+ TEST_VEC_LOOP (mr1, 4); \
+ *r = (FLOAT) mr[0]; \
+ *r1 = (FLOAT) mr1[0]; \
+ return; \
+}
diff --git a/math/test-float-vlen8.h b/math/test-float-vlen8.h
index 697849f..dd2fb28 100644
--- a/math/test-float-vlen8.h
+++ b/math/test-float-vlen8.h
@@ -44,6 +44,7 @@
#define WRAPPER_DECL(function) extern FLOAT function (FLOAT);
#define WRAPPER_DECL_ff(function) extern FLOAT function (FLOAT, FLOAT);
+#define WRAPPER_DECL_fFF(function) extern void function (FLOAT, FLOAT *, FLOAT *);
// Wrapper from scalar to vector function with vector length 8.
#define VECTOR_WRAPPER(scalar_func, vector_func) \
@@ -71,3 +72,19 @@ FLOAT scalar_func (FLOAT x, FLOAT y) \
TEST_VEC_LOOP (mr, 8); \
return ((FLOAT) mr[0]); \
}
+
+// Wrapper from scalar 3 argument function to vector one.
+#define VECTOR_WRAPPER_fFF(scalar_func, vector_func) \
+extern void vector_func (VEC_TYPE, VEC_TYPE *, VEC_TYPE *); \
+void scalar_func (FLOAT x, FLOAT * r, FLOAT * r1) \
+{ \
+ int i; \
+ VEC_TYPE mx, mr, mr1; \
+ INIT_VEC_LOOP (mx, x, 8); \
+ vector_func (mx, &mr, &mr1); \
+ TEST_VEC_LOOP (mr, 8); \
+ TEST_VEC_LOOP (mr1, 8); \
+ *r = (FLOAT) mr[0]; \
+ *r1 = (FLOAT) mr1[0]; \
+ return; \
+}
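Editor's note: the three VECTOR_WRAPPER_fFF macros above share one pattern, which can be sketched in plain C. This is only an illustration under stated assumptions: vec_sketch, vector_func_sketch, lanes_agree and scalar_wrapper_sketch are hypothetical names, the vector type is modelled as a plain array, and the lane check assumes TEST_VEC_LOOP verifies that all lanes agree (its definition is not part of this diff).

```c
#include <stddef.h>

#define VLEN 4  /* hypothetical lane count, as in the vlen4 header */

/* Hypothetical stand-in for VEC_TYPE: a plain array of lanes. */
typedef struct { float lane[VLEN]; } vec_sketch;

/* Hypothetical stand-in for a 3-argument vector function such as the
   4-lane sincosf.  It doubles and negates each lane so that the
   example does not depend on libm.  */
static void vector_func_sketch(vec_sketch x, vec_sketch *r, vec_sketch *r1)
{
    for (size_t i = 0; i < VLEN; i++) {
        r->lane[i] = 2.0f * x.lane[i];
        r1->lane[i] = -x.lane[i];
    }
}

/* Returns 1 when every lane holds the same value as lane 0.  */
static int lanes_agree(const vec_sketch *v)
{
    for (size_t i = 1; i < VLEN; i++)
        if (v->lane[i] != v->lane[0])
            return 0;
    return 1;
}

/* Scalar-looking entry point: broadcast x into every lane, call the
   vector function once, check both result vectors, and hand back
   lane 0 of each.  Returns 0 on success, -1 if the lanes disagree.  */
int scalar_wrapper_sketch(float x, float *r, float *r1)
{
    vec_sketch mx, mr, mr1;

    for (size_t i = 0; i < VLEN; i++)
        mx.lane[i] = x;

    vector_func_sketch(mx, &mr, &mr1);

    if (!lanes_agree(&mr) || !lanes_agree(&mr1))
        return -1;

    *r = mr.lane[0];
    *r1 = mr1.lane[0];
    return 0;
}
```

Broadcasting one input into every lane lets the existing scalar test vectors exercise the vector entry point without a separate set of expected results.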
diff --git a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
index 6c45844..b7efeab 100644
--- a/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/libmvec.abilist
@@ -11,6 +11,7 @@ GLIBC_2.22
_ZGVbN4v_logf F
_ZGVbN4v_sinf F
_ZGVbN4vv_powf F
+ _ZGVbN4vvv_sincosf F
_ZGVcN4v_cos F
_ZGVcN4v_exp F
_ZGVcN4v_log F
@@ -22,6 +23,7 @@ GLIBC_2.22
_ZGVcN8v_logf F
_ZGVcN8v_sinf F
_ZGVcN8vv_powf F
+ _ZGVcN8vvv_sincosf F
_ZGVdN4v_cos F
_ZGVdN4v_exp F
_ZGVdN4v_log F
@@ -33,11 +35,13 @@ GLIBC_2.22
_ZGVdN8v_logf F
_ZGVdN8v_sinf F
_ZGVdN8vv_powf F
+ _ZGVdN8vvv_sincosf F
_ZGVeN16v_cosf F
_ZGVeN16v_expf F
_ZGVeN16v_logf F
_ZGVeN16v_sinf F
_ZGVeN16vv_powf F
+ _ZGVeN16vvv_sincosf F
_ZGVeN8v_cos F
_ZGVeN8v_exp F
_ZGVeN8v_log F
diff --git a/sysdeps/x86/fpu/bits/math-vector.h b/sysdeps/x86/fpu/bits/math-vector.h
index f684ff5..f9e798b 100644
--- a/sysdeps/x86/fpu/bits/math-vector.h
+++ b/sysdeps/x86/fpu/bits/math-vector.h
@@ -38,6 +38,8 @@
# define __DECL_SIMD_sinf __DECL_SIMD_x86_64
# undef __DECL_SIMD_sincos
# define __DECL_SIMD_sincos __DECL_SIMD_x86_64
+# undef __DECL_SIMD_sincosf
+# define __DECL_SIMD_sincosf __DECL_SIMD_x86_64
# undef __DECL_SIMD_log
# define __DECL_SIMD_log __DECL_SIMD_x86_64
# undef __DECL_SIMD_logf
diff --git a/sysdeps/x86_64/fpu/Makefile b/sysdeps/x86_64/fpu/Makefile
index 9c28d62..c6912cb 100644
--- a/sysdeps/x86_64/fpu/Makefile
+++ b/sysdeps/x86_64/fpu/Makefile
@@ -19,7 +19,9 @@ libmvec-support += svml_d_cos2_core svml_d_cos4_core_avx \
svml_d_pow4_core_avx svml_d_pow4_core svml_d_pow8_core \
svml_d_pow_data svml_s_powf4_core svml_s_powf8_core_avx \
svml_s_powf8_core svml_s_powf16_core svml_s_powf_data \
- init-arch
+ svml_s_sincosf4_core svml_s_sincosf8_core_avx \
+ svml_s_sincosf8_core svml_s_sincosf16_core \
+ svml_s_sincosf_data init-arch
endif
# Variables for libmvec tests.
diff --git a/sysdeps/x86_64/fpu/Versions b/sysdeps/x86_64/fpu/Versions
index d950f58..0813204 100644
--- a/sysdeps/x86_64/fpu/Versions
+++ b/sysdeps/x86_64/fpu/Versions
@@ -11,5 +11,6 @@ libmvec {
_ZGVbN4v_logf; _ZGVcN8v_logf; _ZGVdN8v_logf; _ZGVeN16v_logf;
_ZGVbN4v_expf; _ZGVcN8v_expf; _ZGVdN8v_expf; _ZGVeN16v_expf;
_ZGVbN4vv_powf; _ZGVcN8vv_powf; _ZGVdN8vv_powf; _ZGVeN16vv_powf;
+ _ZGVbN4vvv_sincosf; _ZGVcN8vvv_sincosf; _ZGVdN8vvv_sincosf; _ZGVeN16vvv_sincosf;
}
}
diff --git a/sysdeps/x86_64/fpu/libm-test-ulps b/sysdeps/x86_64/fpu/libm-test-ulps
index 74b1af5..2e2722d 100644
--- a/sysdeps/x86_64/fpu/libm-test-ulps
+++ b/sysdeps/x86_64/fpu/libm-test-ulps
@@ -2031,17 +2031,25 @@ idouble: 1
ildouble: 3
ldouble: 3
+Function: "sincos_vlen16":
+float: 1
+
Function: "sincos_vlen2":
double: 1
Function: "sincos_vlen4":
double: 1
+float: 1
Function: "sincos_vlen4_avx2":
double: 1
Function: "sincos_vlen8":
double: 1
+float: 1
+
+Function: "sincos_vlen8_avx2":
+float: 1
Function: "sinh":
double: 2
diff --git a/sysdeps/x86_64/fpu/multiarch/Makefile b/sysdeps/x86_64/fpu/multiarch/Makefile
index 9e510db..86ea473 100644
--- a/sysdeps/x86_64/fpu/multiarch/Makefile
+++ b/sysdeps/x86_64/fpu/multiarch/Makefile
@@ -69,5 +69,6 @@ libmvec-sysdep_routines += svml_d_cos2_core_sse4 svml_d_cos4_core_avx2 \
svml_s_expf16_core_avx512 svml_d_pow2_core_sse4 \
svml_d_pow4_core_avx2 svml_d_pow8_core_avx512 \
svml_s_powf4_core_sse4 svml_s_powf8_core_avx2 \
- svml_s_powf16_core_avx512
+ svml_s_powf16_core_avx512 svml_s_sincosf4_core_sse4 \
+ svml_s_sincosf8_core_avx2 svml_s_sincosf16_core_avx512
endif
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
new file mode 100644
index 0000000..0a1753e
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
@@ -0,0 +1,39 @@
+/* Multiple versions of vectorized sincosf.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <init-arch.h>
+
+ .text
+ENTRY (_ZGVeN16vvv_sincosf)
+ .type _ZGVeN16vvv_sincosf, @gnu_indirect_function
+ cmpl $0, KIND_OFFSET+__cpu_features(%rip)
+ jne 1f
+ call __init_cpu_features
+1: leaq _ZGVeN16vvv_sincosf_skx(%rip), %rax
+ testl $bit_AVX512DQ_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512DQ_Usable(%rip)
+ jnz 3f
+2: leaq _ZGVeN16vvv_sincosf_knl(%rip), %rax
+ testl $bit_AVX512F_Usable, __cpu_features+FEATURE_OFFSET+index_AVX512F_Usable(%rip)
+ jnz 3f
+ leaq _ZGVeN16vvv_sincosf_avx2_wrapper(%rip), %rax
+3: ret
+END (_ZGVeN16vvv_sincosf)
+
+#define _ZGVeN16vvv_sincosf _ZGVeN16vvv_sincosf_avx2_wrapper
+#include "../svml_s_sincosf16_core.S"
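Editor's note: as I read it, the selector in svml_s_sincosf16_core.S is meant to pick the most capable variant the CPU can run. A minimal C sketch of that priority order follows. The struct, enum and function names are hypothetical and only mirror the two feature bits the assembly tests; the real code reads them from __cpu_features and returns a function address rather than an enum.

```c
/* Hypothetical summary of the two feature bits the selector tests.  */
struct cpu_features_sketch {
    int avx512dq_usable;  /* mirrors bit_AVX512DQ_Usable */
    int avx512f_usable;   /* mirrors bit_AVX512F_Usable */
};

/* Hypothetical tags for the three candidate implementations.  */
enum sincosf16_variant {
    VARIANT_SKX,           /* _ZGVeN16vvv_sincosf_skx */
    VARIANT_KNL,           /* _ZGVeN16vvv_sincosf_knl */
    VARIANT_AVX2_WRAPPER   /* _ZGVeN16vvv_sincosf_avx2_wrapper */
};

/* Pick the first variant whose required feature is usable, falling
   back to the wrapper around the 8-lane AVX2 version.  */
enum sincosf16_variant
select_sincosf16(const struct cpu_features_sketch *f)
{
    if (f->avx512dq_usable)
        return VARIANT_SKX;
    if (f->avx512f_usable)
        return VARIANT_KNL;
    return VARIANT_AVX2_WRAPPER;
}
```

In the real selector this choice happens once, when the dynamic linker resolves the indirect function, so later calls go straight to the chosen variant.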
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
new file mode 100644
index 0000000..cae49f6
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
@@ -0,0 +1,504 @@
+/* Function sincosf vectorized with AVX-512. KNL and SKX versions.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include "svml_s_sincosf_data.h"
+#include "svml_s_wrapper_impl.h"
+
+/*
+ ALGORITHM DESCRIPTION:
+
+ 1) Range reduction to [-Pi/4; +Pi/4] interval
+ a) Grab sign from source argument and save it.
+ b) Remove sign using AND operation
+ c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value
+ e) Treat obtained value as integer S for destination sign setting.
+ SS = ((S-S&1)&2)<<30; For sin part
+ SC = ((S+S&1)&2)<<30; For cos part
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ g) Subtract "Right Shifter" (0x4B000000) value
+ h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+ 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4))));
+ c) Swap RS & RC if the first bit of the value obtained after
+ Right Shifting is set to 1, using And, Andnot & Or operations.
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R1 = XOR( RS, SS );
+ R2 = XOR( RC, SC ). */
+
+ .text
+ENTRY (_ZGVeN16vvv_sincosf_knl)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+#else
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $1344, %rsp
+ movq __svml_ssincos_data@GOTPCREL(%rip), %rax
+ vmovaps %zmm0, %zmm2
+ movl $-1, %edx
+ vmovups __sAbsMask(%rax), %zmm0
+ vmovups __sInvPI(%rax), %zmm3
+
+/* Absolute argument computation */
+ vpandd %zmm0, %zmm2, %zmm1
+ vmovups __sPI1_FMA(%rax), %zmm5
+ vmovups __sSignMask(%rax), %zmm9
+ vpandnd %zmm2, %zmm0, %zmm0
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 3 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 */
+ vmovaps %zmm1, %zmm6
+ vmovaps %zmm1, %zmm8
+
+/* c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value */
+ vfmadd213ps __sRShifter(%rax), %zmm1, %zmm3
+ vmovups __sPI3_FMA(%rax), %zmm7
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+ vsubps __sRShifter(%rax), %zmm3, %zmm12
+
+/* e) Treat obtained value as integer S for destination sign setting */
+ vpslld $31, %zmm3, %zmm13
+ vmovups __sA7_FMA(%rax), %zmm14
+ vfnmadd231ps %zmm12, %zmm5, %zmm6
+
+/* 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+ vmovaps %zmm14, %zmm15
+ vmovups __sA9_FMA(%rax), %zmm3
+ vcmpps $22, __sRangeReductionVal(%rax), %zmm1, %k1
+ vpbroadcastd %edx, %zmm1{%k1}{z}
+ vfnmadd231ps __sPI2_FMA(%rax), %zmm12, %zmm6
+ vptestmd %zmm1, %zmm1, %k0
+ vpandd %zmm6, %zmm9, %zmm11
+ kmovw %k0, %ecx
+ vpxord __sOneHalf(%rax), %zmm11, %zmm4
+
+/* Result sign calculations */
+ vpternlogd $150, %zmm13, %zmm9, %zmm11
+
+/* Add correction term 0.5 for cos() part */
+ vaddps %zmm4, %zmm12, %zmm10
+ vfnmadd213ps %zmm6, %zmm7, %zmm12
+ vfnmadd231ps %zmm10, %zmm5, %zmm8
+ vpxord %zmm13, %zmm12, %zmm13
+ vmulps %zmm13, %zmm13, %zmm12
+ vfnmadd231ps __sPI2_FMA(%rax), %zmm10, %zmm8
+ vfmadd231ps __sA9_FMA(%rax), %zmm12, %zmm15
+ vfnmadd213ps %zmm8, %zmm7, %zmm10
+ vfmadd213ps __sA5_FMA(%rax), %zmm12, %zmm15
+ vpxord %zmm11, %zmm10, %zmm5
+ vmulps %zmm5, %zmm5, %zmm4
+ vfmadd213ps __sA3(%rax), %zmm12, %zmm15
+ vfmadd213ps %zmm14, %zmm4, %zmm3
+ vmulps %zmm12, %zmm15, %zmm14
+ vfmadd213ps __sA5_FMA(%rax), %zmm4, %zmm3
+ vfmadd213ps %zmm13, %zmm13, %zmm14
+ vfmadd213ps __sA3(%rax), %zmm4, %zmm3
+ vpxord %zmm0, %zmm14, %zmm0
+ vmulps %zmm4, %zmm3, %zmm3
+ vfmadd213ps %zmm5, %zmm5, %zmm3
+ testl %ecx, %ecx
+ jne .LBL_1_3
+
+.LBL_1_2:
+ cfi_remember_state
+ vmovups %zmm0, (%rdi)
+ vmovups %zmm3, (%rsi)
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_1_3:
+ cfi_restore_state
+ vmovups %zmm2, 1152(%rsp)
+ vmovups %zmm0, 1216(%rsp)
+ vmovups %zmm3, 1280(%rsp)
+ je .LBL_1_2
+
+ xorb %dl, %dl
+ kmovw %k4, 1048(%rsp)
+ xorl %eax, %eax
+ kmovw %k5, 1040(%rsp)
+ kmovw %k6, 1032(%rsp)
+ kmovw %k7, 1024(%rsp)
+ vmovups %zmm16, 960(%rsp)
+ vmovups %zmm17, 896(%rsp)
+ vmovups %zmm18, 832(%rsp)
+ vmovups %zmm19, 768(%rsp)
+ vmovups %zmm20, 704(%rsp)
+ vmovups %zmm21, 640(%rsp)
+ vmovups %zmm22, 576(%rsp)
+ vmovups %zmm23, 512(%rsp)
+ vmovups %zmm24, 448(%rsp)
+ vmovups %zmm25, 384(%rsp)
+ vmovups %zmm26, 320(%rsp)
+ vmovups %zmm27, 256(%rsp)
+ vmovups %zmm28, 192(%rsp)
+ vmovups %zmm29, 128(%rsp)
+ vmovups %zmm30, 64(%rsp)
+ vmovups %zmm31, (%rsp)
+ movq %rsi, 1056(%rsp)
+ movq %r12, 1096(%rsp)
+ cfi_offset_rel_rsp (12, 1096)
+ movb %dl, %r12b
+ movq %r13, 1088(%rsp)
+ cfi_offset_rel_rsp (13, 1088)
+ movl %eax, %r13d
+ movq %r14, 1080(%rsp)
+ cfi_offset_rel_rsp (14, 1080)
+ movl %ecx, %r14d
+ movq %r15, 1072(%rsp)
+ cfi_offset_rel_rsp (15, 1072)
+ movq %rbx, 1064(%rsp)
+ movq %rdi, %rbx
+ cfi_remember_state
+
+.LBL_1_6:
+ btl %r13d, %r14d
+ jc .LBL_1_13
+
+.LBL_1_7:
+ lea 1(%r13), %esi
+ btl %esi, %r14d
+ jc .LBL_1_10
+
+.LBL_1_8:
+ addb $1, %r12b
+ addl $2, %r13d
+ cmpb $16, %r12b
+ jb .LBL_1_6
+
+ movq %rbx, %rdi
+ kmovw 1048(%rsp), %k4
+ movq 1056(%rsp), %rsi
+ kmovw 1040(%rsp), %k5
+ movq 1096(%rsp), %r12
+ cfi_restore (%r12)
+ kmovw 1032(%rsp), %k6
+ movq 1088(%rsp), %r13
+ cfi_restore (%r13)
+ kmovw 1024(%rsp), %k7
+ vmovups 960(%rsp), %zmm16
+ vmovups 896(%rsp), %zmm17
+ vmovups 832(%rsp), %zmm18
+ vmovups 768(%rsp), %zmm19
+ vmovups 704(%rsp), %zmm20
+ vmovups 640(%rsp), %zmm21
+ vmovups 576(%rsp), %zmm22
+ vmovups 512(%rsp), %zmm23
+ vmovups 448(%rsp), %zmm24
+ vmovups 384(%rsp), %zmm25
+ vmovups 320(%rsp), %zmm26
+ vmovups 256(%rsp), %zmm27
+ vmovups 192(%rsp), %zmm28
+ vmovups 128(%rsp), %zmm29
+ vmovups 64(%rsp), %zmm30
+ vmovups (%rsp), %zmm31
+ movq 1080(%rsp), %r14
+ cfi_restore (%r14)
+ movq 1072(%rsp), %r15
+ cfi_restore (%r15)
+ movq 1064(%rsp), %rbx
+ vmovups 1216(%rsp), %zmm0
+ vmovups 1280(%rsp), %zmm3
+ jmp .LBL_1_2
+
+.LBL_1_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ vmovss 1156(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ vmovss %xmm0, 1220(%rsp,%r15,8)
+ vmovss 1156(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ vmovss %xmm0, 1284(%rsp,%r15,8)
+ jmp .LBL_1_8
+
+.LBL_1_13:
+ movzbl %r12b, %r15d
+ vmovss 1152(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ vmovss %xmm0, 1216(%rsp,%r15,8)
+ vmovss 1152(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ vmovss %xmm0, 1280(%rsp,%r15,8)
+ jmp .LBL_1_7
+#endif
+END (_ZGVeN16vvv_sincosf_knl)
+
+ENTRY (_ZGVeN16vvv_sincosf_skx)
+#ifndef HAVE_AVX512_ASM_SUPPORT
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+#else
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $1344, %rsp
+ movq __svml_ssincos_data@GOTPCREL(%rip), %rax
+ vmovaps %zmm0, %zmm4
+ vmovups __sAbsMask(%rax), %zmm3
+ vmovups __sInvPI(%rax), %zmm5
+ vmovups __sRShifter(%rax), %zmm6
+ vmovups __sPI1_FMA(%rax), %zmm9
+ vmovups __sPI2_FMA(%rax), %zmm10
+ vmovups __sSignMask(%rax), %zmm14
+ vmovups __sOneHalf(%rax), %zmm7
+ vmovups __sPI3_FMA(%rax), %zmm12
+
+/* Absolute argument computation */
+ vandps %zmm3, %zmm4, %zmm2
+
+/* c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value */
+ vfmadd213ps %zmm6, %zmm2, %zmm5
+ vcmpps $18, __sRangeReductionVal(%rax), %zmm2, %k1
+
+/* e) Treat obtained value as integer S for destination sign setting */
+ vpslld $31, %zmm5, %zmm0
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+ vsubps %zmm6, %zmm5, %zmm5
+ vmovups __sA3(%rax), %zmm6
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 3 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 */
+ vmovaps %zmm2, %zmm11
+ vfnmadd231ps %zmm5, %zmm9, %zmm11
+ vfnmadd231ps %zmm5, %zmm10, %zmm11
+ vandps %zmm11, %zmm14, %zmm1
+ vxorps %zmm1, %zmm7, %zmm8
+
+/* Result sign calculations */
+ vpternlogd $150, %zmm0, %zmm14, %zmm1
+ vmovups .L_2il0floatpacket.13(%rip), %zmm14
+
+/* Add correction term 0.5 for cos() part */
+ vaddps %zmm8, %zmm5, %zmm15
+ vfnmadd213ps %zmm11, %zmm12, %zmm5
+ vandnps %zmm4, %zmm3, %zmm11
+ vmovups __sA7_FMA(%rax), %zmm3
+ vmovaps %zmm2, %zmm13
+ vfnmadd231ps %zmm15, %zmm9, %zmm13
+ vxorps %zmm0, %zmm5, %zmm9
+ vmovups __sA5_FMA(%rax), %zmm0
+ vfnmadd231ps %zmm15, %zmm10, %zmm13
+ vmulps %zmm9, %zmm9, %zmm8
+ vfnmadd213ps %zmm13, %zmm12, %zmm15
+ vmovups __sA9_FMA(%rax), %zmm12
+ vxorps %zmm1, %zmm15, %zmm1
+ vmulps %zmm1, %zmm1, %zmm13
+
+/* 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+ vmovaps %zmm12, %zmm7
+ vfmadd213ps %zmm3, %zmm8, %zmm7
+ vfmadd213ps %zmm3, %zmm13, %zmm12
+ vfmadd213ps %zmm0, %zmm8, %zmm7
+ vfmadd213ps %zmm0, %zmm13, %zmm12
+ vfmadd213ps %zmm6, %zmm8, %zmm7
+ vfmadd213ps %zmm6, %zmm13, %zmm12
+ vmulps %zmm8, %zmm7, %zmm10
+ vmulps %zmm13, %zmm12, %zmm3
+ vfmadd213ps %zmm9, %zmm9, %zmm10
+ vfmadd213ps %zmm1, %zmm1, %zmm3
+ vxorps %zmm11, %zmm10, %zmm0
+ vpandnd %zmm2, %zmm2, %zmm14{%k1}
+ vptestmd %zmm14, %zmm14, %k0
+ kmovw %k0, %ecx
+ testl %ecx, %ecx
+ jne .LBL_2_3
+
+.LBL_2_2:
+ cfi_remember_state
+ vmovups %zmm0, (%rdi)
+ vmovups %zmm3, (%rsi)
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_2_3:
+ cfi_restore_state
+ vmovups %zmm4, 1152(%rsp)
+ vmovups %zmm0, 1216(%rsp)
+ vmovups %zmm3, 1280(%rsp)
+ je .LBL_2_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ kmovw %k4, 1048(%rsp)
+ kmovw %k5, 1040(%rsp)
+ kmovw %k6, 1032(%rsp)
+ kmovw %k7, 1024(%rsp)
+ vmovups %zmm16, 960(%rsp)
+ vmovups %zmm17, 896(%rsp)
+ vmovups %zmm18, 832(%rsp)
+ vmovups %zmm19, 768(%rsp)
+ vmovups %zmm20, 704(%rsp)
+ vmovups %zmm21, 640(%rsp)
+ vmovups %zmm22, 576(%rsp)
+ vmovups %zmm23, 512(%rsp)
+ vmovups %zmm24, 448(%rsp)
+ vmovups %zmm25, 384(%rsp)
+ vmovups %zmm26, 320(%rsp)
+ vmovups %zmm27, 256(%rsp)
+ vmovups %zmm28, 192(%rsp)
+ vmovups %zmm29, 128(%rsp)
+ vmovups %zmm30, 64(%rsp)
+ vmovups %zmm31, (%rsp)
+ movq %rsi, 1056(%rsp)
+ movq %r12, 1096(%rsp)
+ cfi_offset_rel_rsp (12, 1096)
+ movb %dl, %r12b
+ movq %r13, 1088(%rsp)
+ cfi_offset_rel_rsp (13, 1088)
+ movl %eax, %r13d
+ movq %r14, 1080(%rsp)
+ cfi_offset_rel_rsp (14, 1080)
+ movl %ecx, %r14d
+ movq %r15, 1072(%rsp)
+ cfi_offset_rel_rsp (15, 1072)
+ movq %rbx, 1064(%rsp)
+ movq %rdi, %rbx
+ cfi_remember_state
+
+.LBL_2_6:
+ btl %r13d, %r14d
+ jc .LBL_2_13
+
+.LBL_2_7:
+ lea 1(%r13), %esi
+ btl %esi, %r14d
+ jc .LBL_2_10
+
+.LBL_2_8:
+ incb %r12b
+ addl $2, %r13d
+ cmpb $16, %r12b
+ jb .LBL_2_6
+
+ kmovw 1048(%rsp), %k4
+ movq %rbx, %rdi
+ kmovw 1040(%rsp), %k5
+ kmovw 1032(%rsp), %k6
+ kmovw 1024(%rsp), %k7
+ vmovups 960(%rsp), %zmm16
+ vmovups 896(%rsp), %zmm17
+ vmovups 832(%rsp), %zmm18
+ vmovups 768(%rsp), %zmm19
+ vmovups 704(%rsp), %zmm20
+ vmovups 640(%rsp), %zmm21
+ vmovups 576(%rsp), %zmm22
+ vmovups 512(%rsp), %zmm23
+ vmovups 448(%rsp), %zmm24
+ vmovups 384(%rsp), %zmm25
+ vmovups 320(%rsp), %zmm26
+ vmovups 256(%rsp), %zmm27
+ vmovups 192(%rsp), %zmm28
+ vmovups 128(%rsp), %zmm29
+ vmovups 64(%rsp), %zmm30
+ vmovups (%rsp), %zmm31
+ vmovups 1216(%rsp), %zmm0
+ vmovups 1280(%rsp), %zmm3
+ movq 1056(%rsp), %rsi
+ movq 1096(%rsp), %r12
+ cfi_restore (%r12)
+ movq 1088(%rsp), %r13
+ cfi_restore (%r13)
+ movq 1080(%rsp), %r14
+ cfi_restore (%r14)
+ movq 1072(%rsp), %r15
+ cfi_restore (%r15)
+ movq 1064(%rsp), %rbx
+ jmp .LBL_2_2
+
+.LBL_2_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ vmovss 1156(%rsp,%r15,8), %xmm0
+ vzeroupper
+ vmovss 1156(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ vmovss %xmm0, 1220(%rsp,%r15,8)
+ vmovss 1156(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ vmovss %xmm0, 1284(%rsp,%r15,8)
+ jmp .LBL_2_8
+
+.LBL_2_13:
+ movzbl %r12b, %r15d
+ vmovss 1152(%rsp,%r15,8), %xmm0
+ vzeroupper
+ vmovss 1152(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ vmovss %xmm0, 1216(%rsp,%r15,8)
+ vmovss 1152(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ vmovss %xmm0, 1280(%rsp,%r15,8)
+ jmp .LBL_2_7
+#endif
+END (_ZGVeN16vvv_sincosf_skx)
+
+ .section .rodata, "a"
+.L_2il0floatpacket.13:
+ .long 0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff,0xffffffff
+ .type .L_2il0floatpacket.13,@object
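Editor's note: the ALGORITHM DESCRIPTION comment in the AVX-512 file above follows the usual "reduce, evaluate polynomials, fix up by quadrant" scheme. The scalar C sketch below illustrates that scheme only and is not the patch's method. It assumes Taylor coefficients in place of the minimax coefficients from svml_s_sincosf_data, performs the reduction in double precision instead of subtracting split PI1..PI4 parts, and uses an explicit quadrant switch instead of the mask and XOR operations. The name sincosf_sketch is hypothetical, and the sketch is only intended for moderate |x|.

```c
/* Scalar illustration: reduce x to r in [-pi/4, pi/4], evaluate a sine
   and a cosine polynomial in r, then swap and negate by quadrant.  */
void sincosf_sketch(float x, float *s, float *c)
{
    const double two_over_pi = 0.63661977236758134308;
    const double pi_over_2 = 1.57079632679489661923;

    /* Quadrant index: x * 2/pi rounded to the nearest integer.  */
    double t = (double) x * two_over_pi;
    long n = (long) (t >= 0.0 ? t + 0.5 : t - 0.5);

    /* Reduced argument, |r| <= pi/4 up to rounding.  */
    double r = (double) x - (double) n * pi_over_2;
    double r2 = r * r;

    /* Horner evaluation of truncated Taylor series (an assumption:
       the patch loads minimax coefficients from a data table).  */
    double rs = r * (1.0 + r2 * (-1.0 / 6 + r2 * (1.0 / 120
                + r2 * (-1.0 / 5040 + r2 * (1.0 / 362880)))));
    double rc = 1.0 + r2 * (-1.0 / 2 + r2 * (1.0 / 24
                + r2 * (-1.0 / 720 + r2 * (1.0 / 40320
                + r2 * (-1.0 / 3628800)))));

    /* x = r + n*pi/2, so the low two bits of n select which
       polynomial feeds which output and with what sign.  */
    switch (n & 3) {
    case 0: *s = (float) rs;  *c = (float) rc;  break;
    case 1: *s = (float) rc;  *c = (float) -rs; break;
    case 2: *s = (float) -rs; *c = (float) -rc; break;
    default: *s = (float) -rc; *c = (float) rs; break;
    }
}
```

The vector code gets the same effect without branches: the parity of the rounded quotient decides the swap, and the sign is applied with XOR on the sign bit.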
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
similarity index 54%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
index 3e74118..610046b 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Multiple versions of vectorized sincosf.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,12 +16,23 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include <init-arch.h>
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
+ .text
+ENTRY (_ZGVbN4vvv_sincosf)
+ .type _ZGVbN4vvv_sincosf, @gnu_indirect_function
+ cmpl $0, KIND_OFFSET+__cpu_features(%rip)
+ jne 1f
+ call __init_cpu_features
+1: leaq _ZGVbN4vvv_sincosf_sse4(%rip), %rax
+ testl $bit_SSE4_1, __cpu_features+CPUID_OFFSET+index_SSE4_1(%rip)
+ jz 2f
+ ret
+2: leaq _ZGVbN4vvv_sincosf_sse2(%rip), %rax
+ ret
+END (_ZGVbN4vvv_sincosf)
+libmvec_hidden_def (_ZGVbN4vvv_sincosf)
-#include "libm-test.c"
+#define _ZGVbN4vvv_sincosf _ZGVbN4vvv_sincosf_sse2
+#include "../svml_s_sincosf4_core.S"
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
new file mode 100644
index 0000000..8c51e44
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
@@ -0,0 +1,268 @@
+/* Function sincosf vectorized with SSE4.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include "svml_s_sincosf_data.h"
+
+ .text
+ENTRY (_ZGVbN4vvv_sincosf_sse4)
+/*
+ ALGORITHM DESCRIPTION:
+
+ 1) Range reduction to [-Pi/4; +Pi/4] interval
+ a) Grab sign from source argument and save it.
+ b) Remove sign using AND operation
+ c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value
+ e) Treat obtained value as integer S for destination sign setting.
+ SS = ((S-S&1)&2)<<30; For sin part
+ SC = ((S+S&1)&2)<<30; For cos part
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ g) Subtract "Right Shifter" (0x4B000000) value
+ h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4;
+ 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4))));
+ c) Swap RS & RC if the first bit of the value obtained after
+ Right Shifting is set to 1, using And, Andnot & Or operations.
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R1 = XOR( RS, SS );
+ R2 = XOR( RC, SC ). */
+
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $320, %rsp
+ movq __svml_ssincos_data@GOTPCREL(%rip), %rax
+ movups %xmm12, 176(%rsp)
+ movups %xmm9, 160(%rsp)
+ movups __sAbsMask(%rax), %xmm12
+
+/* Absolute argument computation */
+ movaps %xmm12, %xmm5
+ andnps %xmm0, %xmm12
+ movups __sInvPI(%rax), %xmm7
+ andps %xmm0, %xmm5
+
+/* c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value. */
+ mulps %xmm5, %xmm7
+ movups %xmm10, 144(%rsp)
+ movups __sPI1(%rax), %xmm10
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 4 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 - Y*PI4. */
+ movaps %xmm10, %xmm1
+ addps __sRShifter(%rax), %xmm7
+
+/* e) Treat obtained value as integer S for destination sign setting */
+ movaps %xmm7, %xmm9
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+ subps __sRShifter(%rax), %xmm7
+ mulps %xmm7, %xmm1
+ pslld $31, %xmm9
+ movups __sPI2(%rax), %xmm6
+ movups %xmm13, 112(%rsp)
+ movaps %xmm5, %xmm13
+ movaps %xmm6, %xmm2
+ subps %xmm1, %xmm13
+ mulps %xmm7, %xmm2
+ movups __sSignMask(%rax), %xmm3
+ movaps %xmm5, %xmm1
+ movups __sOneHalf(%rax), %xmm4
+ subps %xmm2, %xmm13
+ cmpnleps __sRangeReductionVal(%rax), %xmm5
+ movaps %xmm3, %xmm2
+ andps %xmm13, %xmm2
+ xorps %xmm2, %xmm4
+
+/* Result sign calculations */
+ xorps %xmm2, %xmm3
+ xorps %xmm9, %xmm3
+
+/* Add correction term 0.5 for cos() part */
+ addps %xmm7, %xmm4
+ movmskps %xmm5, %ecx
+ mulps %xmm4, %xmm10
+ mulps %xmm4, %xmm6
+ subps %xmm10, %xmm1
+ movups __sPI3(%rax), %xmm10
+ subps %xmm6, %xmm1
+ movaps %xmm10, %xmm6
+ mulps %xmm7, %xmm6
+ mulps %xmm4, %xmm10
+ subps %xmm6, %xmm13
+ subps %xmm10, %xmm1
+ movups __sPI4(%rax), %xmm6
+ mulps %xmm6, %xmm7
+ mulps %xmm6, %xmm4
+ subps %xmm7, %xmm13
+ subps %xmm4, %xmm1
+ xorps %xmm9, %xmm13
+ xorps %xmm3, %xmm1
+ movaps %xmm13, %xmm4
+ movaps %xmm1, %xmm2
+ mulps %xmm13, %xmm4
+ mulps %xmm1, %xmm2
+ movups __sA9(%rax), %xmm7
+
+/* 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+ movaps %xmm7, %xmm3
+ mulps %xmm4, %xmm3
+ mulps %xmm2, %xmm7
+ addps __sA7(%rax), %xmm3
+ addps __sA7(%rax), %xmm7
+ mulps %xmm4, %xmm3
+ mulps %xmm2, %xmm7
+ addps __sA5(%rax), %xmm3
+ addps __sA5(%rax), %xmm7
+ mulps %xmm4, %xmm3
+ mulps %xmm2, %xmm7
+ addps __sA3(%rax), %xmm3
+ addps __sA3(%rax), %xmm7
+ mulps %xmm3, %xmm4
+ mulps %xmm7, %xmm2
+ mulps %xmm13, %xmm4
+ mulps %xmm1, %xmm2
+ addps %xmm4, %xmm13
+ addps %xmm2, %xmm1
+ xorps %xmm12, %xmm13
+ testl %ecx, %ecx
+ jne .LBL_1_3
+
+.LBL_1_2:
+ cfi_remember_state
+ movups 160(%rsp), %xmm9
+ movaps %xmm13, (%rdi)
+ movups 144(%rsp), %xmm10
+ movups 176(%rsp), %xmm12
+ movups 112(%rsp), %xmm13
+ movups %xmm1, (%rsi)
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_1_3:
+ cfi_restore_state
+ movups %xmm0, 128(%rsp)
+ movups %xmm13, 192(%rsp)
+ movups %xmm1, 256(%rsp)
+ je .LBL_1_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ movups %xmm8, 48(%rsp)
+ movups %xmm11, 32(%rsp)
+ movups %xmm14, 16(%rsp)
+ movups %xmm15, (%rsp)
+ movq %rsi, 64(%rsp)
+ movq %r12, 104(%rsp)
+ cfi_offset_rel_rsp (12, 104)
+ movb %dl, %r12b
+ movq %r13, 96(%rsp)
+ cfi_offset_rel_rsp (13, 96)
+ movl %eax, %r13d
+ movq %r14, 88(%rsp)
+ cfi_offset_rel_rsp (14, 88)
+ movl %ecx, %r14d
+ movq %r15, 80(%rsp)
+ cfi_offset_rel_rsp (15, 80)
+ movq %rbx, 72(%rsp)
+ movq %rdi, %rbx
+ cfi_remember_state
+
+.LBL_1_6:
+ btl %r13d, %r14d
+ jc .LBL_1_13
+
+.LBL_1_7:
+ lea 1(%r13), %esi
+ btl %esi, %r14d
+ jc .LBL_1_10
+
+.LBL_1_8:
+ incb %r12b
+ addl $2, %r13d
+ cmpb $16, %r12b
+ jb .LBL_1_6
+
+ movups 48(%rsp), %xmm8
+ movq %rbx, %rdi
+ movups 32(%rsp), %xmm11
+ movups 16(%rsp), %xmm14
+ movups (%rsp), %xmm15
+ movq 64(%rsp), %rsi
+ movq 104(%rsp), %r12
+ cfi_restore (%r12)
+ movq 96(%rsp), %r13
+ cfi_restore (%r13)
+ movq 88(%rsp), %r14
+ cfi_restore (%r14)
+ movq 80(%rsp), %r15
+ cfi_restore (%r15)
+ movq 72(%rsp), %rbx
+ movups 192(%rsp), %xmm13
+ movups 256(%rsp), %xmm1
+ jmp .LBL_1_2
+
+.LBL_1_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ movss 132(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ movss %xmm0, 196(%rsp,%r15,8)
+ movss 132(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ movss %xmm0, 260(%rsp,%r15,8)
+ jmp .LBL_1_8
+
+.LBL_1_13:
+ movzbl %r12b, %r15d
+ movss 128(%rsp,%r15,8), %xmm0
+
+ call sinf@PLT
+
+ movss %xmm0, 192(%rsp,%r15,8)
+ movss 128(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ movss %xmm0, 256(%rsp,%r15,8)
+ jmp .LBL_1_7
+
+END (_ZGVbN4vvv_sincosf_sse4)
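The range reduction and sign handling described in the algorithm comment of the kernel above can be modelled in scalar C. The following is only a sketch of the sin path under stated assumptions, not the glibc code: it assumes the constant named __sInvPI is 1/Pi and that the PI1..PI4 pieces sum to Pi, it uses the "Right Shifter" value 0x4B000000 (2^23) quoted in the comment, it replaces the multi-part Pi subtraction with one double-precision subtraction, and it uses Taylor coefficients instead of the minimax coefficients from the data table. The names sin_sketch, f2u and u2f and the input bound are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bit casts via memcpy, which avoids strict-aliasing problems. */
static uint32_t f2u(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float u2f(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Hypothetical scalar model of the sin path; not the glibc implementation. */
static float sin_sketch(float x)
{
    const float inv_pi = 0.318309886f;          /* 1/Pi rounded to float */
    const float shifter = 8388608.0f;           /* 2^23, bit pattern 0x4B000000 */
    const double pi = 3.14159265358979323846;

    /* 1a/1b: save the sign bit, then clear it with an AND mask. */
    uint32_t sign = f2u(x) & 0x80000000u;
    float ax = u2f(f2u(x) & 0x7FFFFFFFu);

    /* Hypothetical bound standing in for __sRangeReductionVal. */
    assert(ax < 10000.0f);

    /* 1c/1d: adding 2^23 rounds |x|/Pi to an integer N that ends up
       in the low mantissa bits of the sum. */
    volatile float biased = ax * inv_pi + shifter;
    float b = biased;

    /* 1e: move the parity of N into the sign position (pslld $31). */
    uint32_t flip = f2u(b) << 31;

    /* 1g: subtracting the shifter recovers N as a float. */
    float n = b - shifter;

    /* 1h: r = |x| - N*Pi; double arithmetic stands in for PI1..PI4. */
    float r = (float)((double)ax - (double)n * pi);

    /* 2: odd polynomial in r (Taylor terms up to r^9, not minimax). */
    float r2 = r * r;
    float p = r * (1.0f + r2 * (-1.0f / 6 + r2 * (1.0f / 120
                 + r2 * (-1.0f / 5040 + r2 * (1.0f / 362880)))));

    /* 3: apply the parity sign and the source sign with XOR. */
    return u2f(f2u(p) ^ flip ^ sign);
}
```

For example, sin_sketch(10.0f) reduces with N = 3, so the parity bit is set and the polynomial result for r = 10 - 3*Pi is negated. Because sin(|x|) = (-1)^N * sin(|x| - N*Pi) holds for any integer N, the sketch does not depend on which way a near-tie rounds.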
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
new file mode 100644
index 0000000..9e5be67
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
@@ -0,0 +1,38 @@
+/* Multiple versions of vectorized sincosf.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include <init-arch.h>
+
+ .text
+ENTRY (_ZGVdN8vvv_sincosf)
+ .type _ZGVdN8vvv_sincosf, @gnu_indirect_function
+ cmpl $0, KIND_OFFSET+__cpu_features(%rip)
+ jne 1f
+ call __init_cpu_features
+1: leaq _ZGVdN8vvv_sincosf_avx2(%rip), %rax
+ testl $bit_AVX2_Usable, __cpu_features+FEATURE_OFFSET+index_AVX2_Usable(%rip)
+ jz 2f
+ ret
+2: leaq _ZGVdN8vvv_sincosf_sse_wrapper(%rip), %rax
+ ret
+END (_ZGVdN8vvv_sincosf)
+libmvec_hidden_def (_ZGVdN8vvv_sincosf)
+
+#define _ZGVdN8vvv_sincosf _ZGVdN8vvv_sincosf_sse_wrapper
+#include "../svml_s_sincosf8_core.S"
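The selection logic of the resolver above (return the AVX2 kernel when AVX2 is usable, otherwise the SSE wrapper) can be illustrated in C with plain function pointers. This is a sketch only: in the real library the choice is made once by the dynamic loader through the gnu_indirect_function symbol type, and the CPU feature bit comes from __cpu_features. The typedef, the simplified pointer-based signature, the stand-in variants and the avx2_usable parameter are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified signature; the real symbols follow the Vector ABI. */
typedef void (*sincosf8_fn)(const float *x, float *s, float *c);

/* Records which stand-in variant ran last: 2 = AVX2, 1 = SSE wrapper. */
static int last_variant;

/* Hypothetical stand-in for _ZGVdN8vvv_sincosf_avx2. */
static void variant_avx2(const float *x, float *s, float *c)
{
    (void)x; (void)s; (void)c;
    last_variant = 2;
}

/* Hypothetical stand-in for _ZGVdN8vvv_sincosf_sse_wrapper. */
static void variant_sse_wrapper(const float *x, float *s, float *c)
{
    (void)x; (void)s; (void)c;
    last_variant = 1;
}

/* Mirrors the resolver's decision: prefer AVX2, else fall back. */
static sincosf8_fn resolve_sincosf8(int avx2_usable)
{
    return avx2_usable ? variant_avx2 : variant_sse_wrapper;
}
```

A caller would obtain the pointer once, for example sincosf8_fn f = resolve_sincosf8(1), and then call f for every vector; with IFUNC the equivalent binding happens at symbol resolution time, so callers pay no per-call dispatch cost.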
diff --git a/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
new file mode 100644
index 0000000..153c315
--- /dev/null
+++ b/sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
@@ -0,0 +1,241 @@
+/* Function sincosf vectorized with AVX2.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <sysdep.h>
+#include "svml_s_sincosf_data.h"
+
+ .text
+ENTRY(_ZGVdN8vvv_sincosf_avx2)
+/*
+ ALGORITHM DESCRIPTION:
+
+ 1) Range reduction to [-Pi/4; +Pi/4] interval
+ a) Grab sign from source argument and save it.
+ b) Remove sign using AND operation
+ c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value
+ e) Treat obtained value as integer S for destination sign setting.
+ SS = ((S-S&1)&2)<<30; For sin part
+ SC = ((S+S&1)&2)<<30; For cos part
+ f) Change destination sign if source sign is negative
+ using XOR operation.
+ g) Subtract "Right Shifter" (0x4B000000) value
+ h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 3 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3;
+ 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4))));
+ c) Swap RS & RC if the first bit of the obtained value after
+ Right Shifting is set to 1. Using And, Andnot & Or operations.
+ 3) Destination sign setting
+ a) Set shifted destination sign using XOR operation:
+ R1 = XOR( RS, SS );
+ R2 = XOR( RC, SC ). */
+
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ subq $448, %rsp
+ movq __svml_ssincos_data@GOTPCREL(%rip), %rax
+ vmovdqa %ymm0, %ymm5
+ vmovups %ymm13, 352(%rsp)
+ vmovups __sAbsMask(%rax), %ymm2
+ vmovups __sInvPI(%rax), %ymm1
+ vmovups __sPI1_FMA(%rax), %ymm13
+ vmovups %ymm15, 288(%rsp)
+
+/* Absolute argument computation */
+ vandps %ymm2, %ymm5, %ymm4
+
+/* c) Getting octant Y by 2/Pi multiplication
+ d) Add "Right Shifter" value */
+ vfmadd213ps __sRShifter(%rax), %ymm4, %ymm1
+
+/* e) Treat obtained value as integer S for destination sign setting */
+ vpslld $31, %ymm1, %ymm0
+
+/* g) Subtract "Right Shifter" (0x4B000000) value */
+ vsubps __sRShifter(%rax), %ymm1, %ymm1
+
+/* h) Subtract Y*(PI/2) from X argument, where PI/2 divided to 3 parts:
+ X = X - Y*PI1 - Y*PI2 - Y*PI3 */
+ vmovdqa %ymm4, %ymm7
+ vfnmadd231ps %ymm1, %ymm13, %ymm7
+ vfnmadd231ps __sPI2_FMA(%rax), %ymm1, %ymm7
+ vandps __sSignMask(%rax), %ymm7, %ymm15
+ vxorps __sOneHalf(%rax), %ymm15, %ymm6
+
+/* Add correction term 0.5 for cos() part */
+ vaddps %ymm6, %ymm1, %ymm6
+ vmovdqa %ymm4, %ymm3
+ vfnmadd231ps %ymm6, %ymm13, %ymm3
+ vmovups __sPI3_FMA(%rax), %ymm13
+ vcmpnle_uqps __sRangeReductionVal(%rax), %ymm4, %ymm4
+ vfnmadd231ps __sPI2_FMA(%rax), %ymm6, %ymm3
+ vfnmadd213ps %ymm7, %ymm13, %ymm1
+ vfnmadd213ps %ymm3, %ymm13, %ymm6
+
+/* Result sign calculations */
+ vxorps __sSignMask(%rax), %ymm15, %ymm3
+ vxorps %ymm0, %ymm3, %ymm7
+ vxorps %ymm7, %ymm6, %ymm3
+ vxorps %ymm0, %ymm1, %ymm15
+ vandnps %ymm5, %ymm2, %ymm6
+ vmovups __sA7_FMA(%rax), %ymm2
+ vmulps %ymm15, %ymm15, %ymm13
+ vmovups __sA9_FMA(%rax), %ymm7
+ vmulps %ymm3, %ymm3, %ymm1
+
+/* 2) Polynomial (minimax for sin within [-Pi/4; +Pi/4] interval)
+ a) Calculate X^2 = X * X
+ b) Calculate 2 polynomials for sin and cos:
+ RS = X * ( A0 + X^2 * (A1 + x^2 * (A2 + x^2 * (A3))));
+ RC = B0 + X^2 * (B1 + x^2 * (B2 + x^2 * (B3 + x^2 * (B4)))) */
+ vmovdqa %ymm2, %ymm0
+ vfmadd231ps __sA9_FMA(%rax), %ymm13, %ymm0
+ vfmadd213ps %ymm2, %ymm1, %ymm7
+ vfmadd213ps __sA5_FMA(%rax), %ymm13, %ymm0
+ vfmadd213ps __sA5_FMA(%rax), %ymm1, %ymm7
+ vfmadd213ps __sA3(%rax), %ymm13, %ymm0
+ vfmadd213ps __sA3(%rax), %ymm1, %ymm7
+ vmulps %ymm13, %ymm0, %ymm13
+ vmulps %ymm1, %ymm7, %ymm1
+ vfmadd213ps %ymm15, %ymm15, %ymm13
+ vfmadd213ps %ymm3, %ymm3, %ymm1
+ vmovmskps %ymm4, %ecx
+ vxorps %ymm6, %ymm13, %ymm0
+ testl %ecx, %ecx
+ jne .LBL_1_3
+
+.LBL_1_2:
+ cfi_remember_state
+ vmovups 352(%rsp), %ymm13
+ vmovups 288(%rsp), %ymm15
+ vmovups %ymm0, (%rdi)
+ vmovups %ymm1, (%rsi)
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+
+.LBL_1_3:
+ cfi_restore_state
+ vmovups %ymm5, 256(%rsp)
+ vmovups %ymm0, 320(%rsp)
+ vmovups %ymm1, 384(%rsp)
+ je .LBL_1_2
+
+ xorb %dl, %dl
+ xorl %eax, %eax
+ vmovups %ymm8, 160(%rsp)
+ vmovups %ymm9, 128(%rsp)
+ vmovups %ymm10, 96(%rsp)
+ vmovups %ymm11, 64(%rsp)
+ vmovups %ymm12, 32(%rsp)
+ vmovups %ymm14, (%rsp)
+ movq %rsi, 192(%rsp)
+ movq %r12, 232(%rsp)
+ cfi_offset_rel_rsp (12, 232)
+ movb %dl, %r12b
+ movq %r13, 224(%rsp)
+ cfi_offset_rel_rsp (13, 224)
+ movl %eax, %r13d
+ movq %r14, 216(%rsp)
+ cfi_offset_rel_rsp (14, 216)
+ movl %ecx, %r14d
+ movq %r15, 208(%rsp)
+ cfi_offset_rel_rsp (15, 208)
+ movq %rbx, 200(%rsp)
+ movq %rdi, %rbx
+ cfi_remember_state
+
+.LBL_1_6:
+ btl %r13d, %r14d
+ jc .LBL_1_13
+
+.LBL_1_7:
+ lea 1(%r13), %esi
+ btl %esi, %r14d
+ jc .LBL_1_10
+
+.LBL_1_8:
+ incb %r12b
+ addl $2, %r13d
+ cmpb $16, %r12b
+ jb .LBL_1_6
+
+ vmovups 160(%rsp), %ymm8
+ movq %rbx, %rdi
+ vmovups 128(%rsp), %ymm9
+ vmovups 96(%rsp), %ymm10
+ vmovups 64(%rsp), %ymm11
+ vmovups 32(%rsp), %ymm12
+ vmovups (%rsp), %ymm14
+ vmovups 320(%rsp), %ymm0
+ vmovups 384(%rsp), %ymm1
+ movq 192(%rsp), %rsi
+ movq 232(%rsp), %r12
+ cfi_restore (%r12)
+ movq 224(%rsp), %r13
+ cfi_restore (%r13)
+ movq 216(%rsp), %r14
+ cfi_restore (%r14)
+ movq 208(%rsp), %r15
+ cfi_restore (%r15)
+ movq 200(%rsp), %rbx
+ jmp .LBL_1_2
+
+.LBL_1_10:
+ cfi_restore_state
+ movzbl %r12b, %r15d
+ vmovss 260(%rsp,%r15,8), %xmm0
+ vzeroupper
+
+ call sinf@PLT
+
+ vmovss %xmm0, 324(%rsp,%r15,8)
+ vmovss 260(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ vmovss %xmm0, 388(%rsp,%r15,8)
+ jmp .LBL_1_8
+
+.LBL_1_13:
+ movzbl %r12b, %r15d
+ vmovss 256(%rsp,%r15,8), %xmm0
+ vzeroupper
+
+ call sinf@PLT
+
+ vmovss %xmm0, 320(%rsp,%r15,8)
+ vmovss 256(%rsp,%r15,8), %xmm0
+
+ call cosf@PLT
+
+ vmovss %xmm0, 384(%rsp,%r15,8)
+ jmp .LBL_1_7
+
+END(_ZGVdN8vvv_sincosf_avx2)
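The slow path at .LBL_1_3 through .LBL_1_13 above handles lanes whose argument exceeded __sRangeReductionVal: the comparison mask from vmovmskps is scanned bit by bit, and each flagged lane is recomputed with the scalar sinf and cosf while the other lanes keep their vector results. A minimal C sketch of that idea follows. It is an illustration, not the glibc code: it visits one lane per iteration where the assembly tests two mask bits per iteration, and it takes the scalar routines as callbacks so the example does not depend on libm. All names are hypothetical.

```c
#include <assert.h>

typedef float (*scalar_fn)(float);

/* Recompute only the lanes whose mask bit is set; other lanes keep the
   vector-path results already stored in s[] and c[]. */
static void fixup_special_lanes(unsigned mask, int lanes, const float *x,
                                float *s, float *c,
                                scalar_fn scalar_sin, scalar_fn scalar_cos)
{
    for (int i = 0; i < lanes; i++) {
        if (mask & (1u << i)) {
            s[i] = scalar_sin(x[i]);
            c[i] = scalar_cos(x[i]);
        }
    }
}

/* Hypothetical placeholder callbacks that make rewritten lanes easy to spot. */
static float tag_sin(float v) { return v + 100.0f; }
static float tag_cos(float v) { return v + 200.0f; }

/* Runs the fix-up on 4 lanes with mask 0b0101 and returns 1 when exactly
   lanes 0 and 2 were rewritten. */
static int demo_fixup(void)
{
    const float x[4] = { 1.0f, 2.0f, 3.0f, 4.0f };
    float s[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    float c[4] = { 0.0f, 0.0f, 0.0f, 0.0f };

    fixup_special_lanes(0x5u, 4, x, s, c, tag_sin, tag_cos);

    return s[0] == 101.0f && c[0] == 201.0f
        && s[1] == 0.0f   && c[1] == 0.0f
        && s[2] == 103.0f && c[2] == 203.0f
        && s[3] == 0.0f   && c[3] == 0.0f;
}
```

In real use the callbacks would be sinf and cosf from libm. Keeping the fix-up out of line means the common case, where the mask is zero, pays only for one test and one untaken branch.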
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
similarity index 76%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
index 3e74118..992f9a9 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with AVX-512. Wrapper to AVX2 version.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,12 +16,10 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
-
-#include "libm-test.c"
+ .text
+ENTRY (_ZGVeN16vvv_sincosf)
+WRAPPER_IMPL_AVX512_fFF _ZGVdN8vvv_sincosf
+END (_ZGVeN16vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
similarity index 75%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
index 3e74118..d402ffb 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with SSE2.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,12 +16,15 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#include "libm-test.c"
+ .text
+ENTRY (_ZGVbN4vvv_sincosf)
+WRAPPER_IMPL_SSE2_fFF sincosf
+END (_ZGVbN4vvv_sincosf)
+
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVbN4vvv_sincosf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
similarity index 73%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
index 3e74118..eec7de8 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized with AVX2, wrapper version.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,12 +16,14 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
+ .text
+ENTRY (_ZGVdN8vvv_sincosf)
+WRAPPER_IMPL_AVX_fFF _ZGVbN4vvv_sincosf
+END (_ZGVdN8vvv_sincosf)
-#include "libm-test.c"
+#ifndef USE_MULTIARCH
+ libmvec_hidden_def (_ZGVdN8vvv_sincosf)
+#endif
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
similarity index 76%
copy from sysdeps/x86_64/fpu/test-float-vlen4.c
copy to sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
index 3e74118..c247444 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
@@ -1,4 +1,4 @@
-/* Tests for SSE ISA versions of vector math functions.
+/* Function sincosf vectorized in AVX ISA as wrapper to SSE4 ISA version.
Copyright (C) 2014-2015 Free Software Foundation, Inc.
This file is part of the GNU C Library.
@@ -16,12 +16,10 @@
License along with the GNU C Library; if not, see
<http://www.gnu.org/licenses/>. */
-#include "test-float-vlen4.h"
+#include <sysdep.h>
+#include "svml_s_wrapper_impl.h"
-#define TEST_VECTOR_cosf 1
-#define TEST_VECTOR_sinf 1
-#define TEST_VECTOR_logf 1
-#define TEST_VECTOR_expf 1
-#define TEST_VECTOR_powf 1
-
-#include "libm-test.c"
+ .text
+ENTRY(_ZGVcN8vvv_sincosf)
+WRAPPER_IMPL_AVX_fFF _ZGVbN4vvv_sincosf
+END(_ZGVcN8vvv_sincosf)
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf_data.S b/sysdeps/x86_64/fpu/svml_s_sincosf_data.S
new file mode 100644
index 0000000..040414d
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf_data.S
@@ -0,0 +1,1140 @@
+/* Data for function sincosf.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include "svml_s_sincosf_data.h"
+
+ .section .rodata, "a"
+ .align 64
+ .align 64
+
+/* Data table for vector implementations of function sincosf.
+ The table may contain polynomial, reduction, lookup coefficients
+ and other coefficients obtained through different methods of research
+ and experimental work. */
+
+ .globl __svml_ssincos_data
+__svml_ssincos_data:
+
+/* Lookup table for high accuracy version (CHL,SHi,SLo,Sigma) */
+.if .-__svml_ssincos_data != __dT
+.err
+.endif
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x3f800000
+ .long 0xb99de7df
+ .long 0x3cc90ab0
+ .long 0xb005c998
+ .long 0x3f800000
+ .long 0xba9de1c8
+ .long 0x3d48fb30
+ .long 0xb0ef227f
+ .long 0x3f800000
+ .long 0xbb319298
+ .long 0x3d96a905
+ .long 0xb1531e61
+ .long 0x3f800000
+ .long 0xbb9dc971
+ .long 0x3dc8bd36
+ .long 0xb07592f5
+ .long 0x3f800000
+ .long 0xbbf66e3c
+ .long 0x3dfab273
+ .long 0xb11568cf
+ .long 0x3f800000
+ .long 0xbc315502
+ .long 0x3e164083
+ .long 0x31e8e614
+ .long 0x3f800000
+ .long 0xbc71360b
+ .long 0x3e2f10a2
+ .long 0x311167f9
+ .long 0x3f800000
+ .long 0xbc9d6830
+ .long 0x3e47c5c2
+ .long 0xb0e5967d
+ .long 0x3f800000
+ .long 0xbcc70c54
+ .long 0x3e605c13
+ .long 0x31a7e4f6
+ .long 0x3f800000
+ .long 0xbcf58104
+ .long 0x3e78cfcc
+ .long 0xb11bd41d
+ .long 0x3f800000
+ .long 0xbd145f8c
+ .long 0x3e888e93
+ .long 0x312c7d9e
+ .long 0x3f800000
+ .long 0xbd305f55
+ .long 0x3e94a031
+ .long 0x326d59f0
+ .long 0x3f800000
+ .long 0xbd4ebb8a
+ .long 0x3ea09ae5
+ .long 0xb23e89a0
+ .long 0x3f800000
+ .long 0xbd6f6f7e
+ .long 0x3eac7cd4
+ .long 0xb2254e02
+ .long 0x3f800000
+ .long 0xbd893b12
+ .long 0x3eb8442a
+ .long 0xb2705ba6
+ .long 0x3f800000
+ .long 0xbd9be50c
+ .long 0x3ec3ef15
+ .long 0x31d5d52c
+ .long 0x3f800000
+ .long 0xbdafb2cc
+ .long 0x3ecf7bca
+ .long 0x316a3b63
+ .long 0x3f800000
+ .long 0xbdc4a143
+ .long 0x3edae880
+ .long 0x321e15cc
+ .long 0x3f800000
+ .long 0xbddaad38
+ .long 0x3ee63375
+ .long 0xb1d9c774
+ .long 0x3f800000
+ .long 0xbdf1d344
+ .long 0x3ef15aea
+ .long 0xb1ff2139
+ .long 0x3f800000
+ .long 0xbe0507ea
+ .long 0x3efc5d27
+ .long 0xb180eca9
+ .long 0x3f800000
+ .long 0xbe11af97
+ .long 0x3f039c3d
+ .long 0xb25ba002
+ .long 0x3f800000
+ .long 0xbe1edeb5
+ .long 0x3f08f59b
+ .long 0xb2be4b4e
+ .long 0x3f800000
+ .long 0xbe2c933b
+ .long 0x3f0e39da
+ .long 0xb24a32e7
+ .long 0x3f800000
+ .long 0xbe3acb0c
+ .long 0x3f13682a
+ .long 0x32cdd12e
+ .long 0x3f800000
+ .long 0xbe4983f7
+ .long 0x3f187fc0
+ .long 0xb1c7a3f3
+ .long 0x3f800000
+ .long 0xbe58bbb7
+ .long 0x3f1d7fd1
+ .long 0x3292050c
+ .long 0x3f800000
+ .long 0xbe686ff3
+ .long 0x3f226799
+ .long 0x322123bb
+ .long 0x3f800000
+ .long 0xbe789e3f
+ .long 0x3f273656
+ .long 0xb2038343
+ .long 0x3f800000
+ .long 0xbe84a20e
+ .long 0x3f2beb4a
+ .long 0xb2b73136
+ .long 0x3f800000
+ .long 0xbe8d2f7d
+ .long 0x3f3085bb
+ .long 0xb2ae2d32
+ .long 0x3f800000
+ .long 0xbe95f61a
+ .long 0x3f3504f3
+ .long 0x324fe77a
+ .long 0x3f800000
+ .long 0x3e4216eb
+ .long 0x3f396842
+ .long 0xb2810007
+ .long 0x3f000000
+ .long 0x3e2fad27
+ .long 0x3f3daef9
+ .long 0x319aabec
+ .long 0x3f000000
+ .long 0x3e1cd957
+ .long 0x3f41d870
+ .long 0x32bff977
+ .long 0x3f000000
+ .long 0x3e099e65
+ .long 0x3f45e403
+ .long 0x32b15174
+ .long 0x3f000000
+ .long 0x3debfe8a
+ .long 0x3f49d112
+ .long 0x32992640
+ .long 0x3f000000
+ .long 0x3dc3fdff
+ .long 0x3f4d9f02
+ .long 0x327e70e8
+ .long 0x3f000000
+ .long 0x3d9b4153
+ .long 0x3f514d3d
+ .long 0x300c4f04
+ .long 0x3f000000
+ .long 0x3d639d9d
+ .long 0x3f54db31
+ .long 0x3290ea1a
+ .long 0x3f000000
+ .long 0x3d0f59aa
+ .long 0x3f584853
+ .long 0xb27d5fc0
+ .long 0x3f000000
+ .long 0x3c670f32
+ .long 0x3f5b941a
+ .long 0x32232dc8
+ .long 0x3f000000
+ .long 0xbbe8b648
+ .long 0x3f5ebe05
+ .long 0x32c6f953
+ .long 0x3f000000
+ .long 0xbcea5164
+ .long 0x3f61c598
+ .long 0xb2e7f425
+ .long 0x3f000000
+ .long 0xbd4e645a
+ .long 0x3f64aa59
+ .long 0x311a08fa
+ .long 0x3f000000
+ .long 0xbd945dff
+ .long 0x3f676bd8
+ .long 0xb2bc3389
+ .long 0x3f000000
+ .long 0xbdc210d8
+ .long 0x3f6a09a7
+ .long 0xb2eb236c
+ .long 0x3f000000
+ .long 0xbdf043ab
+ .long 0x3f6c835e
+ .long 0x32f328d4
+ .long 0x3f000000
+ .long 0xbe0f77ad
+ .long 0x3f6ed89e
+ .long 0xb29333dc
+ .long 0x3f000000
+ .long 0x3db1f34f
+ .long 0x3f710908
+ .long 0x321ed0dd
+ .long 0x3e800000
+ .long 0x3d826b93
+ .long 0x3f731447
+ .long 0x32c48e11
+ .long 0x3e800000
+ .long 0x3d25018c
+ .long 0x3f74fa0b
+ .long 0xb2939d22
+ .long 0x3e800000
+ .long 0x3c88e931
+ .long 0x3f76ba07
+ .long 0x326d092c
+ .long 0x3e800000
+ .long 0xbbe60685
+ .long 0x3f7853f8
+ .long 0xb20db9e5
+ .long 0x3e800000
+ .long 0xbcfd1f65
+ .long 0x3f79c79d
+ .long 0x32c64e59
+ .long 0x3e800000
+ .long 0xbd60e8f8
+ .long 0x3f7b14be
+ .long 0x32ff75cb
+ .long 0x3e800000
+ .long 0x3d3c4289
+ .long 0x3f7c3b28
+ .long 0xb231d68b
+ .long 0x3e000000
+ .long 0x3cb2041c
+ .long 0x3f7d3aac
+ .long 0xb0f75ae9
+ .long 0x3e000000
+ .long 0xbb29b1a9
+ .long 0x3f7e1324
+ .long 0xb2f1e603
+ .long 0x3e000000
+ .long 0xbcdd0b28
+ .long 0x3f7ec46d
+ .long 0x31f44949
+ .long 0x3e000000
+ .long 0x3c354825
+ .long 0x3f7f4e6d
+ .long 0x32d01884
+ .long 0x3d800000
+ .long 0xbc5c1342
+ .long 0x3f7fb10f
+ .long 0x31de5b5f
+ .long 0x3d800000
+ .long 0xbbdbd541
+ .long 0x3f7fec43
+ .long 0x3084cd0d
+ .long 0x3d000000
+ .long 0x00000000
+ .long 0x3f800000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x3bdbd541
+ .long 0x3f7fec43
+ .long 0x3084cd0d
+ .long 0xbd000000
+ .long 0x3c5c1342
+ .long 0x3f7fb10f
+ .long 0x31de5b5f
+ .long 0xbd800000
+ .long 0xbc354825
+ .long 0x3f7f4e6d
+ .long 0x32d01884
+ .long 0xbd800000
+ .long 0x3cdd0b28
+ .long 0x3f7ec46d
+ .long 0x31f44949
+ .long 0xbe000000
+ .long 0x3b29b1a9
+ .long 0x3f7e1324
+ .long 0xb2f1e603
+ .long 0xbe000000
+ .long 0xbcb2041c
+ .long 0x3f7d3aac
+ .long 0xb0f75ae9
+ .long 0xbe000000
+ .long 0xbd3c4289
+ .long 0x3f7c3b28
+ .long 0xb231d68b
+ .long 0xbe000000
+ .long 0x3d60e8f8
+ .long 0x3f7b14be
+ .long 0x32ff75cb
+ .long 0xbe800000
+ .long 0x3cfd1f65
+ .long 0x3f79c79d
+ .long 0x32c64e59
+ .long 0xbe800000
+ .long 0x3be60685
+ .long 0x3f7853f8
+ .long 0xb20db9e5
+ .long 0xbe800000
+ .long 0xbc88e931
+ .long 0x3f76ba07
+ .long 0x326d092c
+ .long 0xbe800000
+ .long 0xbd25018c
+ .long 0x3f74fa0b
+ .long 0xb2939d22
+ .long 0xbe800000
+ .long 0xbd826b93
+ .long 0x3f731447
+ .long 0x32c48e11
+ .long 0xbe800000
+ .long 0xbdb1f34f
+ .long 0x3f710908
+ .long 0x321ed0dd
+ .long 0xbe800000
+ .long 0x3e0f77ad
+ .long 0x3f6ed89e
+ .long 0xb29333dc
+ .long 0xbf000000
+ .long 0x3df043ab
+ .long 0x3f6c835e
+ .long 0x32f328d4
+ .long 0xbf000000
+ .long 0x3dc210d8
+ .long 0x3f6a09a7
+ .long 0xb2eb236c
+ .long 0xbf000000
+ .long 0x3d945dff
+ .long 0x3f676bd8
+ .long 0xb2bc3389
+ .long 0xbf000000
+ .long 0x3d4e645a
+ .long 0x3f64aa59
+ .long 0x311a08fa
+ .long 0xbf000000
+ .long 0x3cea5164
+ .long 0x3f61c598
+ .long 0xb2e7f425
+ .long 0xbf000000
+ .long 0x3be8b648
+ .long 0x3f5ebe05
+ .long 0x32c6f953
+ .long 0xbf000000
+ .long 0xbc670f32
+ .long 0x3f5b941a
+ .long 0x32232dc8
+ .long 0xbf000000
+ .long 0xbd0f59aa
+ .long 0x3f584853
+ .long 0xb27d5fc0
+ .long 0xbf000000
+ .long 0xbd639d9d
+ .long 0x3f54db31
+ .long 0x3290ea1a
+ .long 0xbf000000
+ .long 0xbd9b4153
+ .long 0x3f514d3d
+ .long 0x300c4f04
+ .long 0xbf000000
+ .long 0xbdc3fdff
+ .long 0x3f4d9f02
+ .long 0x327e70e8
+ .long 0xbf000000
+ .long 0xbdebfe8a
+ .long 0x3f49d112
+ .long 0x32992640
+ .long 0xbf000000
+ .long 0xbe099e65
+ .long 0x3f45e403
+ .long 0x32b15174
+ .long 0xbf000000
+ .long 0xbe1cd957
+ .long 0x3f41d870
+ .long 0x32bff977
+ .long 0xbf000000
+ .long 0xbe2fad27
+ .long 0x3f3daef9
+ .long 0x319aabec
+ .long 0xbf000000
+ .long 0xbe4216eb
+ .long 0x3f396842
+ .long 0xb2810007
+ .long 0xbf000000
+ .long 0x3e95f61a
+ .long 0x3f3504f3
+ .long 0x324fe77a
+ .long 0xbf800000
+ .long 0x3e8d2f7d
+ .long 0x3f3085bb
+ .long 0xb2ae2d32
+ .long 0xbf800000
+ .long 0x3e84a20e
+ .long 0x3f2beb4a
+ .long 0xb2b73136
+ .long 0xbf800000
+ .long 0x3e789e3f
+ .long 0x3f273656
+ .long 0xb2038343
+ .long 0xbf800000
+ .long 0x3e686ff3
+ .long 0x3f226799
+ .long 0x322123bb
+ .long 0xbf800000
+ .long 0x3e58bbb7
+ .long 0x3f1d7fd1
+ .long 0x3292050c
+ .long 0xbf800000
+ .long 0x3e4983f7
+ .long 0x3f187fc0
+ .long 0xb1c7a3f3
+ .long 0xbf800000
+ .long 0x3e3acb0c
+ .long 0x3f13682a
+ .long 0x32cdd12e
+ .long 0xbf800000
+ .long 0x3e2c933b
+ .long 0x3f0e39da
+ .long 0xb24a32e7
+ .long 0xbf800000
+ .long 0x3e1edeb5
+ .long 0x3f08f59b
+ .long 0xb2be4b4e
+ .long 0xbf800000
+ .long 0x3e11af97
+ .long 0x3f039c3d
+ .long 0xb25ba002
+ .long 0xbf800000
+ .long 0x3e0507ea
+ .long 0x3efc5d27
+ .long 0xb180eca9
+ .long 0xbf800000
+ .long 0x3df1d344
+ .long 0x3ef15aea
+ .long 0xb1ff2139
+ .long 0xbf800000
+ .long 0x3ddaad38
+ .long 0x3ee63375
+ .long 0xb1d9c774
+ .long 0xbf800000
+ .long 0x3dc4a143
+ .long 0x3edae880
+ .long 0x321e15cc
+ .long 0xbf800000
+ .long 0x3dafb2cc
+ .long 0x3ecf7bca
+ .long 0x316a3b63
+ .long 0xbf800000
+ .long 0x3d9be50c
+ .long 0x3ec3ef15
+ .long 0x31d5d52c
+ .long 0xbf800000
+ .long 0x3d893b12
+ .long 0x3eb8442a
+ .long 0xb2705ba6
+ .long 0xbf800000
+ .long 0x3d6f6f7e
+ .long 0x3eac7cd4
+ .long 0xb2254e02
+ .long 0xbf800000
+ .long 0x3d4ebb8a
+ .long 0x3ea09ae5
+ .long 0xb23e89a0
+ .long 0xbf800000
+ .long 0x3d305f55
+ .long 0x3e94a031
+ .long 0x326d59f0
+ .long 0xbf800000
+ .long 0x3d145f8c
+ .long 0x3e888e93
+ .long 0x312c7d9e
+ .long 0xbf800000
+ .long 0x3cf58104
+ .long 0x3e78cfcc
+ .long 0xb11bd41d
+ .long 0xbf800000
+ .long 0x3cc70c54
+ .long 0x3e605c13
+ .long 0x31a7e4f6
+ .long 0xbf800000
+ .long 0x3c9d6830
+ .long 0x3e47c5c2
+ .long 0xb0e5967d
+ .long 0xbf800000
+ .long 0x3c71360b
+ .long 0x3e2f10a2
+ .long 0x311167f9
+ .long 0xbf800000
+ .long 0x3c315502
+ .long 0x3e164083
+ .long 0x31e8e614
+ .long 0xbf800000
+ .long 0x3bf66e3c
+ .long 0x3dfab273
+ .long 0xb11568cf
+ .long 0xbf800000
+ .long 0x3b9dc971
+ .long 0x3dc8bd36
+ .long 0xb07592f5
+ .long 0xbf800000
+ .long 0x3b319298
+ .long 0x3d96a905
+ .long 0xb1531e61
+ .long 0xbf800000
+ .long 0x3a9de1c8
+ .long 0x3d48fb30
+ .long 0xb0ef227f
+ .long 0xbf800000
+ .long 0x399de7df
+ .long 0x3cc90ab0
+ .long 0xb005c998
+ .long 0xbf800000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0xbf800000
+ .long 0x399de7df
+ .long 0xbcc90ab0
+ .long 0x3005c998
+ .long 0xbf800000
+ .long 0x3a9de1c8
+ .long 0xbd48fb30
+ .long 0x30ef227f
+ .long 0xbf800000
+ .long 0x3b319298
+ .long 0xbd96a905
+ .long 0x31531e61
+ .long 0xbf800000
+ .long 0x3b9dc971
+ .long 0xbdc8bd36
+ .long 0x307592f5
+ .long 0xbf800000
+ .long 0x3bf66e3c
+ .long 0xbdfab273
+ .long 0x311568cf
+ .long 0xbf800000
+ .long 0x3c315502
+ .long 0xbe164083
+ .long 0xb1e8e614
+ .long 0xbf800000
+ .long 0x3c71360b
+ .long 0xbe2f10a2
+ .long 0xb11167f9
+ .long 0xbf800000
+ .long 0x3c9d6830
+ .long 0xbe47c5c2
+ .long 0x30e5967d
+ .long 0xbf800000
+ .long 0x3cc70c54
+ .long 0xbe605c13
+ .long 0xb1a7e4f6
+ .long 0xbf800000
+ .long 0x3cf58104
+ .long 0xbe78cfcc
+ .long 0x311bd41d
+ .long 0xbf800000
+ .long 0x3d145f8c
+ .long 0xbe888e93
+ .long 0xb12c7d9e
+ .long 0xbf800000
+ .long 0x3d305f55
+ .long 0xbe94a031
+ .long 0xb26d59f0
+ .long 0xbf800000
+ .long 0x3d4ebb8a
+ .long 0xbea09ae5
+ .long 0x323e89a0
+ .long 0xbf800000
+ .long 0x3d6f6f7e
+ .long 0xbeac7cd4
+ .long 0x32254e02
+ .long 0xbf800000
+ .long 0x3d893b12
+ .long 0xbeb8442a
+ .long 0x32705ba6
+ .long 0xbf800000
+ .long 0x3d9be50c
+ .long 0xbec3ef15
+ .long 0xb1d5d52c
+ .long 0xbf800000
+ .long 0x3dafb2cc
+ .long 0xbecf7bca
+ .long 0xb16a3b63
+ .long 0xbf800000
+ .long 0x3dc4a143
+ .long 0xbedae880
+ .long 0xb21e15cc
+ .long 0xbf800000
+ .long 0x3ddaad38
+ .long 0xbee63375
+ .long 0x31d9c774
+ .long 0xbf800000
+ .long 0x3df1d344
+ .long 0xbef15aea
+ .long 0x31ff2139
+ .long 0xbf800000
+ .long 0x3e0507ea
+ .long 0xbefc5d27
+ .long 0x3180eca9
+ .long 0xbf800000
+ .long 0x3e11af97
+ .long 0xbf039c3d
+ .long 0x325ba002
+ .long 0xbf800000
+ .long 0x3e1edeb5
+ .long 0xbf08f59b
+ .long 0x32be4b4e
+ .long 0xbf800000
+ .long 0x3e2c933b
+ .long 0xbf0e39da
+ .long 0x324a32e7
+ .long 0xbf800000
+ .long 0x3e3acb0c
+ .long 0xbf13682a
+ .long 0xb2cdd12e
+ .long 0xbf800000
+ .long 0x3e4983f7
+ .long 0xbf187fc0
+ .long 0x31c7a3f3
+ .long 0xbf800000
+ .long 0x3e58bbb7
+ .long 0xbf1d7fd1
+ .long 0xb292050c
+ .long 0xbf800000
+ .long 0x3e686ff3
+ .long 0xbf226799
+ .long 0xb22123bb
+ .long 0xbf800000
+ .long 0x3e789e3f
+ .long 0xbf273656
+ .long 0x32038343
+ .long 0xbf800000
+ .long 0x3e84a20e
+ .long 0xbf2beb4a
+ .long 0x32b73136
+ .long 0xbf800000
+ .long 0x3e8d2f7d
+ .long 0xbf3085bb
+ .long 0x32ae2d32
+ .long 0xbf800000
+ .long 0x3e95f61a
+ .long 0xbf3504f3
+ .long 0xb24fe77a
+ .long 0xbf800000
+ .long 0xbe4216eb
+ .long 0xbf396842
+ .long 0x32810007
+ .long 0xbf000000
+ .long 0xbe2fad27
+ .long 0xbf3daef9
+ .long 0xb19aabec
+ .long 0xbf000000
+ .long 0xbe1cd957
+ .long 0xbf41d870
+ .long 0xb2bff977
+ .long 0xbf000000
+ .long 0xbe099e65
+ .long 0xbf45e403
+ .long 0xb2b15174
+ .long 0xbf000000
+ .long 0xbdebfe8a
+ .long 0xbf49d112
+ .long 0xb2992640
+ .long 0xbf000000
+ .long 0xbdc3fdff
+ .long 0xbf4d9f02
+ .long 0xb27e70e8
+ .long 0xbf000000
+ .long 0xbd9b4153
+ .long 0xbf514d3d
+ .long 0xb00c4f04
+ .long 0xbf000000
+ .long 0xbd639d9d
+ .long 0xbf54db31
+ .long 0xb290ea1a
+ .long 0xbf000000
+ .long 0xbd0f59aa
+ .long 0xbf584853
+ .long 0x327d5fc0
+ .long 0xbf000000
+ .long 0xbc670f32
+ .long 0xbf5b941a
+ .long 0xb2232dc8
+ .long 0xbf000000
+ .long 0x3be8b648
+ .long 0xbf5ebe05
+ .long 0xb2c6f953
+ .long 0xbf000000
+ .long 0x3cea5164
+ .long 0xbf61c598
+ .long 0x32e7f425
+ .long 0xbf000000
+ .long 0x3d4e645a
+ .long 0xbf64aa59
+ .long 0xb11a08fa
+ .long 0xbf000000
+ .long 0x3d945dff
+ .long 0xbf676bd8
+ .long 0x32bc3389
+ .long 0xbf000000
+ .long 0x3dc210d8
+ .long 0xbf6a09a7
+ .long 0x32eb236c
+ .long 0xbf000000
+ .long 0x3df043ab
+ .long 0xbf6c835e
+ .long 0xb2f328d4
+ .long 0xbf000000
+ .long 0x3e0f77ad
+ .long 0xbf6ed89e
+ .long 0x329333dc
+ .long 0xbf000000
+ .long 0xbdb1f34f
+ .long 0xbf710908
+ .long 0xb21ed0dd
+ .long 0xbe800000
+ .long 0xbd826b93
+ .long 0xbf731447
+ .long 0xb2c48e11
+ .long 0xbe800000
+ .long 0xbd25018c
+ .long 0xbf74fa0b
+ .long 0x32939d22
+ .long 0xbe800000
+ .long 0xbc88e931
+ .long 0xbf76ba07
+ .long 0xb26d092c
+ .long 0xbe800000
+ .long 0x3be60685
+ .long 0xbf7853f8
+ .long 0x320db9e5
+ .long 0xbe800000
+ .long 0x3cfd1f65
+ .long 0xbf79c79d
+ .long 0xb2c64e59
+ .long 0xbe800000
+ .long 0x3d60e8f8
+ .long 0xbf7b14be
+ .long 0xb2ff75cb
+ .long 0xbe800000
+ .long 0xbd3c4289
+ .long 0xbf7c3b28
+ .long 0x3231d68b
+ .long 0xbe000000
+ .long 0xbcb2041c
+ .long 0xbf7d3aac
+ .long 0x30f75ae9
+ .long 0xbe000000
+ .long 0x3b29b1a9
+ .long 0xbf7e1324
+ .long 0x32f1e603
+ .long 0xbe000000
+ .long 0x3cdd0b28
+ .long 0xbf7ec46d
+ .long 0xb1f44949
+ .long 0xbe000000
+ .long 0xbc354825
+ .long 0xbf7f4e6d
+ .long 0xb2d01884
+ .long 0xbd800000
+ .long 0x3c5c1342
+ .long 0xbf7fb10f
+ .long 0xb1de5b5f
+ .long 0xbd800000
+ .long 0x3bdbd541
+ .long 0xbf7fec43
+ .long 0xb084cd0d
+ .long 0xbd000000
+ .long 0x00000000
+ .long 0xbf800000
+ .long 0x00000000
+ .long 0x00000000
+ .long 0xbbdbd541
+ .long 0xbf7fec43
+ .long 0xb084cd0d
+ .long 0x3d000000
+ .long 0xbc5c1342
+ .long 0xbf7fb10f
+ .long 0xb1de5b5f
+ .long 0x3d800000
+ .long 0x3c354825
+ .long 0xbf7f4e6d
+ .long 0xb2d01884
+ .long 0x3d800000
+ .long 0xbcdd0b28
+ .long 0xbf7ec46d
+ .long 0xb1f44949
+ .long 0x3e000000
+ .long 0xbb29b1a9
+ .long 0xbf7e1324
+ .long 0x32f1e603
+ .long 0x3e000000
+ .long 0x3cb2041c
+ .long 0xbf7d3aac
+ .long 0x30f75ae9
+ .long 0x3e000000
+ .long 0x3d3c4289
+ .long 0xbf7c3b28
+ .long 0x3231d68b
+ .long 0x3e000000
+ .long 0xbd60e8f8
+ .long 0xbf7b14be
+ .long 0xb2ff75cb
+ .long 0x3e800000
+ .long 0xbcfd1f65
+ .long 0xbf79c79d
+ .long 0xb2c64e59
+ .long 0x3e800000
+ .long 0xbbe60685
+ .long 0xbf7853f8
+ .long 0x320db9e5
+ .long 0x3e800000
+ .long 0x3c88e931
+ .long 0xbf76ba07
+ .long 0xb26d092c
+ .long 0x3e800000
+ .long 0x3d25018c
+ .long 0xbf74fa0b
+ .long 0x32939d22
+ .long 0x3e800000
+ .long 0x3d826b93
+ .long 0xbf731447
+ .long 0xb2c48e11
+ .long 0x3e800000
+ .long 0x3db1f34f
+ .long 0xbf710908
+ .long 0xb21ed0dd
+ .long 0x3e800000
+ .long 0xbe0f77ad
+ .long 0xbf6ed89e
+ .long 0x329333dc
+ .long 0x3f000000
+ .long 0xbdf043ab
+ .long 0xbf6c835e
+ .long 0xb2f328d4
+ .long 0x3f000000
+ .long 0xbdc210d8
+ .long 0xbf6a09a7
+ .long 0x32eb236c
+ .long 0x3f000000
+ .long 0xbd945dff
+ .long 0xbf676bd8
+ .long 0x32bc3389
+ .long 0x3f000000
+ .long 0xbd4e645a
+ .long 0xbf64aa59
+ .long 0xb11a08fa
+ .long 0x3f000000
+ .long 0xbcea5164
+ .long 0xbf61c598
+ .long 0x32e7f425
+ .long 0x3f000000
+ .long 0xbbe8b648
+ .long 0xbf5ebe05
+ .long 0xb2c6f953
+ .long 0x3f000000
+ .long 0x3c670f32
+ .long 0xbf5b941a
+ .long 0xb2232dc8
+ .long 0x3f000000
+ .long 0x3d0f59aa
+ .long 0xbf584853
+ .long 0x327d5fc0
+ .long 0x3f000000
+ .long 0x3d639d9d
+ .long 0xbf54db31
+ .long 0xb290ea1a
+ .long 0x3f000000
+ .long 0x3d9b4153
+ .long 0xbf514d3d
+ .long 0xb00c4f04
+ .long 0x3f000000
+ .long 0x3dc3fdff
+ .long 0xbf4d9f02
+ .long 0xb27e70e8
+ .long 0x3f000000
+ .long 0x3debfe8a
+ .long 0xbf49d112
+ .long 0xb2992640
+ .long 0x3f000000
+ .long 0x3e099e65
+ .long 0xbf45e403
+ .long 0xb2b15174
+ .long 0x3f000000
+ .long 0x3e1cd957
+ .long 0xbf41d870
+ .long 0xb2bff977
+ .long 0x3f000000
+ .long 0x3e2fad27
+ .long 0xbf3daef9
+ .long 0xb19aabec
+ .long 0x3f000000
+ .long 0x3e4216eb
+ .long 0xbf396842
+ .long 0x32810007
+ .long 0x3f000000
+ .long 0xbe95f61a
+ .long 0xbf3504f3
+ .long 0xb24fe77a
+ .long 0x3f800000
+ .long 0xbe8d2f7d
+ .long 0xbf3085bb
+ .long 0x32ae2d32
+ .long 0x3f800000
+ .long 0xbe84a20e
+ .long 0xbf2beb4a
+ .long 0x32b73136
+ .long 0x3f800000
+ .long 0xbe789e3f
+ .long 0xbf273656
+ .long 0x32038343
+ .long 0x3f800000
+ .long 0xbe686ff3
+ .long 0xbf226799
+ .long 0xb22123bb
+ .long 0x3f800000
+ .long 0xbe58bbb7
+ .long 0xbf1d7fd1
+ .long 0xb292050c
+ .long 0x3f800000
+ .long 0xbe4983f7
+ .long 0xbf187fc0
+ .long 0x31c7a3f3
+ .long 0x3f800000
+ .long 0xbe3acb0c
+ .long 0xbf13682a
+ .long 0xb2cdd12e
+ .long 0x3f800000
+ .long 0xbe2c933b
+ .long 0xbf0e39da
+ .long 0x324a32e7
+ .long 0x3f800000
+ .long 0xbe1edeb5
+ .long 0xbf08f59b
+ .long 0x32be4b4e
+ .long 0x3f800000
+ .long 0xbe11af97
+ .long 0xbf039c3d
+ .long 0x325ba002
+ .long 0x3f800000
+ .long 0xbe0507ea
+ .long 0xbefc5d27
+ .long 0x3180eca9
+ .long 0x3f800000
+ .long 0xbdf1d344
+ .long 0xbef15aea
+ .long 0x31ff2139
+ .long 0x3f800000
+ .long 0xbddaad38
+ .long 0xbee63375
+ .long 0x31d9c774
+ .long 0x3f800000
+ .long 0xbdc4a143
+ .long 0xbedae880
+ .long 0xb21e15cc
+ .long 0x3f800000
+ .long 0xbdafb2cc
+ .long 0xbecf7bca
+ .long 0xb16a3b63
+ .long 0x3f800000
+ .long 0xbd9be50c
+ .long 0xbec3ef15
+ .long 0xb1d5d52c
+ .long 0x3f800000
+ .long 0xbd893b12
+ .long 0xbeb8442a
+ .long 0x32705ba6
+ .long 0x3f800000
+ .long 0xbd6f6f7e
+ .long 0xbeac7cd4
+ .long 0x32254e02
+ .long 0x3f800000
+ .long 0xbd4ebb8a
+ .long 0xbea09ae5
+ .long 0x323e89a0
+ .long 0x3f800000
+ .long 0xbd305f55
+ .long 0xbe94a031
+ .long 0xb26d59f0
+ .long 0x3f800000
+ .long 0xbd145f8c
+ .long 0xbe888e93
+ .long 0xb12c7d9e
+ .long 0x3f800000
+ .long 0xbcf58104
+ .long 0xbe78cfcc
+ .long 0x311bd41d
+ .long 0x3f800000
+ .long 0xbcc70c54
+ .long 0xbe605c13
+ .long 0xb1a7e4f6
+ .long 0x3f800000
+ .long 0xbc9d6830
+ .long 0xbe47c5c2
+ .long 0x30e5967d
+ .long 0x3f800000
+ .long 0xbc71360b
+ .long 0xbe2f10a2
+ .long 0xb11167f9
+ .long 0x3f800000
+ .long 0xbc315502
+ .long 0xbe164083
+ .long 0xb1e8e614
+ .long 0x3f800000
+ .long 0xbbf66e3c
+ .long 0xbdfab273
+ .long 0x311568cf
+ .long 0x3f800000
+ .long 0xbb9dc971
+ .long 0xbdc8bd36
+ .long 0x307592f5
+ .long 0x3f800000
+ .long 0xbb319298
+ .long 0xbd96a905
+ .long 0x31531e61
+ .long 0x3f800000
+ .long 0xba9de1c8
+ .long 0xbd48fb30
+ .long 0x30ef227f
+ .long 0x3f800000
+ .long 0xb99de7df
+ .long 0xbcc90ab0
+ .long 0x3005c998
+ .long 0x3f800000
+
+/* General purpose constants:
+ absolute value mask */
+float_vector __sAbsMask 0x7fffffff
+
+/* threshold for out-of-range values */
+float_vector __sRangeReductionVal 0x461c4000
+
+/* +INF */
+float_vector __sRangeVal 0x7f800000
+
+/* High Accuracy version polynomial coefficients:
+ S1 = -1.66666666664728165763e-01 */
+float_vector __sS1 0xbe2aaaab
+
+/* S2 = 8.33329173045453069014e-03 */
+float_vector __sS2 0x3c08885c
+
+/* C1 = -5.00000000000000000000e-01 */
+float_vector __sC1 0xbf000000
+
+/* C2 = 4.16638942914469202550e-02 */
+float_vector __sC2 0x3d2aaa7c
+
+/* high accuracy table index mask */
+float_vector __iIndexMask 0x000000ff
+
+/* 2^(k-1) */
+float_vector __i2pK_1 0x00000040
+
+/* sign field mask */
+float_vector __sSignMask 0x80000000
+
+/* Range reduction PI-based constants:
+ PI high part */
+float_vector __sPI1 0x40490000
+
+/* PI mid part 1 */
+float_vector __sPI2 0x3a7da000
+
+/* PI mid part 2 */
+float_vector __sPI3 0x34222000
+
+/* PI low part */
+float_vector __sPI4 0x2cb4611a
+
+/* Range reduction PI-based constants if FMA available:
+ PI high part (when FMA available) */
+float_vector __sPI1_FMA 0x40490fdb
+
+/* PI mid part (when FMA available) */
+float_vector __sPI2_FMA 0xb3bbbd2e
+
+/* PI low part (when FMA available) */
+float_vector __sPI3_FMA 0xa7772ced
+
+/* Polynomial coefficients: */
+float_vector __sA3 0xbe2aaaa6
+float_vector __sA5 0x3c08876a
+float_vector __sA7 0xb94fb7ff
+float_vector __sA9 0x362edef8
+
+/* Polynomial coefficients (when hardware FMA available) */
+float_vector __sA5_FMA 0x3c088768
+float_vector __sA7_FMA 0xb94fb6cf
+float_vector __sA9_FMA 0x362ec335
+
+/* 1/PI */
+float_vector __sInvPI 0x3ea2f983
+
+/* right-shifter constant */
+float_vector __sRShifter 0x4b400000
+
+/* PI/2 */
+float_vector __sHalfPI 0x3fc90fdb
+
+/* 1/2 */
+float_vector __sOneHalf 0x3f000000
+ .type __svml_ssincos_data,@object
+ .size __svml_ssincos_data,.-__svml_ssincos_data
diff --git a/sysdeps/x86_64/fpu/svml_s_sincosf_data.h b/sysdeps/x86_64/fpu/svml_s_sincosf_data.h
new file mode 100644
index 0000000..4325117
--- /dev/null
+++ b/sysdeps/x86_64/fpu/svml_s_sincosf_data.h
@@ -0,0 +1,61 @@
+/* Offsets for data table for function sincosf.
+ Copyright (C) 2014-2015 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#ifndef S_SINCOSF_DATA_H
+#define S_SINCOSF_DATA_H
+
+#define __dT 0
+#define __sAbsMask 4096
+#define __sRangeReductionVal 4160
+#define __sRangeVal 4224
+#define __sS1 4288
+#define __sS2 4352
+#define __sC1 4416
+#define __sC2 4480
+#define __iIndexMask 4544
+#define __i2pK_1 4608
+#define __sSignMask 4672
+#define __sPI1 4736
+#define __sPI2 4800
+#define __sPI3 4864
+#define __sPI4 4928
+#define __sPI1_FMA 4992
+#define __sPI2_FMA 5056
+#define __sPI3_FMA 5120
+#define __sA3 5184
+#define __sA5 5248
+#define __sA7 5312
+#define __sA9 5376
+#define __sA5_FMA 5440
+#define __sA7_FMA 5504
+#define __sA9_FMA 5568
+#define __sInvPI 5632
+#define __sRShifter 5696
+#define __sHalfPI 5760
+#define __sOneHalf 5824
+
+.macro float_vector offset value
+.if .-__svml_ssincos_data != \offset
+.err
+.endif
+.rept 16
+.long \value
+.endr
+.endm
+
+#endif
diff --git a/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h b/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h
index f88e30f..66bb081 100644
--- a/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h
+++ b/sysdeps/x86_64/fpu/svml_s_wrapper_impl.h
@@ -76,6 +76,67 @@
ret
.endm
+/* 3 argument SSE2 ISA version as wrapper to scalar. */
+.macro WRAPPER_IMPL_SSE2_fFF callee
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ pushq %rbx
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbx, 0)
+ movq %rdi, %rbp
+ movq %rsi, %rbx
+ subq $40, %rsp
+ cfi_adjust_cfa_offset(40)
+ leaq 24(%rsp), %rsi
+ leaq 28(%rsp), %rdi
+ movaps %xmm0, (%rsp)
+ call \callee@PLT
+ leaq 24(%rsp), %rsi
+ leaq 28(%rsp), %rdi
+ movss 28(%rsp), %xmm0
+ movss %xmm0, 0(%rbp)
+ movaps (%rsp), %xmm1
+ movss 24(%rsp), %xmm0
+ movss %xmm0, (%rbx)
+ movaps %xmm1, %xmm0
+ shufps $85, %xmm1, %xmm0
+ call \callee@PLT
+ movss 28(%rsp), %xmm0
+ leaq 24(%rsp), %rsi
+ movss %xmm0, 4(%rbp)
+ leaq 28(%rsp), %rdi
+ movaps (%rsp), %xmm1
+ movss 24(%rsp), %xmm0
+ movss %xmm0, 4(%rbx)
+ movaps %xmm1, %xmm0
+ unpckhps %xmm1, %xmm0
+ call \callee@PLT
+ movaps (%rsp), %xmm1
+ leaq 24(%rsp), %rsi
+ leaq 28(%rsp), %rdi
+ movss 28(%rsp), %xmm0
+ shufps $255, %xmm1, %xmm1
+ movss %xmm0, 8(%rbp)
+ movss 24(%rsp), %xmm0
+ movss %xmm0, 8(%rbx)
+ movaps %xmm1, %xmm0
+ call \callee@PLT
+ movss 28(%rsp), %xmm0
+ movss %xmm0, 12(%rbp)
+ movss 24(%rsp), %xmm0
+ movss %xmm0, 12(%rbx)
+ addq $40, %rsp
+ cfi_adjust_cfa_offset(-40)
+ popq %rbx
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbx)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+.endm
+
/* AVX/AVX2 ISA version as wrapper to SSE ISA version. */
.macro WRAPPER_IMPL_AVX callee
pushq %rbp
@@ -130,6 +191,52 @@
ret
.endm
+/* 3 argument AVX/AVX2 ISA version as wrapper to SSE ISA version. */
+.macro WRAPPER_IMPL_AVX_fFF callee
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-32, %rsp
+ pushq %r13
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%r13, 0)
+ pushq %r14
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%r14, 0)
+ subq $48, %rsp
+ movq %rsi, %r14
+ vmovaps %ymm0, (%rsp)
+ movq %rdi, %r13
+ vmovaps 16(%rsp), %xmm1
+ vmovaps %xmm1, 32(%rsp)
+ vzeroupper
+ vmovaps (%rsp), %xmm0
+ call HIDDEN_JUMPTARGET(\callee)
+ vmovaps 32(%rsp), %xmm0
+ lea (%rsp), %rdi
+ lea 16(%rsp), %rsi
+ call HIDDEN_JUMPTARGET(\callee)
+ vmovaps (%rsp), %xmm0
+ vmovaps 16(%rsp), %xmm1
+ vmovaps %xmm0, 16(%r13)
+ vmovaps %xmm1, 16(%r14)
+ addq $48, %rsp
+ popq %r14
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%r14)
+ popq %r13
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%r13)
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+.endm
+
/* AVX512 ISA version as wrapper to AVX2 ISA version. */
.macro WRAPPER_IMPL_AVX512 callee
pushq %rbp
@@ -147,20 +254,9 @@
.byte 0x29
.byte 0x04
.byte 0x24
-/* Below is encoding for vmovaps (%rsp), %ymm0. */
- .byte 0xc5
- .byte 0xfc
- .byte 0x28
- .byte 0x04
- .byte 0x24
+ vmovaps (%rsp), %ymm0
call HIDDEN_JUMPTARGET(\callee)
-/* Below is encoding for vmovaps 32(%rsp), %ymm0. */
- .byte 0xc5
- .byte 0xfc
- .byte 0x28
- .byte 0x44
- .byte 0x24
- .byte 0x20
+ vmovaps 32(%rsp), %ymm0
call HIDDEN_JUMPTARGET(\callee)
movq %rbp, %rsp
cfi_def_cfa_register (%rsp)
@@ -195,38 +291,57 @@
.byte 0x29
.byte 0x4c
.byte 0x24
-/* Below is encoding for vmovaps (%rsp), %ymm0. */
- .byte 0xc5
- .byte 0xfc
- .byte 0x28
+ vmovaps (%rsp), %ymm0
+ vmovaps 64(%rsp), %ymm1
+ call HIDDEN_JUMPTARGET(\callee)
+ vmovaps 32(%rsp), %ymm0
+ vmovaps 96(%rsp), %ymm1
+ call HIDDEN_JUMPTARGET(\callee)
+ movq %rbp, %rsp
+ cfi_def_cfa_register (%rsp)
+ popq %rbp
+ cfi_adjust_cfa_offset (-8)
+ cfi_restore (%rbp)
+ ret
+.endm
+
+/* 3 argument AVX512 ISA version as wrapper to AVX2 ISA version. */
+.macro WRAPPER_IMPL_AVX512_fFF callee
+ pushq %rbp
+ cfi_adjust_cfa_offset (8)
+ cfi_rel_offset (%rbp, 0)
+ movq %rsp, %rbp
+ cfi_def_cfa_register (%rbp)
+ andq $-64, %rsp
+ pushq %r12
+ pushq %r13
+ subq $176, %rsp
+ movq %rsi, %r13
+/* Below is encoding for vmovaps %zmm0, (%rsp). */
+ .byte 0x62
+ .byte 0xf1
+ .byte 0x7c
+ .byte 0x48
+ .byte 0x29
.byte 0x04
.byte 0x24
-/* Below is encoding for vmovaps 64(%rsp), %ymm1. */
- .byte 0xc5
- .byte 0xfc
- .byte 0x28
- .byte 0x4c
- .byte 0x24
- .byte 0x40
+ movq %rdi, %r12
+ vmovaps (%rsp), %ymm0
call HIDDEN_JUMPTARGET(\callee)
-/* Below is encoding for vmovaps 32(%rsp), %ymm0. */
- .byte 0xc5
- .byte 0xfc
- .byte 0x28
- .byte 0x44
- .byte 0x24
- .byte 0x20
-/* Below is encoding for vmovaps 96(%rsp), %ymm1. */
- .byte 0xc5
- .byte 0xfc
- .byte 0x28
- .byte 0x4c
- .byte 0x24
- .byte 0x60
+ vmovaps 32(%rsp), %ymm0
+ lea 64(%rsp), %rdi
+ lea 96(%rsp), %rsi
call HIDDEN_JUMPTARGET(\callee)
+ vmovaps 64(%rsp), %ymm0
+ vmovaps 96(%rsp), %ymm1
+ vmovaps %ymm0, 32(%r12)
+ vmovaps %ymm1, 32(%r13)
+ addq $176, %rsp
+ popq %r13
+ popq %r12
movq %rbp, %rsp
cfi_def_cfa_register (%rsp)
- popq %rbp
+ popq %rbp
cfi_adjust_cfa_offset (-8)
cfi_restore (%rbp)
ret
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
index 00a1074..6cc6008 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c
@@ -24,6 +24,7 @@
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVeN16v_cosf)
VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVeN16v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVeN16vvv_sincosf)
VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVeN16v_logf)
VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVeN16v_expf)
VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVeN16vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen16.c b/sysdeps/x86_64/fpu/test-float-vlen16.c
index 86b8c33..d7f683f 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen16.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen16.c
@@ -20,6 +20,7 @@
#define TEST_VECTOR_cosf 1
#define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
#define TEST_VECTOR_logf 1
#define TEST_VECTOR_expf 1
#define TEST_VECTOR_powf 1
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
index 7d41e46..ae12a10 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c
@@ -24,6 +24,7 @@
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVbN4v_cosf)
VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVbN4v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVbN4vvv_sincosf)
VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVbN4v_logf)
VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVbN4v_expf)
VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVbN4vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen4.c b/sysdeps/x86_64/fpu/test-float-vlen4.c
index 3e74118..e56d642 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen4.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen4.c
@@ -20,6 +20,7 @@
#define TEST_VECTOR_cosf 1
#define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
#define TEST_VECTOR_logf 1
#define TEST_VECTOR_expf 1
#define TEST_VECTOR_powf 1
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
index ed1c893..f0c7d4a 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2-wrappers.c
@@ -27,6 +27,7 @@
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVdN8v_cosf)
VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVdN8v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVdN8vvv_sincosf)
VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVdN8v_logf)
VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVdN8v_expf)
VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVdN8vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
index f0aaec1..0012082 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-avx2.c
@@ -23,6 +23,7 @@
#define TEST_VECTOR_cosf 1
#define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
#define TEST_VECTOR_logf 1
#define TEST_VECTOR_expf 1
#define TEST_VECTOR_powf 1
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
index 37bf702..6b267de 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c
@@ -24,6 +24,7 @@
VECTOR_WRAPPER (WRAPPER_NAME (cosf), _ZGVcN8v_cosf)
VECTOR_WRAPPER (WRAPPER_NAME (sinf), _ZGVcN8v_sinf)
+VECTOR_WRAPPER_fFF (WRAPPER_NAME (sincosf), _ZGVcN8vvv_sincosf)
VECTOR_WRAPPER (WRAPPER_NAME (logf), _ZGVcN8v_logf)
VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVcN8v_expf)
VECTOR_WRAPPER_ff (WRAPPER_NAME (powf), _ZGVcN8vv_powf)
diff --git a/sysdeps/x86_64/fpu/test-float-vlen8.c b/sysdeps/x86_64/fpu/test-float-vlen8.c
index ef2aedc..581cbde 100644
--- a/sysdeps/x86_64/fpu/test-float-vlen8.c
+++ b/sysdeps/x86_64/fpu/test-float-vlen8.c
@@ -20,6 +20,7 @@
#define TEST_VECTOR_cosf 1
#define TEST_VECTOR_sinf 1
+#define TEST_VECTOR_sincosf 1
#define TEST_VECTOR_logf 1
#define TEST_VECTOR_expf 1
#define TEST_VECTOR_powf 1
-----------------------------------------------------------------------
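A note for readers decoding svml_s_sincosf_data.S and svml_s_sincosf_data.h in the diff above. The `.long` values are IEEE 754 single-precision bit patterns, and each `float_vector` entry repeats one pattern 16 times (64 bytes) after the 4096-byte `__dT` table. The sketch below is illustrative only and not part of the commit. `bits_to_float`, `const_offset` and `pi_split_sum` are hypothetical helper names, and the code assumes a 32-bit IEEE 754 `float`.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a 32-bit pattern as a float.
   Assumes float is a 32-bit IEEE 754 single-precision type.  */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);   /* well-defined way to copy the bits */
    return f;
}

/* Hypothetical helper: byte offset of the index-th float_vector constant,
   counting __sAbsMask as index 0.  Assumes the layout shown in the header:
   a 4096-byte __dT table followed by 64-byte slots (16 copies of a .long).  */
static unsigned const_offset(unsigned index)
{
    return 4096u + 64u * index;
}

/* Sum of the four-part pi split (__sPI1 .. __sPI4) used without FMA,
   accumulated in double so the low parts are not rounded away.  */
static double pi_split_sum(void)
{
    return (double) bits_to_float(0x40490000u)    /* __sPI1 */
         + (double) bits_to_float(0x3a7da000u)    /* __sPI2 */
         + (double) bits_to_float(0x34222000u)    /* __sPI3 */
         + (double) bits_to_float(0x2cb4611au);   /* __sPI4 */
}
```

Under these assumptions, `bits_to_float(0x461c4000)` is 10000.0f (the `__sRangeReductionVal` threshold), `bits_to_float(0x4b400000)` is 1.5 * 2^23 (the `__sRShifter` constant), and `const_offset(10)` is 4736, matching `__sPI1` in the header.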
Summary of changes:
ChangeLog                                        |   33 +
NEWS                                             |    4 +-
math/test-float-vlen16.h                         |   17 +
math/test-float-vlen4.h                          |   17 +
math/test-float-vlen8.h                          |   17 +
sysdeps/unix/sysv/linux/x86_64/libmvec.abilist   |    4 +
sysdeps/x86/fpu/bits/math-vector.h               |    2 +
sysdeps/x86_64/fpu/Makefile                      |    4 +-
sysdeps/x86_64/fpu/Versions                      |    1 +
sysdeps/x86_64/fpu/libm-test-ulps                |    8 +
sysdeps/x86_64/fpu/multiarch/Makefile            |    3 +-
.../x86_64/fpu/multiarch/svml_s_sincosf16_core.S |   39 +
.../fpu/multiarch/svml_s_sincosf16_core_avx512.S |  504 +++++++++
.../x86_64/fpu/multiarch/svml_s_sincosf4_core.S  |   38 +
.../fpu/multiarch/svml_s_sincosf4_core_sse4.S    |  268 +++++
.../x86_64/fpu/multiarch/svml_s_sincosf8_core.S  |   38 +
.../fpu/multiarch/svml_s_sincosf8_core_avx2.S    |  241 +++++
sysdeps/x86_64/fpu/svml_s_sincosf16_core.S       |   25 +
sysdeps/x86_64/fpu/svml_s_sincosf4_core.S        |   30 +
sysdeps/x86_64/fpu/svml_s_sincosf8_core.S        |   29 +
sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S    |   25 +
sysdeps/x86_64/fpu/svml_s_sincosf_data.S         | 1140 ++++++++++++++++++++
sysdeps/x86_64/fpu/svml_s_sincosf_data.h         |   61 ++
sysdeps/x86_64/fpu/svml_s_wrapper_impl.h         |  193 +++-
sysdeps/x86_64/fpu/test-float-vlen16-wrappers.c  |    1 +
sysdeps/x86_64/fpu/test-float-vlen16.c           |    1 +
sysdeps/x86_64/fpu/test-float-vlen4-wrappers.c   |    1 +
sysdeps/x86_64/fpu/test-float-vlen4.c            |    1 +
.../x86_64/fpu/test-float-vlen8-avx2-wrappers.c  |    1 +
sysdeps/x86_64/fpu/test-float-vlen8-avx2.c       |    1 +
sysdeps/x86_64/fpu/test-float-vlen8-wrappers.c   |    1 +
sysdeps/x86_64/fpu/test-float-vlen8.c            |    1 +
32 files changed, 2706 insertions(+), 43 deletions(-)
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf16_core_avx512.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf4_core_sse4.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core.S
create mode 100644 sysdeps/x86_64/fpu/multiarch/svml_s_sincosf8_core_avx2.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf16_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf4_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf8_core.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf8_core_avx.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf_data.S
create mode 100644 sysdeps/x86_64/fpu/svml_s_sincosf_data.h
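A note on the new `_fFF` wrapper macros in svml_s_wrapper_impl.h. As written in this diff, they treat `%rdi` and `%rsi` as the bases of two contiguous result arrays and build a wider call out of narrower ones. The C sketch below is a rough model of that data flow, not the committed implementation. `scalar_fFF`, `demo_callee`, `wrap_vlen4_fFF` and `wrap_vlen8_fFF` are hypothetical names, and `demo_callee` is a stand-in for `sincosf` so that the example does not depend on libm.

```c
#include <string.h>

/* Callback shape matching scalar sincosf: input value, then pointers
   for the first and second results.  */
typedef void (*scalar_fFF)(float, float *, float *);

/* Hypothetical stand-in for sincosf, chosen so results are exact.  */
static void demo_callee(float x, float *r1, float *r2)
{
    *r1 = x + 1.0f;
    *r2 = x * 2.0f;
}

/* Rough model of WRAPPER_IMPL_SSE2_fFF: one scalar call per lane, each
   writing to two temporaries that are then copied into the result arrays.  */
static void wrap_vlen4_fFF(scalar_fFF callee, const float x[4],
                           float r1[4], float r2[4])
{
    for (int lane = 0; lane < 4; lane++) {
        float t1, t2;                 /* the asm uses 28(%rsp) and 24(%rsp) */
        callee(x[lane], &t1, &t2);
        r1[lane] = t1;
        r2[lane] = t2;
    }
}

/* Rough model of WRAPPER_IMPL_AVX_fFF: the low half writes straight into
   the caller's arrays, the high half goes through temporaries and is
   copied to element offset 4 (byte offset 16 in the asm).  The real macro
   calls the 4-lane vector function, not a scalar callback.  */
static void wrap_vlen8_fFF(scalar_fFF callee, const float x[8],
                           float r1[8], float r2[8])
{
    float hi1[4], hi2[4];
    wrap_vlen4_fFF(callee, x, r1, r2);
    wrap_vlen4_fFF(callee, x + 4, hi1, hi2);
    memcpy(r1 + 4, hi1, sizeof hi1);
    memcpy(r2 + 4, hi2, sizeof hi2);
}
```

The AVX512 variant in the diff follows the same pattern one level up: two 8-lane calls, with the second half copied to byte offset 32 of each result array.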
hooks/post-receive
--
GNU C Library master sources