This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

[PATCH v2] Aarch64: Add simd exp/expf functions

From: Steve Ellcey <sellcey at marvell dot com>
To: "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
Date: Wed, 22 May 2019 16:54:48 +0000
Subject: [PATCH v2] Aarch64: Add simd exp/expf functions

Here is an updated version of my patch to add libmvec and vector exp
functions to Aarch64.  GCC 9.1 has now been released so the build
no longer depends on an unreleased compiler.  I have not added any
assembly trampolines to allow older compilers to be used.

The SIMD ABI that this patch uses is supported by GCC 9.1 and is defined at:

https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi

If you build with GCC 9.1 (or any compiler that supports the aarch64_vector_pcs
attribute) you will get libmvec by default, otherwise you will not.  If you
try to build libmvec using a compiler with out aarch64_vector_pcs support the
configure will fail.

There was a question of whether building libmvec should be optional or not,
I don't have a strong opinion on that but would be interested in what others
think.  I could change this to require aarch64_vector_pcs attribute support
in all cases and always build libmvec if that is what we want.

I added static *_finite function names so that they are not exported.  If
Wilco's patch to remove the *_finite names entirely is approved I can remove
this part of the patch.

I removed the 'if (aarch64)' conditionals from math-vector-fortran.h
(and fixed my use of the BIG_ENDIAN macros) this means that the vector exp
and expf routines should get used in big-endian and little-endian modes for
Fortran (just like C).  I have not done any big-endian testing at this point
because I have been doing all my testing on a little-endian Aarch64 linux box.

If anyone has ideas on how to do big-endian testing I would be interested.
I am guessing I would have to build an elf target and test with qemu or
something like that but I haven't done a build/test setup like that in quite
a while.

Steve Ellcey
sellcey@marvell.com

2019-05-22  Steve Ellcey  <sellcey@marvell.com>

	* NEWS: Add entry about libmvec support on aarch64.
	* sysdeps/aarch64/configure.ac (build_mathvec): Check for ABI support,
	Build libmvec if support exists.
	* sysdeps/aarch64/configure: Regenerate.
	* sysdeps/aarch64/fpu/Makefile (CFLAGS-libmvec_double_vlen2_exp.c):
	Set flag.
	(CFLAGS-libmvec_float_vlen4_expf.c): Likewise.
	(CFLAGS-libmvec_exp_data.c): Likewise.
	(CFLAGS-libmvec_exp2f_data.c): Likewise.
	(libmvec-support): Add libmvec_double_vlen2_exp,
	libmvec_float_vlen4_expf, libmvec_exp_data, libmvec_exp2f_data,
	libmvec_aliases to list.
	(libmvec-static-only-routines): Add libmvec_aliases to list.
	(libmvec-tests): Add double-vlen2, float-vlen4 to list.
	(double-vlen2-funcs): Add new vector function name.
	(float-vlen4-funcs): Add new vector function name.
	* sysdeps/aarch64/fpu/Versions: New file.
	* sysdeps/aarch64/fpu/bits/math-vector.h: New file.
	* sysdeps/aarch64/fpu/finclude/math-vector-fortran.h: New file.
	* sysdeps/aarch64/fpu/libmvec_aliases.c: New file.
	* sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp2f_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_exp_data.c: New file.
	* sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c: New file.
	* sysdeps/aarch64/fpu/libmvec_util.h: New file.
	* sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c: New file.
	* sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c: New file.
	* sysdeps/aarch64/libm-test-ulps (exp_vlen2): New entry.
	(exp_vlen4): Likewise.
	* sysdeps/unix/sysv/linux/aarch64/libmvec.abilist: New file.

diff --git a/NEWS b/NEWS
index 0e4c57f273..b1845c02e4 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Version 2.30
 
 Major new features:
 
+* Aarch64 now supports libmvec.  Building libmvec on aarch64 requires
+  a compiler that supports the vector function ABI that is defined at
+  https://developer.arm.com/tools-and-software/server-and-hpc/arm-architecture-tools/arm-compiler-for-hpc/vector-function-abi
+
+  GCC 9.1 has support for this ABI.  The current libmvec for aarch64
+  has vector versions of the exp and expf functions.
+
 * Unicode 12.1.0 Support: Character encoding, character type info, and
   transliteration tables are all updated to Unicode 12.1.0, using
   generator scripts contributed by Mike FABIAN (Red Hat).
diff --git a/sysdeps/aarch64/configure.ac b/sysdeps/aarch64/configure.ac
index 7851dd4dac..5c56511deb 100644
--- a/sysdeps/aarch64/configure.ac
+++ b/sysdeps/aarch64/configure.ac
@@ -20,3 +20,27 @@ if test $libc_cv_aarch64_be = yes; then
 else
   LIBC_CONFIG_VAR([default-abi], [lp64])
 fi
+
+AC_CACHE_CHECK([for pcs attribute support],
+               libc_cv_gcc_pcs_attribute, [dnl
+cat > conftest.c <<EOF
+__attribute__((aarch64_vector_pcs)) extern void foo (void);
+EOF
+libc_cv_gcc_pcs_attribute=no
+if ${CC-cc} -c -Wall -Werror conftest.c -o conftest.o 1>&AS_MESSAGE_LOG_FD \
+   2>&AS_MESSAGE_LOG_FD ; then
+  libc_cv_gcc_pcs_attribute=yes
+fi
+rm -f conftest*])
+
+if test x"$build_mathvec" = xyes; then
+  if test $libc_cv_gcc_pcs_attribute = no; then
+    AC_MSG_ERROR([--enable-mathvec requires a gcc that supports the aarch64_vector_pcs attribute])
+  fi
+fi
+
+if test x"$build_mathvec" = xnotset; then
+  if test $libc_cv_gcc_pcs_attribute = yes; then
+    build_mathvec=yes
+  fi
+fi
diff --git a/sysdeps/aarch64/fpu/Makefile b/sysdeps/aarch64/fpu/Makefile
index 4a182bd6d6..c0720484e2 100644
--- a/sysdeps/aarch64/fpu/Makefile
+++ b/sysdeps/aarch64/fpu/Makefile
@@ -12,3 +12,27 @@ CFLAGS-s_fmaxf.c += -ffinite-math-only
 CFLAGS-s_fmin.c += -ffinite-math-only
 CFLAGS-s_fminf.c += -ffinite-math-only
 endif
+
+ifeq ($(subdir),mathvec)
+CFLAGS-libmvec_double_vlen2_exp.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_float_vlen4_expf.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_exp2f_data.c += -march=armv8-a+simd -fno-math-errno
+CFLAGS-libmvec_aliases.c += -march=armv8-a+simd -fno-math-errno
+
+libmvec-support += libmvec_double_vlen2_exp
+libmvec-support += libmvec_float_vlen4_expf
+libmvec-support += libmvec_exp_data
+libmvec-support += libmvec_exp2f_data
+libmvec-support += libmvec_aliases
+
+libmvec-static-only-routines += limvec_aliases
+endif
+
+ifeq ($(subdir),math)
+ifeq ($(build-mathvec),yes)
+libmvec-tests += double-vlen2 float-vlen4
+double-vlen2-funcs = exp
+float-vlen4-funcs = exp
+endif
+endif
diff --git a/sysdeps/aarch64/fpu/Versions b/sysdeps/aarch64/fpu/Versions
index e69de29bb2..da36f3c495 100644
--- a/sysdeps/aarch64/fpu/Versions
+++ b/sysdeps/aarch64/fpu/Versions
@@ -0,0 +1,5 @@
+libmvec {
+  GLIBC_2.30 {
+    _ZGVnN2v_exp; _ZGVnN4v_expf;
+  }
+}
diff --git a/sysdeps/aarch64/fpu/bits/math-vector.h b/sysdeps/aarch64/fpu/bits/math-vector.h
index e69de29bb2..4c3415987a 100644
--- a/sysdeps/aarch64/fpu/bits/math-vector.h
+++ b/sysdeps/aarch64/fpu/bits/math-vector.h
@@ -0,0 +1,43 @@
+/* Platform-specific SIMD declarations of math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#ifndef _MATH_H
+# error "Never include <bits/math-vector.h> directly;\
+ include <math.h> instead."
+#endif
+
+/* Get default empty definitions for simd declarations.  */
+#include <bits/libm-simd-decl-stubs.h>
+
+#if defined __FAST_MATH__
+# if defined _OPENMP && _OPENMP >= 201307
+/* OpenMP case.  */
+#  define __DECL_SIMD_AARCH64 _Pragma ("omp declare simd notinbranch")
+# elif __GNUC_PREREQ (6,0)
+/* W/o OpenMP use GCC 6.* __attribute__ ((__simd__)).  */
+#  define __DECL_SIMD_AARCH64 __attribute__ ((__simd__ ("notinbranch")))
+# endif
+
+# ifdef __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_exp
+#  define __DECL_SIMD_exp __DECL_SIMD_AARCH64
+#  undef __DECL_SIMD_expf
+#  define __DECL_SIMD_expf __DECL_SIMD_AARCH64
+
+# endif
+#endif
diff --git a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
index e69de29bb2..293983eb2c 100644
--- a/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
+++ b/sysdeps/aarch64/fpu/finclude/math-vector-fortran.h
@@ -0,0 +1,20 @@
+! Platform-specific declarations of SIMD math functions for Fortran. -*- f90 -*-
+!   Copyright (C) 2019 Free Software Foundation, Inc.
+!   This file is part of the GNU C Library.
+!
+!   The GNU C Library is free software; you can redistribute it and/or
+!   modify it under the terms of the GNU Lesser General Public
+!   License as published by the Free Software Foundation; either
+!   version 2.1 of the License, or (at your option) any later version.
+!
+!   The GNU C Library is distributed in the hope that it will be useful,
+!   but WITHOUT ANY WARRANTY; without even the implied warranty of
+!   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+!   Lesser General Public License for more details.
+!
+!   You should have received a copy of the GNU Lesser General Public
+!   License along with the GNU C Library; if not, see
+!   <http://www.gnu.org/licenses/>.
+
+!GCC$ builtin (exp) attributes simd (notinbranch)
+!GCC$ builtin (expf) attributes simd (notinbranch)
diff --git a/sysdeps/aarch64/fpu/libmvec_aliases.c b/sysdeps/aarch64/fpu/libmvec_aliases.c
index e69de29bb2..bc3f9b8118 100644
--- a/sysdeps/aarch64/fpu/libmvec_aliases.c
+++ b/sysdeps/aarch64/fpu/libmvec_aliases.c
@@ -0,0 +1,40 @@
+/* These aliases added as workaround to exclude unnecessary symbol
+   aliases in libmvec.so while compiler creates the vector names
+   based on scalar asm name.  Corresponding discussion is at
+   <https://gcc.gnu.org/ml/gcc/2015-06/msg00173.html>.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+
+extern __attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x);
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v___exp_finite (float64x2_t x)
+{
+  return _ZGVnN2v_exp (x);
+}
+
+extern __attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x);
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v___expf_finite (float32x4_t x)
+{
+  return _ZGVnN4v_expf (x);
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
index e69de29bb2..ce618c8859 100644
--- a/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
+++ b/sysdeps/aarch64/fpu/libmvec_double_vlen2_exp.c
@@ -0,0 +1,94 @@
+/* Double-precision 2 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/dbl-64/e_exp.c.  */
+
+#include <math.h>
+#include <float.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <ieee754.h>
+#include <math-narrow-eval.h>
+#include "math_config.h"
+#include "libmvec_util.h"
+
+#define N (1 << EXP_TABLE_BITS)
+#define InvLn2N __exp_data.invln2N
+#define NegLn2hiN __exp_data.negln2hiN
+#define NegLn2loN __exp_data.negln2loN
+#define Shift __exp_data.shift
+#define T __exp_data.tab
+#define C2 __exp_data.poly[5 - EXP_POLY_ORDER]
+#define C3 __exp_data.poly[6 - EXP_POLY_ORDER]
+#define C4 __exp_data.poly[7 - EXP_POLY_ORDER]
+#define C5 __exp_data.poly[8 - EXP_POLY_ORDER]
+
+#define LIMIT 700.0
+
+/* Do not inline this call.   That way _ZGVnN2v_exp has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN2v_exp has to do.  */
+
+__attribute__((aarch64_vector_pcs, noinline)) static float64x2_t
+__scalar_exp (float64x2_t x)
+{
+  return (float64x2_t) { exp(x[0]), exp(x[1]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float64x2_t
+_ZGVnN2v_exp (float64x2_t x)
+{
+  double h, z_0, z_1;
+  float64x2_t g, scale_v, tail_v, tmp_v, r_v, r2_v, kd_v;
+  float64x2_t NegLn2hiN_v, NegLn2loN_v, C2_v, C3_v, C4_v, C5_v;
+  uint64_t ki_0, ki_1, idx_0, idx_1;
+  uint64_t top_0, top_1, sbits_0, sbits_1;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f64 (x);
+  h = vmaxnmvq_f64 (g);
+  if (__glibc_unlikely (!(h < LIMIT)))
+    return __scalar_exp (x);
+
+  z_0 = InvLn2N * x[0];
+  z_1 = InvLn2N * x[1];
+  ki_0 = converttoint (z_0);
+  ki_1 = converttoint (z_1);
+
+  idx_0 = 2 * (ki_0 % N);
+  idx_1 = 2 * (ki_1 % N);
+  top_0 = ki_0 << (52 - EXP_TABLE_BITS);
+  top_1 = ki_1 << (52 - EXP_TABLE_BITS);
+  sbits_0 = T[idx_0 + 1] + top_0;
+  sbits_1 = T[idx_1 + 1] + top_1;
+
+  kd_v = (float64x2_t) { roundtoint (z_0), roundtoint (z_1) };
+  scale_v = (float64x2_t) { asdouble (sbits_0), asdouble (sbits_1) };
+  tail_v = (float64x2_t) { asdouble (T[idx_0]), asdouble (T[idx_1]) };
+  NegLn2hiN_v = (float64x2_t) { NegLn2hiN, NegLn2hiN };
+  NegLn2loN_v = (float64x2_t) { NegLn2loN, NegLn2loN };
+  C2_v = (float64x2_t) { C2, C2 };
+  C3_v = (float64x2_t) { C3, C3 };
+  C4_v = (float64x2_t) { C4, C4 };
+  C5_v = (float64x2_t) { C5, C5 };
+
+  r_v = x + kd_v * NegLn2hiN_v + kd_v * NegLn2loN_v;
+  r2_v = r_v * r_v;
+  tmp_v = tail_v + r_v + r2_v * (C2_v + r_v * C3_v) + r2_v * r2_v
+	  * (C4_v + r_v * C5_v);
+  return scale_v + scale_v * tmp_v;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
index e69de29bb2..d97ce157b0 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp2f_data.c
@@ -0,0 +1,2 @@
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include <sysdeps/ieee754/flt-32/e_exp2f_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_exp_data.c b/sysdeps/aarch64/fpu/libmvec_exp_data.c
index e69de29bb2..a83661b39d 100644
--- a/sysdeps/aarch64/fpu/libmvec_exp_data.c
+++ b/sysdeps/aarch64/fpu/libmvec_exp_data.c
@@ -0,0 +1 @@
+#include <sysdeps/ieee754/dbl-64/e_exp_data.c>
diff --git a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
index e69de29bb2..938c72ddab 100644
--- a/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
+++ b/sysdeps/aarch64/fpu/libmvec_float_vlen4_expf.c
@@ -0,0 +1,114 @@
+/* Single-precision 2 element vector e^x function.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+/* This function is based on sysdeps/ieee754/flt-32/e_expf.c.  */
+
+#include <math.h>
+#include <stdint.h>
+#include <stdio.h>
+#include <sysdeps/ieee754/flt-32/math_config.h>
+#include "libmvec_util.h"
+
+#define N (1 << EXP2F_TABLE_BITS)
+#define LIMIT 80.0
+
+#define InvLn2N __exp2f_data.invln2_scaled
+#define T __exp2f_data.tab
+#define C __exp2f_data.poly_scaled
+#define SHIFT __exp2f_data.shift
+
+/* Do not inline this call.  That way _ZGVnN4v_expf has no calls to non-vector
+   functions.  This reduces the register saves that _ZGVnN4v_expf has to do.  */
+
+__attribute__((aarch64_vector_pcs,noinline)) static float32x4_t
+__scalar_expf (float32x4_t x)
+{
+  return (float32x4_t) { expf(x[0]), expf(x[1]), expf(x[2]), expf(x[3]) };
+}
+
+__attribute__((aarch64_vector_pcs)) float32x4_t
+_ZGVnN4v_expf (float32x4_t x)
+{
+  float32x4_t g, result;
+  float64x2_t xd_0, xd_1, vInvLn2N, z_0, z_1, vkd_0, vkd_1, r_0, r_1;
+  float64x2_t vs_0, vs_1, c0, c1, c2, y_0, y_1, r2_0, r2_1, one;
+  uint64_t ki_0, ki_1, ki_2, ki_3, t_0, t_1, t_2, t_3;
+  double s_0, s_1, s_2, s_3;
+  float f;
+
+  /* If any value is larger than LIMIT, or NAN, call scalar operation.  */
+  g = vabsq_f32 (x);
+  f = vmaxnmvq_f32 (g);
+  if (__glibc_unlikely (!(f < LIMIT)))
+    return __scalar_expf (x);
+  
+  xd_0 = get_lo_and_extend (x);
+  xd_1 = get_hi_and_extend (x);
+
+  vInvLn2N = (float64x2_t) { InvLn2N, InvLn2N };
+  /* x*N/Ln2 = k + r with r in [-1/2, 1/2] and int k.  */
+  z_0 = vInvLn2N * xd_0;
+  z_1 = vInvLn2N * xd_1;
+
+  /* Round and convert z to int, the result is in [-150*N, 128*N] and
+     ideally ties-to-even rule is used, otherwise the magnitude of r
+     can be bigger which gives larger approximation error.  */
+  vkd_0 = vrndaq_f64 (z_0);
+  vkd_1 = vrndaq_f64 (z_1);
+  r_0 = z_0 - vkd_0;
+  r_1 = z_1 - vkd_1;
+
+  ki_0 = (long) vkd_0[0];
+  ki_1 = (long) vkd_0[1];
+  ki_2 = (long) vkd_1[0];
+  ki_3 = (long) vkd_1[1];
+
+  /* exp(x) = 2^(k/N) * 2^(r/N) ~= s * (C0*r^3 + C1*r^2 + C2*r + 1) */
+  t_0 = T[ki_0 % N];
+  t_1 = T[ki_1 % N];
+  t_2 = T[ki_2 % N];
+  t_3 = T[ki_3 % N];
+  t_0 += ki_0 << (52 - EXP2F_TABLE_BITS);
+  t_1 += ki_1 << (52 - EXP2F_TABLE_BITS);
+  t_2 += ki_2 << (52 - EXP2F_TABLE_BITS);
+  t_3 += ki_3 << (52 - EXP2F_TABLE_BITS);
+  s_0 = asdouble (t_0);
+  s_1 = asdouble (t_1);
+  s_2 = asdouble (t_2);
+  s_3 = asdouble (t_3);
+
+  vs_0 = (float64x2_t) { s_0, s_1 };
+  vs_1 = (float64x2_t) { s_2, s_3 };
+  c0 = (float64x2_t) { C[0], C[0] };
+  c1 = (float64x2_t) { C[1], C[1] };
+  c2 = (float64x2_t) { C[2], C[2] };
+  one = (float64x2_t) { 1.0, 1.0 };
+
+  z_0 = c0 * r_0 + c1;
+  z_1 = c0 * r_1 + c1;
+  r2_0 = r_0 * r_0;
+  r2_1 = r_1 * r_1;
+  y_0 = c2 * r_0 + one;
+  y_1 = c2 * r_1 + one;
+  y_0 = z_0 * r2_0 + y_0;
+  y_1 = z_1 * r2_1 + y_1;
+  y_0 = y_0 * vs_0;
+  y_1 = y_1 * vs_1;
+  result = pack_and_trunc (y_0, y_1);
+  return result;
+}
diff --git a/sysdeps/aarch64/fpu/libmvec_util.h b/sysdeps/aarch64/fpu/libmvec_util.h
index e69de29bb2..bd0463ce22 100644
--- a/sysdeps/aarch64/fpu/libmvec_util.h
+++ b/sysdeps/aarch64/fpu/libmvec_util.h
@@ -0,0 +1,54 @@
+/* Utility functions for Aarch64 vector functions.
+   Copyright (C) 2015-2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <stdint.h>
+#include <arm_neon.h>
+
+/* Copy lower 2 elements of of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_lo_and_extend (float32x4_t x)
+{
+  __Uint64x2_t tmp1 = (__Uint64x2_t) x;
+#if __BYTE_ORDER == __BIG_ENDIAN
+  uint64_t tmp2 = (uint64_t) tmp1[1];
+#else
+  uint64_t tmp2 = (uint64_t) tmp1[0];
+#endif
+  return vcvt_f64_f32 ((float32x2_t) tmp2);
+}
+
+/* Copy upper 2 elements of of a 4 element float vector into a 2 element
+   double vector.  */
+
+static __always_inline float64x2_t
+get_hi_and_extend (float32x4_t x)
+{
+  return vcvt_high_f64_f32 (x);
+}
+
+/* Copy a pair of 2 element double vectors into a 4 element float vector.  */
+
+static __always_inline float32x4_t
+pack_and_trunc (float64x2_t x, float64x2_t y)
+{
+  float32x2_t xx = vcvt_f32_f64 (x);
+  float32x2_t yy = vcvt_f32_f64 (y);
+  return (vcombine_f32 (xx, yy));
+}
diff --git a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
index e69de29bb2..9eb31c8dfc 100644
--- a/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-double-vlen2-wrappers.c
@@ -0,0 +1,24 @@
+/* Wrapper part of tests for aarch64 double vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+#include "test-double-vlen2.h"
+
+#define VEC_TYPE float64x2_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (exp), _ZGVnN2v_exp)
diff --git a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
index e69de29bb2..7f64acf886 100644
--- a/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
+++ b/sysdeps/aarch64/fpu/test-float-vlen4-wrappers.c
@@ -0,0 +1,24 @@
+/* Wrapper part of tests for float aarch64 vector math functions.
+   Copyright (C) 2019 Free Software Foundation, Inc.
+   This file is part of the GNU C Library.
+
+   The GNU C Library is free software; you can redistribute it and/or
+   modify it under the terms of the GNU Lesser General Public
+   License as published by the Free Software Foundation; either
+   version 2.1 of the License, or (at your option) any later version.
+
+   The GNU C Library is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+   Lesser General Public License for more details.
+
+   You should have received a copy of the GNU Lesser General Public
+   License along with the GNU C Library; if not, see
+   <http://www.gnu.org/licenses/>.  */
+
+#include <arm_neon.h>
+#include "test-float-vlen4.h"
+
+#define VEC_TYPE float32x4_t
+
+VECTOR_WRAPPER (WRAPPER_NAME (expf), _ZGVnN4v_expf)
diff --git a/sysdeps/aarch64/libm-test-ulps b/sysdeps/aarch64/libm-test-ulps
index 585e5bbce7..1ed4af9e55 100644
--- a/sysdeps/aarch64/libm-test-ulps
+++ b/sysdeps/aarch64/libm-test-ulps
@@ -1601,6 +1601,12 @@ float: 1
 idouble: 1
 ifloat: 1
 
+Function: "exp_vlen2":
+double: 1
+
+Function: "exp_vlen4":
+float: 1
+
 Function: "expm1":
 double: 1
 float: 1
diff --git a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
index e69de29bb2..9e178253f7 100644
--- a/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
+++ b/sysdeps/unix/sysv/linux/aarch64/libmvec.abilist
@@ -0,0 +1,2 @@
+GLIBC_2.30 _ZGVnN2v_exp F
+GLIBC_2.30 _ZGVnN4v_expf F

Follow-Ups:
- Re: [PATCH v2] Aarch64: Add simd exp/expf functions
  - From: Szabolcs Nagy
- Re: [PATCH v2] Aarch64: Add simd exp/expf functions
  - From: Joseph Myers

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]