[2/2 PATCH v2] PPC64: Add libmvec SIMD double-precision power function.

Wed Jul 3 21:20:00 GMT 2019

Shawn Landden <shawn@git.icu> writes:

> Based off the ./sysdeps/ieee754/dbl-64/pow.c implementation,
> and provides identical results.
>
> Unlike other libmvec functions, this sets the underflow and overflow bits.
> The caller can check these flags, and possibly re-run the calculations with
> scalar pow to figure out what is causing the overflow or underflow.
>
> I may not have normalized the data for benchmarking this properly,
> but operating only on integers between 0 and 2^32 and floats between 0.5 and 1 I get the following:
>
> Running 20 times over 32MiB
> vector: mean 535.824919 (sd 0.246088)
> scalar: mean 286.384220 (sd 0.027630)
>
> Which is a very impressive speed boost.

This patch looks good to me with some fixes that I'm listing here.
I'm going to merge it as soon as we fix the issues with the single precision
implementation.  So, I don't think we need a v3 here.

> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> index 25d29b9a4a..43fa8505b8 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/Makefile
> @@ -52,6 +52,7 @@ libmvec-sysdep_routines += vec_d_cos2_vsx vec_s_cosf4_vsx \
>                             vec_s_powf4_vsx s_powf_log2_data \
>  			   vec_math_errf vec_math_err \
>  			   vec_d_exp2_vsx vec_d_exp_data \
> +                           vec_d_pow2_vsx s_pow_log2_data \

Replace the leading spaces with tabs (one tab per 8 spaces), as in the surrounding lines.
Fixed.

>  			   vec_d_sincos2_vsx vec_s_sincosf4_vsx
>  CFLAGS-vec_d_cos2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
>  CFLAGS-vec_d_log2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
> @@ -69,6 +70,7 @@ CFLAGS-vec_s_exp2f_data.c += -mabi=altivec -maltivec -mvsx
>  CFLAGS-vec_d_exp2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector
>  CFLAGS-vec_d_exp_data.c += -mabi=altivec -maltivec -mvsx
>  CFLAGS-vec_s_powf4_vsx.c += -mabi=altivec -maltivec -mvsx
> +CFLAGS-vec_s_pow2_vsx.c += -mabi=altivec -maltivec -mvsx -mpower8-vector

s/vec_s/vec_d/
Fixed.

s_pow_log2_data.c includes altivec.h indirectly, so it needs these flags as well:

    CFLAGS-s_pow_log2_data.c += -mabi=altivec -maltivec -mvsx

Fixed.

> diff --git a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c
> index 7162d06e0c..f3f351093b 100644
> --- a/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c
> +++ b/sysdeps/powerpc/powerpc64/fpu/multiarch/vec_math_err.c
> @@ -1,5 +1,5 @@
> -/* Double-precision math error handling.
> -   Copyright (C) 2019 Free Software Foundation, Inc.
> +/* Single-precision math error handling.
> +   Copyright (C) 2017-2019 Free Software Foundation, Inc.

I think this change got into the patch by mistake.  Please let me know if I'm
wrong.
Removed.

> @@ -38,3 +36,15 @@ __math_oflow (uint32_t sign)
>  {
>    return xflow (sign, 0x1p769);
>  }
> +
> +
> +double __math_invalid(double x)
> +{
> +  return (x - x) / (x - x);
> +}
> +
> +double __math_divzero(uint32_t sign)
> +{
> +  return (double)(sign ? -1.0 : 1.0) / 0.0;
> +}
> +

Missing attribute_hidden.
Fixed.
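
To illustrate the kind of change meant here, a minimal standalone sketch
follows.  Inside glibc, attribute_hidden is already provided by the internal
headers and expands to the GCC hidden-visibility attribute; the fallback
define and the sketch_ prefixed names below are assumptions made only so the
example builds outside the tree.  Where the attribute actually goes in the
merged patch (declaration in a header or the definition itself) is not shown
in this mail.

```c
#include <stdint.h>

/* Fallback for building outside glibc; in-tree the macro already exists.  */
#ifndef attribute_hidden
# define attribute_hidden __attribute__ ((visibility ("hidden")))
#endif

/* Hidden declarations: the symbols stay internal to the shared object and
   calls to them do not go through the PLT.  */
attribute_hidden double sketch_math_invalid (double x);
attribute_hidden double sketch_math_divzero (uint32_t sign);

/* Returns NaN for finite x (0/0) and raises the invalid exception.  */
double
sketch_math_invalid (double x)
{
  return (x - x) / (x - x);
}

/* Returns +Inf or -Inf depending on sign and raises divide-by-zero.  */
double
sketch_math_divzero (uint32_t sign)
{
  return (double) (sign ? -1.0 : 1.0) / 0.0;
}
```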

-- 
Tulio Magno