This is the mail archive of the mailing list for the glibc project.


Re: Sharing vector math routines?

On 12/04/2019 22:35, Steve Ellcey wrote:
> I have a question/thought about the libmvec routines that I am working
> on for Aarch64 and that Bert Tenjy has been working on for PPC64.
> Given that both of us are writing routines in C (vs. Assembly) I was
> wondering if we should try to share the code/algorithms being used.
> The vector types being used have different names (__Float32x4_t or
> float32x4_t or 'vector float') but the names could be put in a macro
> and then we could use a shared source file for the implementation.

there is no portable syntax for simd in c.

the gcc vector extensions get close, but some
operations are not easy to express:

- the naive "check if any input is out of bound"

 uint32x4_t p = x > threshold;
 for (i=0; i<lanes; i++)
   if (p[i])

does not give the best code on every target. this
matters for libmvec since such a check is often the
fastest way to deal with special cases.

- important math operations have no portable simd
variant (fabs, sqrt, fma, round, conversions, ...).
you have to use intrinsics for them or make the
compiler understand

  for (i=0; i<lanes; i++)
    y[i] = op(x[i]);

(assuming the scalar op has the right semantics
so it has a corresponding simd instruction)

- i'd expect the best algorithm to vary across
targets (e.g. use fabs vs x & mask, do
x > threshold with fp vs int cmp).
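
A minimal sketch of the three points above, assuming the GCC/Clang vector extensions. The type names v4f and v4i and all function names are made up for illustration; none of them come from glibc or libmvec. It only shows how each idiom is spelled, not which one generates the best code on a given target.

```c
#include <stdint.h>
#include <string.h>

#define LANES 4

/* Hypothetical type names built on the GCC/Clang vector extension.  */
typedef float v4f __attribute__ ((vector_size (16)));
typedef int32_t v4i __attribute__ ((vector_size (16)));

/* Naive "is any lane above the threshold" check.  The comparison
   yields an integer vector with all bits set in each true lane.  */
static int
any_lane_above (v4f x, float threshold)
{
  v4f t = { threshold, threshold, threshold, threshold };
  v4i p = x > t;
  for (int i = 0; i < LANES; i++)
    if (p[i])
      return 1;
  return 0;
}

/* Scalar fabs via the sign-bit mask, so the sketch needs no libm.  */
static float
scalar_abs (float x)
{
  uint32_t bits;
  memcpy (&bits, &x, sizeof bits);
  bits &= 0x7fffffffu;
  memcpy (&x, &bits, sizeof x);
  return x;
}

/* Lane-by-lane loop; whether this becomes one simd instruction
   depends on the compiler and the target.  */
static v4f
lanewise_abs (v4f x)
{
  v4f y = x;
  for (int i = 0; i < LANES; i++)
    y[i] = scalar_abs (x[i]);
  return y;
}

/* The "x & mask" alternative, applied to the whole vector at once.  */
static v4f
abs_by_mask (v4f x)
{
  v4i mask = { 0x7fffffff, 0x7fffffff, 0x7fffffff, 0x7fffffff };
  v4i bits;
  memcpy (&bits, &x, sizeof bits);
  bits &= mask;
  memcpy (&x, &bits, sizeof x);
  return x;
}
```

The early return in any_lane_above is the part a target would probably want to replace with its own reduction or movemask-style instruction.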

> So we could have a vector sinf like this for Aarch64:
> #include <arm_neon.h>
> #define VECSIZE 4
> #define VECTYPE float32x4_t
> #define BASETYPE float
> #include "vec_sinf.c"
> And for PPC it might be:
> #include <altivec.h>
> #define VECSIZE 4
> #define VECTYPE vector float
> #define BASETYPE float
> #include "vec_sinf.c"
> Then the shared vec_sinf.c could be written using VECSIZE, VECTYPE, and
> BASETYPE and shared between these and other platforms.
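
A rough sketch of how such a shared file could look, under stated assumptions: a GCC vector extension type (the made-up name generic_vec) stands in for float32x4_t or 'vector float' so that it builds without target headers, and the kernel is a placeholder polynomial, not a real sinf algorithm. The names vec_splat and vec_sinf_sketch are hypothetical.

```c
/* Per-target parameters, as in the proposal.  */
#define VECSIZE 4
#define BASETYPE float
typedef BASETYPE generic_vec
  __attribute__ ((vector_size (VECSIZE * sizeof (BASETYPE))));
#define VECTYPE generic_vec

/* From here on: what a shared vec_sinf.c might contain.  */

/* Broadcast a scalar to all lanes.  A target would likely want to
   supply its own version of even this small helper.  */
static VECTYPE
vec_splat (BASETYPE c)
{
  VECTYPE v = { 0 };
  for (int i = 0; i < VECSIZE; i++)
    v[i] = c;
  return v;
}

/* Placeholder kernel: x - x^3/6.  This is NOT a usable sinf; it only
   shows code written purely in terms of the three macros.  */
static VECTYPE
vec_sinf_sketch (VECTYPE x)
{
  VECTYPE c3 = vec_splat ((BASETYPE) (-1.0 / 6.0));
  VECTYPE x2 = x * x;
  return x + x * x2 * c3;
}
```

This works only as long as the kernel needs nothing beyond + and * on the vector type; range reduction, special-case checks and rounding would each need something beyond the three macros.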

i think you will find that a lot more parameters
are needed if we try to do it this way.
(we will have to define generic vector length
agnostic intrinsics and types that each target
