This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Sharing vector math routines?


On 12/04/2019 22:35, Steve Ellcey wrote:
> I have a question/thought about the libmvec routines that I am working
> on for Aarch64 and that Bert Tenjy has been working on for PPC64.
> 
> Given that both of us are writing routines in C (vs. Assembly) I was
> wondering if we should try to share the code/algorithms being used.
> The vector types being used have different names (__Float32x4_t or
> float32x4_t or 'vector float') but the names could be put in a macro
> and then we could use a shared source file for the implementation.

there is no portable syntax for simd in c.

gcc vector extension gets close but there are
operations that are not easy to express:

- the naive "check if any input is out of bounds"

 uint32x4_t p = x > threshold;
 for (i=0; i<lanes; i++)
   if (p[i])
     specialcase(x);

does not give the best code across targets. this
matters for libmvec, since such a check is often
the fastest way to deal with special cases.

- important math operations have no portable simd
variant (fabs, sqrt, fma, round, conversions, ...).
you have to use intrinsics for them or make the
compiler understand a per-lane loop:

  for (i=0; i<lanes; i++)
    y[i] = op(x[i]);

(assuming the scalar op has the right semantics
so it has a corresponding simd instruction)

- i'd expect the best algorithm to vary across
targets (e.g. use fabs vs x & mask, do
x > threshold with fp vs int cmp).
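
a minimal sketch of the three points above, assuming gcc/clang vector
extensions and 4 float lanes. the names v4sf, v4si, v4su,
any_lane_above, vec_fabsf_loop and vec_fabsf_mask are made up for
illustration and are not part of glibc or libmvec. whether any of
these compiles to good code still depends on the target and flags.

```c
#include <math.h>
#include <stdint.h>

/* Hypothetical type names built on GCC/Clang vector extensions.  */
typedef float v4sf __attribute__ ((vector_size (16)));
typedef int32_t v4si __attribute__ ((vector_size (16)));
typedef uint32_t v4su __attribute__ ((vector_size (16)));

/* Special-case check: reduce the comparison mask once instead of
   branching on each lane.  True lanes hold -1, false lanes hold 0.  */
int
any_lane_above (v4sf x, v4sf threshold)
{
  v4si p = x > threshold;
  return (p[0] | p[1] | p[2] | p[3]) != 0;
}

/* Elementwise scalar op: the compiler may or may not turn this loop
   into a single simd instruction.  */
v4sf
vec_fabsf_loop (v4sf x)
{
  v4sf y = { 0.0f, 0.0f, 0.0f, 0.0f };
  for (int i = 0; i < 4; i++)
    y[i] = fabsf (x[i]);
  return y;
}

/* The "x & mask" alternative: clear the sign bit of each lane.  The
   casts reinterpret the bits; they do not convert values.  */
v4sf
vec_fabsf_mask (v4sf x)
{
  v4su mask = { 0x7fffffffu, 0x7fffffffu, 0x7fffffffu, 0x7fffffffu };
  return (v4sf) ((v4su) x & mask);
}
```

a caller would test any_lane_above once per vector and only then fall
back to a scalar special-case path for the whole vector.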

> So we could have a vector sinf like this for Aarch64:
> 
> #include <arm_neon.h>
> #define VECSIZE 4
> #define VECTYPE float32x4_t
> #define BASETYPE float
> #include "vec_sinf.c"
> 
> And for PPC it might be:
> 
> #include <altivec.h>
> #define VECSIZE 4
> #define VECTYPE vector float
> #define BASETYPE float
> #include "vec_sinf.c"
> 
> Then the shared vec_sinf.c could be written using VECSIZE, VECTYPE, and
> BASETYPE and shared between these and other platforms.

i think you will find that a lot more parameters
are needed if we try to do it this way.
(we would have to define generic,
vector-length-agnostic intrinsics and types
that each target implements)
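
to illustrate why three macros are unlikely to be enough, here is a
rough sketch of the kind of per-target layer a shared source file
would need. everything here is hypothetical (vec_f32_t, vec_mask32_t,
VEC_LANES, vec_splat_f32, vec_cmpgt_f32, vec_any, vec_muladd_f32,
shared_poly_sketch); the generic fallback below uses gcc/clang vector
extensions, and a real port would presumably map each helper onto its
own intrinsics instead.

```c
#include <stdint.h>

/* Hypothetical per-target layer (generic fallback version).  */
typedef float vec_f32_t __attribute__ ((vector_size (16)));
typedef int32_t vec_mask32_t __attribute__ ((vector_size (16)));
#define VEC_LANES 4

static inline vec_f32_t
vec_splat_f32 (float c)
{
  vec_f32_t v = { c, c, c, c };
  return v;
}

/* Lanes where a > b hold -1, the others hold 0.  */
static inline vec_mask32_t
vec_cmpgt_f32 (vec_f32_t a, vec_f32_t b)
{
  return a > b;
}

/* Nonzero if any lane of the mask is set.  */
static inline int
vec_any (vec_mask32_t m)
{
  return (m[0] | m[1] | m[2] | m[3]) != 0;
}

/* a * b + c per lane; a target could use a fused multiply-add here.  */
static inline vec_f32_t
vec_muladd_f32 (vec_f32_t a, vec_f32_t b, vec_f32_t c)
{
  return a * b + c;
}

/* Shared code written only against the layer above: computes
   x * x + 1 and reports whether any lane exceeded the bound 8.0.  */
vec_f32_t
shared_poly_sketch (vec_f32_t x, int *special)
{
  *special = vec_any (vec_cmpgt_f32 (x, vec_splat_f32 (8.0f)));
  return vec_muladd_f32 (x, x, vec_splat_f32 (1.0f));
}
```

even this toy kernel needs a mask type, a compare, a reduction and a
multiply-add on top of the vector type and lane count.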
