This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.


On 11/4/19 7:22 PM, Adhemerval Zanella wrote:


On 04/11/2019 12:27, Stefan Liebler wrote:
If compiled with z196 zarch support, the load-fp-integer instruction
is used to implement nearbyint, nearbyintf, nearbyintl.
Otherwise the common-code implementation is used.

+
+double
+__nearbyint (double x)
+{
+  double y;
+  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
+     x to the nearest integer according to current rounding mode (M3-field: 0)
+     where inexact exceptions are suppressed (M4-field: 4).  */
+  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
+  return y;
+}
+libm_alias_double (__nearbyint, nearbyint)

At least with recent gcc __builtin_nearbyint generates the expected fidbra
instruction for -march=z196.  I wonder if we could start to simplify some
math symbols implementation where new architectures/extensions provide
direct implementation by a direct mapping implemented by compiler builtins.

I would expect to:

   1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
      since I hardly doubt these micro-optimizations really pay off with
      recent architectures and compiler version.

   2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:

      * sysdeps/ieee754/dbl-64/s_nearbyint.c
[...]
      double
      __nearbyint (double x)
      {
      #if __USE_NEARBYINT_BUILTIN
        return __builtin_nearbyint (x);
      #else
        /* Use generic implementation.  */
      #endif
      }

   3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.

It would allow to simplify some architectures, aarch64 for instance.


Currently the long double builtins are generating an extra not needed stack frame compared to the inline assembly. But this needs to be fixed in gcc.

E.g. if build for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc which would end in an infinite loop. I will make some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used. I have never build glibc with other compilers like clang. Is there a special need to check this behavior?

In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series. This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign.

Afterwards the same can be done for the remaining functions.

I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in the same way as fix-fp-int-compare-invalid.h.
The generic version contains all USE_XYZ_BUILTIN macros defined to 0
and each architecture can provide its own file with other settings.
For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
How about this?

Thanks,
Stefan


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]