This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.



On 05/11/2019 12:49, Stefan Liebler wrote:
> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> +  double y;
>>> +  /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> +     x to the nearest integer according to current rounding mode (M3-field: 0)
>>> +     where inexact exceptions are suppressed (M4-field: 4).  */
>>> +  __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> +  return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196.  I wonder if we could start to simplify some
>> math symbols implementation where new architectures/extensions provide
>> direct implementation by a direct mapping implemented by compiler builtins.
>>
>> I would expect to:
>>
>>    1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>>       since I hardly doubt these micro-optimizations really pay off with
>>       recent architectures and compiler version.
>>
>>    2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>>       * sysdeps/ieee754/dbl-64/s_nearbyint.c
>>             [...]
>>       double
>>       __nearbyint (double x)
>>       {
>>       #if __USE_NEARBYINT_BUILTIN
>>         return __builtin_nearbyint (x);
>>       #else
>>         /* Use generic implementation.  */
>>       #endif
>>       }
>>
>>    3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow to simplify some architectures, aarch64 for instance.
>>
> 
> Currently the long double builtins are generating an extra not needed stack frame compared to the inline assembly. But this needs to be fixed in gcc.
> 
> E.g. if build for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc which would end in an infinite loop.  I will make some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used.  I have never build glibc with other compilers like clang.  Is there a special need to check this behavior?

I think google maintains some branches with clang support (google/grte/*),
but there is no know effort to sync these with master.  So I see there is
no need to focus on non-gcc compiler for now.

> 
> In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series.
> This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign.
> 
> Afterwards the same can be done for the remaining functions.
> 
> I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in the same way as fix-fp-int-compare-invalid.h.
> The generic version contains all USE_XYZ_BUILTIN macros defined to 0
> and each architecture can provide its own file with other settings.
> For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
> How about this?
> 

I think it is fair start, with the adjustments pointed out by Joseph.
I will check out the worksize-64 refactor to avoid duplicate the
implementations.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]