This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH 01/17] S390: Use load-fp-integer instruction for nearbyint functions.
On 05/11/2019 12:49, Stefan Liebler wrote:
> On 11/4/19 7:22 PM, Adhemerval Zanella wrote:
>>
>>
>> On 04/11/2019 12:27, Stefan Liebler wrote:
>>> If compiled with z196 zarch support, the load-fp-integer instruction
>>> is used to implement nearbyint, nearbyintf, nearbyintl.
>>> Otherwise the common-code implementation is used.
>>
>>> +
>>> +double
>>> +__nearbyint (double x)
>>> +{
>>> + double y;
>>> + /* The z196 zarch "load fp integer" (fidbra) instruction is rounding
>>> + x to the nearest integer according to current rounding mode (M3-field: 0)
>>> + where inexact exceptions are suppressed (M4-field: 4). */
>>> + __asm__ ("fidbra %0,0,%1,4" : "=f" (y) : "f" (x));
>>> + return y;
>>> +}
>>> +libm_alias_double (__nearbyint, nearbyint)
>>
>> At least with recent gcc __builtin_nearbyint generates the expected fidbra
>> instruction for -march=z196. I wonder if we could start to simplify some
>> math symbols implementation where new architectures/extensions provide
>> direct implementation by a direct mapping implemented by compiler builtins.
>>
>> I would expect to:
>>
>> 1. Move all sysdeps/ieee754/dbl-64/wordsize-64 to sysdeps/ieee754/dbl-64/
>> since I hardly doubt these micro-optimizations really pay off with
>> recent architectures and compiler version.
>>
>> 2. Add internal macros __USE_<SYMBOL>_BUILTIN and use as:
>>
>> * sysdeps/ieee754/dbl-64/s_nearbyint.c
>> [...]
>> double
>> __nearbyint (double x)
>> {
>> #if __USE_NEARBYINT_BUILTIN
>> return __builtin_nearbyint (x);
>> #else
>> /* Use generic implementation. */
>> #endif
>> }
>>
>> 3. Define the __USE_<SYMBOL>_BUILTIN for each architecture.
>>
>> It would allow to simplify some architectures, aarch64 for instance.
>>
>
> Currently the long double builtins are generating an extra not needed stack frame compared to the inline assembly. But this needs to be fixed in gcc.
>
> E.g. if build for s390 (31bit), where the fidbra & co instructions are not available, the builtins generate a call to libc which would end in an infinite loop. I will make some tests on s390 starting with the current minimum gcc 6.2 to be sure that the instructions are used. I have never build glibc with other compilers like clang. Is there a special need to check this behavior?
I think google maintains some branches with clang support (google/grte/*),
but there is no know effort to sync these with master. So I see there is
no need to focus on non-gcc compiler for now.
>
> In general I can start with those functions where the builtins can be used on s390, but I won't move all wordsize-64 functions and adjust them to use the builtins with this patch series.
> This means for now, I start with using builtins for nearbyint, rint, floor, ceil, trunc, round and copysign.
>
> Afterwards the same can be done for the remaining functions.
>
> I will create an own header file, e.g. sysdeps/generic/math-use-builtins.h in the same way as fix-fp-int-compare-invalid.h.
> The generic version contains all USE_XYZ_BUILTIN macros defined to 0
> and each architecture can provide its own file with other settings.
> For each functions XYZ there will be three macros, e.g. USE_NEARBYINT_BUILTIN, USE_NEARBYINTF_BUILTIN, USE_NEARBYINTL_BUILTIN.
> How about this?
>
I think it is fair start, with the adjustments pointed out by Joseph.
I will check out the worksize-64 refactor to avoid duplicate the
implementations.