This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

RE: [PATCH] aarch64: Implement math acceleration via builtins


Patch updated to reflect recommended changes.

Okay for trunk?

2017-10-19  Michael Collison  <michael.collison@arm.com>

	* sysdeps/aarch64/fpu/e_sqrt.c (ieee754_sqrt): Replace
	asm statements with __builtin_sqrt.
	* sysdeps/aarch64/fpu/e_sqrtf.c (ieee754_sqrtf): Replace
	asm statements with __builtin_sqrtf.
	* sysdeps/aarch64/fpu/s_ceil.c (__ceil): Replace
	asm statements with __builtin_ceil.
	* sysdeps/aarch64/fpu/s_ceilf.c (__ceilf): Replace
	asm statements with __builtin_ceilf.
	* sysdeps/aarch64/fpu/s_floor.c (__floor): Replace
	asm statements with __builtin_floor.
	* sysdeps/aarch64/fpu/s_floorf.c (__floorf): Replace
	asm statements with __builtin_floorf.
	* sysdeps/aarch64/fpu/s_fma.c (__fma): Replace
	asm statements with __builtin_fma.
	* sysdeps/aarch64/fpu/s_fmaf.c (__fmaf): Replace
	asm statements with __builtin_fmaf.
	* sysdeps/aarch64/fpu/s_fmax.c (__fmax): Replace
	asm statements with __builtin_fmax.
	* sysdeps/aarch64/fpu/s_fmaxf.c (__fmaxf): Replace
	asm statements with __builtin_fmaxf.
	* sysdeps/aarch64/fpu/s_fmin.c (__fmin): Replace
	asm statements with __builtin_fmin.
	* sysdeps/aarch64/fpu/s_fminf.c (__fminf): Replace
	asm statements with __builtin_fminf.
	* sysdeps/aarch64/fpu/s_frint.c: Delete file.
	* sysdeps/aarch64/fpu/s_frintf.c: Delete file.
	* sysdeps/aarch64/fpu/s_llrint.c (__llrint): Replace
	asm statements with builtin and neon intrinsic.
	* sysdeps/aarch64/fpu/s_llrintf.c (__llrintf): Ditto.
	* sysdeps/aarch64/fpu/s_llround.c (__llround): Replace
	asm statements with neon intrinsic.
	* sysdeps/aarch64/fpu/s_llroundf.c (__llroundf): Ditto.
	* sysdeps/aarch64/fpu/s_lrint.c (__lrint): Replace
	asm statements with builtin and neon intrinsic.
	* sysdeps/aarch64/fpu/s_lrintf.c (__lrintf): Ditto.
	* sysdeps/aarch64/fpu/s_lround.c (__lround): Replace
	asm statements with neon intrinsic vcvtad_s64_f64.
	* sysdeps/aarch64/fpu/s_lroundf.c (__lroundf): Replace
	asm statements with neon intrinsic vcvtas_s32_f32.
	* sysdeps/aarch64/fpu/s_nearbyint.c (__nearbyint): Replace
	asm statements with __builtin_nearbyint.
	* sysdeps/aarch64/fpu/s_nearbyintf.c (__nearbyintf): Replace
	asm statements with __builtin_nearbyintf.
	* sysdeps/aarch64/fpu/s_rint.c (__rint): Replace
	asm statements with __builtin_rint.
	* sysdeps/aarch64/fpu/s_rintf.c (__rintf): Replace
	asm statements with __builtin_rintf.
	* sysdeps/aarch64/fpu/s_round.c (__round): Replace
	asm statements with __builtin_round.
	* sysdeps/aarch64/fpu/s_roundf.c (__roundf): Replace
	asm statements with __builtin_roundf.
	* sysdeps/aarch64/fpu/s_trunc.c (__trunc): Replace
	asm statements with __builtin_trunc.
	* sysdeps/aarch64/fpu/s_truncf.c (__truncf): Replace
	asm statements with __builtin_truncf.
	* sysdeps/aarch64/fpu/Makefile: Build e_sqrt[f].c with
	-fno-math-errno.

-----Original Message-----
From: Szabolcs Nagy 
Sent: Tuesday, October 17, 2017 9:15 AM
To: Adhemerval Zanella <adhemerval.zanella@linaro.org>; libc-alpha@sourceware.org; Michael Collison <Michael.Collison@arm.com>
Cc: nd <nd@arm.com>
Subject: Re: [PATCH] aarch64: Implement math acceleration via builtins

On 13/10/17 23:36, Adhemerval Zanella wrote:
> On 13/10/2017 18:26, Michael Collison wrote:
>> This patch converts asm statements into builtin and neon intrinsics for AArch64.
>> As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert 
>> the function from
>>
>> double
>> __ceil (double x)
>> {
>>   double result;
>>   asm ("frintp\t%d0, %d1" :
>>        "=w" (result) : "w" (x) );
>>   return result;
>> }
>>
>> into
>>
>> double
>> __ceil (double x)
>> {
>>   return __builtin_ceil (x);
>> }
>>
...
> Looks good in general with some comments below.  Is there any 
> advantage of the refactor besides avoid inline assembly?

i think this is a code clean up patch (the generated code should not change other than temp register numbers), it avoids the current preprocessor hackery and inline asm.

> Since you refactoring it, I would also suggest to use builtin for the
> aarch64 match_private.h sqrt and get/set fpcr at fpu_control.h.
> 

note that's no longer a code clean up patch:

e.g. gcc does not know the cost of inline asm so it cannot
schedule it optimally, but it has detailed model of the
builtins so using them in headers should be an improvement.

Wilco has some sqrt patches, so sqrt can be left alone
(if those patches don't progress then it can be fixed later)
https://sourceware.org/ml/libc-alpha/2017-09/msg00958.html
https://sourceware.org/ml/libc-alpha/2017-09/msg00959.html
https://sourceware.org/ml/libc-alpha/2017-09/msg00960.html

i think using builtins for fpcr/fpsr is a good idea.

__builtin_aarch64_get_fpcr
__builtin_aarch64_set_fpcr
__builtin_aarch64_get_fpsr
__builtin_aarch64_set_fpsr

please check if earliest supported gcc (4.9) have them,
i think fixing this can be a separate patch.

>> +long long int
>> +__llrint (double x)
>> +{
>> +  double r = __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
...
> 
> Also I think the comment line is too long and I am not sure if it is
> usual to use '//' comments (usually we use default '/*' '*/' pairs).
> 

please fix the comment style.

>> +long int
>> +__lrint (double x)
>>  {
>> -  OTYPE result;
>> -  ITYPE temp;
>>  
>>  #if IREG_SIZE == 64 && OREG_SIZE == 32
>> +  long int result;
>> +
>>    if (__builtin_fabs (x) > INT32_MAX)
>>      {
>>        /* Converting large values to a 32 bit int may cause the frintx/fcvtza
>> @@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
>>        return result;
>>      }
>>  #endif
>> -  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
>> -        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
>> -        : "=r" (result), "=w" (temp) : "w" (x) );
>> -  return result;
>> +
>> +  double r =  __builtin_rint (x);
>> +
>> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
>> +  // by inserting a barrier
>> +
>> +  math_opt_barrier (r);
>> +  return r;
>>  }
> 
> Same comment as before about math_opt_barrier and the comments seems off (it is
> referring to lrintf instead of lrint).
> 

please fix the comment.

Attachment: gnutools-1925-rebase-v13.patch
Description: gnutools-1925-rebase-v13.patch


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]