This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] aarch64: Implement math acceleration via builtins



On 17/10/2017 14:14, Szabolcs Nagy wrote:
> On 13/10/17 23:36, Adhemerval Zanella wrote:
>> On 13/10/2017 18:26, Michael Collison wrote:
>>> This patch converts asm statements into builtins and neon intrinsics for AArch64.
>>> As an example for the file sysdeps/aarch64/fpu/_s_ceil.c, we convert the function from
>>>
>>> double
>>> __ceil (double x)
>>> {
>>>   double result;
>>>   asm ("frintp\t%d0, %d1" :
>>>        "=w" (result) : "w" (x) );
>>>   return result;
>>> }
>>>
>>> into
>>>
>>> double
>>> __ceil (double x)
>>> {
>>>   return __builtin_ceil (x);
>>> }
>>>
> ...
>> Looks good in general with some comments below.  Is there any advantage
>> to the refactor besides avoiding inline assembly?
> 
> i think this is a code clean up patch (the generated code
> should not change other than temp register numbers), it
> avoids the current preprocessor hackery and inline asm.
> 
>> Since you are refactoring it, I would also suggest using builtins for the 
>> aarch64 math_private.h sqrt and the get/set fpcr in fpu_control.h.
>>
> 
> note that's no longer a code clean up patch:
> 
> e.g. gcc does not know the cost of inline asm so it cannot
> schedule it optimally, but it has a detailed model of the
> builtins, so using them in headers should be an improvement.

Yes, I am aware of it.  My question was more out of curiosity whether 
the code cleanup brings any non-obvious performance improvement 
to libm.

> 
> Wilco has some sqrt patches, so sqrt can be left alone
> (if those patches don't progress then it can be fixed later)
> https://sourceware.org/ml/libc-alpha/2017-09/msg00958.html
> https://sourceware.org/ml/libc-alpha/2017-09/msg00959.html
> https://sourceware.org/ml/libc-alpha/2017-09/msg00960.html
> 
> i think using builtins for fpcr/fpsr is a good idea.

Ack.

> 
> __builtin_aarch64_get_fpcr
> __builtin_aarch64_set_fpcr
> __builtin_aarch64_get_fpsr
> __builtin_aarch64_set_fpsr
> 
> please check whether the earliest supported gcc (4.9) has them,
> i think fixing this can be a separate patch.

Alright, I will prepare a patch then.

> 
>>> +long long int
>>> +__llrint (double x)
>>> +{
>>> +  double r = __builtin_rint (x);
>>> +
>>> +  // Prevent gcc from calling lrint directly when compiled with -fno-math-errno
>>> +  // by inserting a barrier
>>> +
>>> +  math_opt_barrier (r);
> ...
>>
>> Also I think the comment line is too long, and I am not sure it is
>> usual to use '//' comments (usually we use the default '/*' '*/' pairs).
>>
> 
> please fix the comment style.
> 
>>> +long int
>>> +__lrint (double x)
>>>  {
>>> -  OTYPE result;
>>> -  ITYPE temp;
>>>  
>>>  #if IREG_SIZE == 64 && OREG_SIZE == 32
>>> +  long int result;
>>> +
>>>    if (__builtin_fabs (x) > INT32_MAX)
>>>      {
>>>        /* Converting large values to a 32 bit int may cause the frintx/fcvtza
>>> @@ -96,10 +74,14 @@ __CONCATX(__,FUNC) (ITYPE x)
>>>        return result;
>>>      }
>>>  #endif
>>> -  asm ( "frintx" "\t%" IREGS "1, %" IREGS "2\n\t"
>>> -        "fcvtzs" "\t%" OREGS "0, %" IREGS "1"
>>> -        : "=r" (result), "=w" (temp) : "w" (x) );
>>> -  return result;
>>> +
>>> +  double r =  __builtin_rint (x);
>>> +
>>> +  // Prevent gcc from calling lrintf directly when compiled with -fno-math-errno
>>> +  // by inserting a barrier
>>> +
>>> +  math_opt_barrier (r);
>>> +  return r;
>>>  }
>>
>> Same comment as before about math_opt_barrier, and the comment seems off (it is
>> referring to lrintf instead of lrint).
>>
> 
> please fix the comment.
> 

