This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf

From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
To: "Sekhar, Ashwin" <Ashwin dot Sekhar at cavium dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
Cc: nd at arm dot com
Date: Tue, 13 Jun 2017 14:23:51 +0100
Subject: Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
References: <20170613071707.43396-1-ashwin.sekhar@caviumnetworks.com> <593FC77A.6050609@arm.com> <1497358590.4998.34.camel@caviumnetworks.com>

On 13/06/17 13:56, Sekhar, Ashwin wrote:
>>>   SINF
>>>   ---------------------------------------------------------
>>>   Input           ThunderX88      ThunderX99      CortexA57
>>>   ---------------------------------------------------------
>>>   0.0              1.88x           1.18x           1.17x
>>>   2.0^-28          1.33x           1.12x           1.03x
>>>   2.0^-6           1.48x           1.28x           1.27x
>>>   0.6*Pi/4         0.94x           1.14x           1.21x
>>>   13*Pi/8          1.41x           2.00x           2.16x
>>>   17*Pi/8          1.45x           1.93x           2.23x
>> based on these numbers my current c implementation is faster,
>> but it will take time to polish that for submission.
> 
> Are these going to be aarch64-specific C implementations or changes in
> generic code?
> 
> Also, could you please let me know when you plan to submit your patches?
> 
> I also don't want duplicated effort. But if you don't plan to
> submit your changes in the near future, I guess I will go ahead,
> address the other comments and work on submitting a v2 patch.
> 

the plan is to submit in the next release cycle (i plan to post
powf first, then work on sinf/cosf, possibly sin/cos too, then
look at vector versions once the vector abi is in gcc).

the c implementation is generic
(sometimes the instruction scheduling is suboptimal and
i found that union-based bithacks don't always give good
code, but those are issues we can fix on the gcc side)
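
To make the "union-based bithacks" remark concrete, here is a minimal sketch of the two usual ways to read a float's bit pattern in C: through a union and through memcpy. The helper names (asuint_union, asuint_memcpy, is_large_input) are hypothetical and the 2^12 threshold is an arbitrary example; the sketch says nothing about which idiom gcc turns into better code on a given target.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reinterpret a float's bits through a union
   (well defined in C99 and later, and in GNU C).  */
static inline uint32_t
asuint_union (float f)
{
  union { float f; uint32_t i; } u = { f };
  return u.i;
}

/* Hypothetical helper: the same reinterpretation through memcpy;
   compilers usually reduce this to a register move.  */
static inline uint32_t
asuint_memcpy (float f)
{
  uint32_t i;
  memcpy (&i, &f, sizeof i);
  return i;
}

/* Hypothetical bithack: clear the sign bit and compare the remaining
   bits against those of 4096.0f (0x45800000).  For non-negative floats
   the bit patterns order like the values, so this is true for
   |x| >= 2^12, and also for Inf and NaN.  */
static inline int
is_large_input (float x)
{
  uint32_t ix = asuint_union (x) & 0x7fffffff;
  return ix >= 0x45800000;
}
```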

one issue is fma vs non-fma code. i haven't solved that
yet, but it will probably work either way (since we use
double precision); if it makes a difference i will add
separate ifdef'd code paths for the two cases (this might
affect the fast arg reduction)
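
As a sketch only of what such an ifdef could look like: C99 <math.h> defines FP_FAST_FMA when fma() is expected to be about as fast as a separate multiply and add, typically because the target has a fused multiply-add instruction. The mul_add and poly3 helpers below are hypothetical and are not taken from any posted patch.

```c
#include <math.h>

/* Hypothetical helper: a*b + c, with the code path chosen at compile
   time.  The fused version rounds once; the plain version rounds after
   the multiply and again after the add (unless the compiler contracts
   the expression into an fma itself).  */
static inline double
mul_add (double a, double b, double c)
{
#ifdef FP_FAST_FMA
  return fma (a, b, c);
#else
  return a * b + c;
#endif
}

/* Hypothetical Horner evaluation of c0 + c1*z + c2*z^2 + c3*z^3.
   Results of the two code paths may differ in the last bit.  */
static double
poly3 (double z, double c0, double c1, double c2, double c3)
{
  return mul_add (mul_add (mul_add (c3, z, c2), z, c1), z, c0);
}
```

Exactly representable inputs such as poly3 (2.0, 1.0, 2.0, 3.0, 4.0) give 49.0 on both paths; only inexact intermediate results expose the difference.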

> Thanks
> Ashwin
> 
>>
>>>
>>>   1000*Pi/4       19.68x          37.46x          27.99x
>>>   2.0^51          12.00x          13.58x          13.49x
>> this is due to a bug in the current generic code: it falls back
>> to slow argument reduction even though single precision arg
>> reduction can be done in a few cycles over the entire range.
>>
>> i think the x86_64 sse code could still be simpler and faster
>> (not that it matters much as these are rare cases).
>>
>>>
>>>   Inf              1.04x           1.05x           1.12x
>>>   Nan              0.95x           0.87x           0.82x
>>>   ---------------------------------------------------------
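
For a sense of why single precision argument reduction can be cheap when carried out in double precision, here is a sketch of a Cody-Waite style reduction by pi/2 for moderate inputs. It assumes |x| below roughly 2^28 * pi/2 and splits pi/2 into a 25-bit head plus a tail. It does not cover the entire float range (huge inputs would need something like a precomputed table of 2/pi bits), reduce_medium is a hypothetical name rather than glibc code, and the accuracy of this split for float results is assumed here, not proven.

```c
/* 2/pi, and pi/2 split into a head with 25 significant bits plus a
   tail.  For |n| <= 2^28 the product n * PIO2_HI needs at most
   28 + 25 = 53 bits, so it is exact in double precision.  */
static const double INV_PIO2 = 6.36619772367581382433e-01; /* 2/pi */
static const double PIO2_HI = 1.57079631090164184570e+00;  /* 0x3FF921FB50000000 */
static const double PIO2_LO = 1.58932547735281966916e-08;  /* pi/2 - PIO2_HI */

/* Hypothetical helper: for float x with |x| below about 2^28 * pi/2,
   return the quadrant n and store r = x - n*pi/2 (|r| about <= pi/4)
   in *r.  Larger inputs are not handled.  */
static int
reduce_medium (float x, double *r)
{
  double y = (double) x * INV_PIO2;
  /* Round to nearest, ties away from zero, without calling libm;
     |y| stays below about 2^28, so the conversion cannot overflow.  */
  double fn = (double) (long) (y < 0 ? y - 0.5 : y + 0.5);
  /* n * PIO2_HI is exact; the tail term corrects for the rest of pi/2.  */
  *r = (double) x - fn * PIO2_HI - fn * PIO2_LO;
  return (int) fn;
}
```

sinf(x) would then be sin(r), cos(r), -sin(r) or -cos(r) for n & 3 equal to 0, 1, 2 or 3, each evaluated with a short polynomial in double precision.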

Follow-Ups:
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Adhemerval Zanella
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Joseph Myers

References:
- [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Ashwin Sekhar T K
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Szabolcs Nagy
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Sekhar, Ashwin
