This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf

From: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
To: libc-alpha at sourceware dot org
Date: Tue, 13 Jun 2017 11:15:28 -0300
Subject: Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
References: <20170613071707.43396-1-ashwin.sekhar@caviumnetworks.com> <593FC77A.6050609@arm.com> <1de74f07-dac3-3e01-11fc-48e3787e0f7e@gotplt.org> <593FE870.8000801@arm.com>


On 13/06/2017 10:28, Szabolcs Nagy wrote:
> On 13/06/17 12:54, Siddhesh Poyarekar wrote:
>> On Tuesday 13 June 2017 04:37 PM, Szabolcs Nagy wrote:
>>> i thought it was a vector version because of ASIMD, but it's
>>> just scalar sinf/cosf.
>>>
>>> there are many issues with this patch, but most importantly it
>>> duplicates work as i also happen to work on single precision
>>> math functions (sorry).
>>
>> I don't know if this reason even makes sense for rejecting a patch -
>> you're basically saying that we should reject code that is already
>> posted because you have been working on something that is going to come
>> out in the future.
>>
> 
> i didn't say i rejected his code, but that duplicated
> effort is not good.
> 
>> Ashwin has come out with his code first, so please stick to only the
>> technical points for review.
>>
>>> issues:
>>>
>>> - asm code won't be accepted: generic C code can be just as fast.
>>
>> To be specific, ASM code won't be accepted until it is proven to be
>> faster than existing C code.
>>
> 
> asm is not acceptable even if it's slightly faster.
> (fix the compiler in that case)
> 
> asm code maintenance is a huge problem in glibc;
> in the long term generic code is better in a lot
> of domains. the sinf/cosf code is such a case:
> there is no special instruction that helps them
> that the compiler cannot easily generate.
> 

I tend to agree with you, and generic code can be useful beyond a specific
CPU.  However, in this particular case I think the coordination must first
come from you, since you are the one asking Ashwin to hold or abandon the
patch in favor of a future submission.  Sharing your current work with him,
even if it is still WIP, may speed up development and give hints for future
work.
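
To make the generic C point concrete: the core of a sinf (argument reduction
plus a polynomial) is ordinary double arithmetic, and with GCC's default
-ffp-contract=fast (GNU C modes) the multiply-add steps are typically emitted
as fmadd on AArch64.  The sketch below is hypothetical: sinf_sketch is not the
glibc implementation, it uses plain Taylor coefficients instead of a tuned
minimax polynomial, it handles only |x| <= 2^20, and its accuracy has not been
verified.  The pi/2 split follows the same idea as fdlibm's pio2_1/pio2_1t.

```c
#include <assert.h>

/* pi/2 split for Cody-Waite reduction: PIO2_HI has its low 20 significand
   bits clear, so k * PIO2_HI is exact for |k| < 2^20; PIO2_LO ~= pi/2 - PIO2_HI. */
static const double PIO2_HI = 0x1.921fb544p0;
static const double PIO2_LO = 0x1.0b4611a626331p-34;
static const double INV_PIO2 = 0.63661977236758134308; /* 2/pi */

/* sin(x) for |x| <= 2^20 only: NaN, Inf and huge arguments would need a
   Payne-Hanek style reduction that this sketch leaves out. */
static float sinf_sketch(float x)
{
    assert(x >= -0x1p20f && x <= 0x1p20f); /* also rejects NaN */
    double xd = x;

    /* k = nearest integer to x / (pi/2), r = x - k * pi/2, |r| ~<= pi/4. */
    double kd = xd * INV_PIO2;
    long k = (long)(kd >= 0 ? kd + 0.5 : kd - 0.5);
    double r = (xd - k * PIO2_HI) - k * PIO2_LO;
    double z = r * r;

    /* Taylor polynomials in z = r^2 by Horner's rule; with contraction
       enabled each step is a candidate for a fused multiply-add. */
    double s = r + r * z * (-1.0 / 6 + z * (1.0 / 120 + z * (-1.0 / 5040
                 + z * (1.0 / 362880 + z * (-1.0 / 39916800)))));
    double c = 1.0 + z * (-1.0 / 2 + z * (1.0 / 24 + z * (-1.0 / 720
                 + z * (1.0 / 40320 + z * (-1.0 / 3628800
                 + z * (1.0 / 479001600))))));

    /* sin(r + k*pi/2): the quadrant selects sin or cos and the sign. */
    switch ((unsigned long)k & 3) {
    case 0:
        return (float)s;
    case 1:
        return (float)c;
    case 2:
        return (float)-s;
    default:
        return (float)-c;
    }
}
```

Nothing here needs asm or special instructions; whether such code is good
enough is then a matter of the exhaustive ulp check discussed in this thread.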


>>> - ifunc won't be accepted: all instructions are available on all cpus.
>>
>> Agreed.
>>
>>> - math code should not be fsf assigned lgpl code, but universally
>>> available, post it under non-restricted license first, then assign
>>> it to fsf so it can be used everywhere without legal issues.
>>
>> This is not a glibc requirement.  I don't know if we can even make that
>> a requirement for arm/aarch64 code under the scope of the glibc project
>> (i.e., it seems like a technical limitation - how do we reject arm
>> patches in libc-alpha and redirect devs to cortex-strings or whatever
>> else?), but that is something that Joseph or Carlos may be able to answer.
>>
>> Perhaps a prominent note in the wiki should be a start.
>>
> 
> i didn't say it's a glibc requirement, you have to use
> common sense here: there are algorithms that are so
> useful outside of glibc and so generic that it is just
> unnecessary complication to develop them within glibc
> (obviously it's not a complication for glibc, but for
> everybody else, and i can't impose this procedure on
> others, but i still think this is better for the
> larger community).

Would it be possible to multi-license the code under the LGPL and a
less restrictive license?

> 
>>> - document the worst case ulp error and number of misrounded
>>> cases: for single argument scalar functions you can easily test
>>> all possible inputs in all rounding modes and that information
>>> helps to decide if the algorithm is good enough.
>>
>> Agreed.
>>
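
A minimal sketch of such an exhaustive check in C, for illustration only: it
assumes the double-precision sin() is accurate enough to serve as the reference
for a float result (a correctly rounded reference such as MPFR would be more
rigorous), the helper names (float_ulp_at, ulp_error, sweep) are hypothetical,
and the misrounded count is approximate because the reference is rounded twice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Spacing of floats (one ulp) at the magnitude of the exact value ref:
   2^(e-23) with e = floor(log2|ref|), clamped to the float minimum normal
   exponent so tiny results use the subnormal spacing 2^-149. */
static double float_ulp_at(double ref)
{
    uint64_t bits;
    memcpy(&bits, &ref, sizeof bits);
    int e = (int)((bits >> 52) & 0x7ff) - 1023; /* unbiased exponent, sign ignored */
    if (e < -126)
        e = -126;
    uint64_t ulp_bits = (uint64_t)(e - 23 + 1023) << 52; /* builds 2^(e-23) */
    double ulp;
    memcpy(&ulp, &ulp_bits, sizeof ulp);
    return ulp;
}

/* Absolute error of a float result, measured in float ulps. */
static double ulp_error(float got, double ref)
{
    double d = (double)got - ref;
    return (d < 0 ? -d : d) / float_ulp_at(ref);
}

struct sweep_result {
    double max_ulp;      /* worst error seen, in ulps */
    float worst_input;   /* first input that reached max_ulp */
    uint64_t misrounded; /* results differing from the reference rounded to float */
};

/* Call f on every float whose bit pattern lies in [lo, hi] (inclusive),
   skipping Inf and NaN inputs, and compare each result against ref. */
static struct sweep_result sweep(float (*f)(float), double (*ref)(double),
                                 uint32_t lo, uint32_t hi)
{
    assert(lo <= hi);
    struct sweep_result res = {0.0, 0.0f, 0};
    for (uint32_t i = lo;; i++) {
        if ((i & 0x7f800000u) != 0x7f800000u) {
            float x;
            memcpy(&x, &i, sizeof x);
            double r = ref((double)x);
            float got = f(x);
            double err = ulp_error(got, r);
            if (err > res.max_ulp) {
                res.max_ulp = err;
                res.worst_input = x;
            }
            /* (float)r rounds in the current rounding mode, but it is a
               double rounding, so this count is only approximate. */
            if (got != (float)r)
                res.misrounded++;
        }
        if (i == hi) /* checked before i++ so hi = 0xffffffff terminates */
            break;
    }
    return res;
}
```

For example, after fesetround(FE_TONEAREST) (and again for FE_UPWARD,
FE_DOWNWARD and FE_TOWARDZERO), sweep(sinf, sin, 0, 0xffffffffu) visits all
2^32 bit patterns; link with -lm.
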
>>> - benchmark measurements ideally provide a latency and a
>>> throughput numbers as well for the various ranges or use a
>>> realistic workload, in this case there are many branches
>>> for the various input ranges so it is useful to have a
>>> benchmark that can show the effect of that.
>>
>> As I mentioned earlier, realistic workloads are more or less a myth
>> currently for math, so unless someone comes up with some, synthetic is
>> all you'll get.
> 
> if one tests the same input in a loop, that does
> not measure the effect of branches, and thus we end
> up breaking up the input space into many special
> ranges, which in practice is not optimal.
>
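
A rough sketch of a harness that addresses both points, assuming POSIX
clock_gettime and using hypothetical names (fill_inputs, bench_latency,
bench_throughput); it is not the glibc benchtests framework.  Latency chains
each call's input on the previous result, throughput keeps calls independent,
and both read inputs from a pseudo-random array so range-dependent branches
are not trained on a single value.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

static volatile float bench_sink; /* keeps results observable */

/* Fill buf with pseudo-random floats in [-max_abs, max_abs] (xorshift32),
   so consecutive calls exercise different input ranges and branches. */
static void fill_inputs(float *buf, size_t n, float max_abs, uint32_t seed)
{
    uint32_t s = seed ? seed : 1; /* xorshift state must be nonzero */
    for (size_t i = 0; i < n; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        float u = (float)(s >> 8) * 0x1p-24f; /* 24 random bits -> [0, 1) */
        buf[i] = (2.0f * u - 1.0f) * max_abs;
    }
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ts.tv_sec * 1e9 + ts.tv_nsec;
}

/* Throughput: calls are independent, so they may overlap in the pipeline. */
static double bench_throughput(float (*f)(float), const float *in, size_t n)
{
    float acc = 0.0f;
    double t0 = now_ns();
    for (size_t i = 0; i < n; i++)
        acc += f(in[i]);
    double t1 = now_ns();
    bench_sink = acc;
    return (t1 - t0) / (double)n; /* ns per call */
}

/* Latency: each input depends on the previous result (y * 0.0f is not
   folded away without -ffast-math), so calls run back to back. */
static double bench_latency(float (*f)(float), const float *in, size_t n)
{
    float y = 0.0f;
    double t0 = now_ns();
    for (size_t i = 0; i < n; i++)
        y = f(in[i] + y * 0.0f);
    double t1 = now_ns();
    bench_sink = y;
    return (t1 - t0) / (double)n; /* ns per call */
}
```

Running both over a wide max_abs and then over narrower ones shows the
per-range cost and how much of it comes from mispredicted branches.
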

References:
- [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Ashwin Sekhar T K
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Szabolcs Nagy
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Siddhesh Poyarekar
- Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
  - From: Szabolcs Nagy
