This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: Siddhesh Poyarekar <siddhesh at gotplt dot org>, libc-alpha at sourceware dot org, Ashwin Sekhar T K <ashwin dot sekhar at caviumnetworks dot com>
- Cc: nd at arm dot com
- Date: Tue, 13 Jun 2017 14:28:16 +0100
- Subject: Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf
- References: <20170613071707.43396-1-ashwin.sekhar@caviumnetworks.com> <593FC77A.6050609@arm.com> <1de74f07-dac3-3e01-11fc-48e3787e0f7e@gotplt.org>
On 13/06/17 12:54, Siddhesh Poyarekar wrote:
> On Tuesday 13 June 2017 04:37 PM, Szabolcs Nagy wrote:
>> i thought it was a vector version because of ASIMD, but it's
>> just scalar sinf/cosf.
>>
>> there are many issues with this patch, but most importantly it
>> duplicates work as i also happen to work on single precision
>> math functions (sorry).
>
> I don't know if this reason even makes sense for rejecting a patch -
> you're basically saying that we should reject code that is already
> posted because you have been working on something that is going to come
> out in the future.
>
i didn't say i rejected his code, only that duplicated
effort is not good.
> Ashwin has come out with his code first, so please stick to only the
> technical points for review.
>
>> issues:
>>
>> - asm code won't be accepted: generic c code can be just as fast.
>
> To be specific, ASM code won't be accepted until it is proven to be
> faster than existing C code.
>
asm is not acceptable even if it's slightly faster
(fix the compiler in that case).
asm code maintenance is a huge problem in glibc.
in the long term generic code is better in a lot
of domains, and the sinf/cosf code is such a case:
there is no special instruction that helps them
that the compiler cannot easily generate.
>> - ifunc won't be accepted: all instructions are available on all cpus.
>
> Agreed.
>
>> - math code should not be fsf assigned lgpl code, but universally
>> available, post it under non-restricted license first, then assign
>> it to fsf so it can be used everywhere without legal issues.
>
> This is not a glibc requirement. I don't know if we can even make that
> a requirement for arm/aarch64 code under the scope of the glibc project
> (i.e., it seems like a technical limitation - how do we reject arm
> patches in libc-alpha and redirect devs to cortex-strings or whatever
> else?), but that is something that Joseph or Carlos may be able to answer.
>
> Perhaps a prominent note in the wiki should be a start.
>
i didn't say it's a glibc requirement, you have to use
common sense here: there are algorithms that are so
useful outside of glibc and so generic that it is just
an unnecessary complication to develop them within glibc
(obviously it's not a complication for glibc, but for
everybody else, and i can't impose this procedure on
others, but i still think this is better for the
larger community).
>> - document the worst case ulp error and number of misrounded
>> cases: for single argument scalar functions you can easily test
>> all possible inputs in all rounding modes and that information
>> helps to decide if the algorithm is good enough.
>
> Agreed.
>
>> - benchmark measurements ideally provide latency and
>> throughput numbers for the various ranges, or use a
>> realistic workload. in this case there are many branches
>> for the various input ranges so it is useful to have a
>> benchmark that can show the effect of that.
>
> As I mentioned earlier, realistic workloads are more or less a myth
> currently for math, so unless someone comes up with some, synthetic is
> all you'll get.
if one tests the same input in a loop, that does
not measure the effect of branches, and thus we end
up breaking up the input space into many special
ranges; in practice that's not optimal.