This is the mail archive of the mailing list for the glibc project.


Re: [RFC][PATCH 0/2] aarch64: Add optimized ASIMD versions of sinf/cosf

On Tuesday 13 June 2017 04:37 PM, Szabolcs Nagy wrote:
> i thought it was a vector version because of ASIMD, but it's
> just scalar sinf/cosf.
> there are many issues with this patch, but most importantly it
> duplicates work as i also happen to work on single precision
> math functions (sorry).

I don't see how this reason makes sense for rejecting a patch - you're
essentially saying that we should reject code that has already been
posted because you are working on something that will come out in the
future.

Ashwin posted his code first, so please stick to the technical points
in your review.

> issues:
> - asm code wont be accepted: generic c code can be just as fast.

To be specific, asm code won't be accepted unless it is proven to be
faster than equivalent C code.

> - ifunc wont be accepted: all instructions are available on all cpus.


> - math code should not be fsf assigned lgpl code, but universally
> available, post it under non-restricted license first, then assign
> it to fsf so it can be used everywhere without legal issues.

This is not a glibc requirement.  I don't know that we can even make it
a requirement for arm/aarch64 code within the scope of the glibc project
(it seems like a practical limitation - how would we reject arm patches
on libc-alpha and redirect developers to cortex-strings or wherever
else?), but that is something Joseph or Carlos may be able to answer.

Perhaps a prominent note in the wiki should be a start.

> - document the worst case ulp error and number of misrounded
> cases: for single argument scalar functions you can easily test
> all possible inputs in all rounding modes and that information
> helps to decide if the algorithm is good enough.


> - benchmark measurements ideally provide a latency and a
> throughput numbers as well for the various ranges or use a
> realistic workload, in this case there are many branches
> for the various input ranges so it is useful to have a
> benchmark that can show the effect of that.

As I mentioned earlier, realistic workloads for math functions are
currently more or less a myth, so unless someone comes up with one,
synthetic benchmarks are all you'll get.

