nextafter() about an order of magnitude slower than trivial implementation

Thu Aug 19 11:20:49 GMT 2021

On 18/08/2021 14:11, Stefan Kanthak wrote:
> Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
> 
>> The 08/16/2021 18:03, Stefan Kanthak wrote:
>>> Testing and benchmarking an exponential function that consumes
>>> about 10ns/call on an AMD EPYC 7262, I noticed that nextafter()
>>> itself is DEAD SLOW: it consumes about 8.5ns/call!
>>>
>>> The following (quick&dirty) implementation consumes 0.7ns/call,
>>> i.e. is about an order of magnitude faster:
>>
>> correctly measuring latency on a modern x86_64 core:
>>
>> musl: 3.16 ns
>> glibc: 5.68 ns
>> your: 5.72 ns

Thanks for bringing this up. If you want to contribute a patch, please
follow the Contribution checklist [1].  We recently dropped the FSF
copyright assignment requirement, so you can use a DCO-like
license on the patches.

To change the current implementation, I suggest you also provide
a benchmark using the glibc benchmark framework.  Some math functions
provide both latency and reciprocal-throughput information, and
with both numbers we can evaluate whether the new implementation is
indeed better on different machines.

I would just like to ask you to keep the tone respectful and be open
to suggestions and criticism, so you do not repeat the same derailed
thread from the musl mailing list [2]. Szabolcs and Wilco did an
excellent job on some of the newer math function implementations; you
might read the old thread to see what their approach was.

[1] https://sourceware.org/glibc/wiki/Contribution%20checklist
[2] https://www.openwall.com/lists/musl/2021/08/15/18