[PATCH/2nd version] Re: nextafter() about an order of magnitude slower than trivial implementation
Stefan Kanthak
stefan.kanthak@t-online.de
Fri Aug 20 09:52:50 GMT 2021
Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> On 18/08/2021 14:11, Stefan Kanthak wrote:
>> Szabolcs Nagy <szabolcs.nagy@arm.com> wrote:
>>
>>> The 08/16/2021 18:03, Stefan Kanthak wrote:
>>>> Testing and benchmarking an exponential function that consumes
>>>> about 10ns/call on an AMD EPYC 7262, I noticed that nextafter()
>>>> itself is DEAD SLOW: it consumes about 8.5ns/call!
>>>>
>>>> The following (quick&dirty) implementation consumes 0.7ns/call,
>>>> i.e. is about an order of magnitude faster:
>>>
>>> correctly measuring latency on a modern x86_64 core:
>>>
>>> musl: 3.16 ns
>>> glibc: 5.68 ns
>>> your: 5.72 ns
>
> Thanks for bring this up, if you want to contribute a patch
See attachment.
For completeness sake and to ease an initial review, I post the full
source here too: without ALTERNATE defined, it uses integer arithmetic
to the extent possible, like the original.
Choose whatever you like best or gives better throughput/latency.
Changes from the 1st version of the patch:
- leftover FP comparision substituted with integer operations;
- increment/decrement logic simplified/unified, avoiding a
conditional branch.
I expect the compiler to detect the common clause
"&& ((xx.s < 0) == (yy.s < 0)))" introduced with these changes.
Stefan
JFTR: .../sysdeps/ieee754/flt-32/s_nextafterf.c as well as
.../sysdeps/i386/fpu/s_nextafterl.c will need the same changes.
--- .../math/s_nextafter.c ---
/* IEEE functions
* nextafter(x,y)
* return the next machine floating-point number of x in the
* direction toward y.
* Special cases:
*/
/* Ugly hack so that the aliasing works. */
#define __nexttoward __internal___nexttoward
#define nexttoward __internal_nexttoward
#include <errno.h>
#include <math.h>
#include <math-barriers.h>
#include <math_private.h>
#include <float.h>
#include <libm-alias-double.h>
double __nextafter(double x, double y)
{
union {
double d;
int64_t s;
uint64_t u;
} xx = {x}, yy = {y};
uint64_t ax = xx.u + xx.u; /* |x|<<1 */
uint64_t ay = yy.u + yy.u; /* |y|<<1 */
if(ax > 0x7ffull << 53) /* x is nan */
return x;
if(ay > 0x7ffull << 53) /* y is nan */
return y;
#ifdef ALTERNATE
if(x == y) return y; /* x=y, return y */
#else
if((ax == ay)
&& ((xx.s < 0) == (yy.s < 0)))
return y;
#endif
if(ax == 0) { /* x == 0 */
xx.u = yy.s < 0 ? 0x8000000000000001ull : 1ull;
xx.d = math_opt_barrier (xx.d);
xx.d += 0.0;
math_force_eval (xx.d); /* raise underflow flag */
return xx.d; /* return +-minsubnormal */
}
#ifdef ALTERNATE
if((x < y) == (x < 0.0))
--xx.u;
else
++xx.u;
#else
if((ax < ay)
&& ((xx.s < 0) == (yy.s < 0)))
++xx.u;
else
--xx.u;
#endif
ax = xx.u + xx.u; /* |x|<<1 */
if((ax >= 0x7ffull << 53) /* overflow */
|| (ax < 0x001ull << 53)) { /* underflow */
xx.d = math_opt_barrier (xx.d);
xx.d += 0.0;
math_force_eval (xx.d); /* raise overflow/underflow flag */
__set_errno (ERANGE);
}
return xx.d;
}
libm_alias_double (__nextafter, nextafter)
#ifdef NO_LONG_DOUBLE
strong_alias (__nextafter, __nexttowardl)
weak_alias (__nexttowardl, nexttowardl)
#undef __nexttoward
strong_alias (__nextafter, __nexttoward)
#undef nexttoward
weak_alias (__nextafter, nexttoward)
#endif
--- EOF ---
-------------- next part --------------
A non-text attachment was scrubbed...
Name: glibc.patch
Type: application/octet-stream
Size: 3325 bytes
Desc: not available
URL: <https://sourceware.org/pipermail/libc-alpha/attachments/20210820/94c2ecc6/attachment-0001.obj>
More information about the Libc-alpha
mailing list