Twiddling with 64-bit values as 2 ints;
Stefan Kanthak
stefan.kanthak@nexgo.de
Mon Aug 23 13:18:50 GMT 2021
Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
> On 21/08/2021 10:34, Stefan Kanthak wrote:
>>
>> (Heretic.-) questions:
>> - why does glibc still employ such ugly code?
>> - Why doesn't glibc take advantage of 64-bit integers in such code?
>
> Because no one cared to adjust the implementation. Recently Wilco
> has removed a lot of old code that still uses 32-bit instead of 64-bit
> to do bit twiddling in floating-point implementations (check caa884dda7
> and 9e97f239eae1f2).
That's good to hear.
> I think we should move to the simplest code, assuming a 64-bit CPU
D'accord.
And there's a second direction where you might move: almost all CPUs
have separate general purpose registers and floating-point registers.
Bit-twiddling generally needs extra (and sometimes slow) transfers
between them.
In a 32-bit environment, where arguments are typically passed on the
stack, at least loading an argument from the stack into a GPR or FPR
makes no difference.
In a 64-bit environment, where arguments are passed in registers, they
should be operated on in these registers.
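To make the three styles concrete, here is a minimal sketch of one predicate ("is x zero, of either sign?") written with two 32-bit words, with one 64-bit integer, and with a plain floating-point comparison. The function names are made up for illustration and IEEE 754 binary64 doubles are assumed; none of this is taken from glibc.

```c
#include <stdint.h>
#include <string.h>

/* Assumes IEEE 754 binary64 doubles. All names here are hypothetical. */

/* Copy the bits of a double into one 64-bit integer. For an argument
   that arrives in a floating-point register, this usually costs a
   transfer into a general purpose register. */
static inline uint64_t bits_of(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Old style: look at the high and low 32-bit words separately. */
static int is_zero_2x32(double x)
{
    uint64_t u = bits_of(x);
    uint32_t hi = (uint32_t)(u >> 32);
    uint32_t lo = (uint32_t)u;
    return ((hi & 0x7fffffffu) | lo) == 0;  /* mask the sign bit */
}

/* 64-bit style: shift out the sign bit, test everything else at once. */
static int is_zero_64(double x)
{
    return (bits_of(x) << 1) == 0;
}

/* Floating-point style: the value never has to leave the FP register. */
static int is_zero_fp(double x)
{
    return x == 0.0;  /* true for +0.0 and -0.0, false for NaN */
}
```

All three agree on every input; they differ only in which registers the compiler has to use.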
So: why not implement routines like nextafter() without bit-twiddling,
using floating-point as far as possible for architectures where this
gives better results?
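As a rough sketch of what "floating-point as far as possible" could look like: all special cases are decided with FP comparisons, and the integer unit is needed only for the final one-ulp step. The name nextafter_fp_sketch is hypothetical, IEEE 754 binary64 is assumed, and errno and exception handling are left out. It is neither glibc's code nor necessarily the code from the initial post.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch, assuming IEEE 754 binary64 doubles.
   Does not set errno and does not raise overflow/underflow. */
double nextafter_fp_sketch(double from, double to)
{
    uint64_t bits;

    if (from != from)               /* from is NaN */
        return from;
    if (to != to)                   /* to is NaN */
        return to;
    if (from == to)                 /* also covers +0.0 versus -0.0 */
        return to;
    if (from == 0.0)                /* step off zero: smallest subnormal */
        return to < 0.0 ? -0x1p-1074 : 0x1p-1074;

    /* Only here do the bits move to the integer side. */
    memcpy(&bits, &from, sizeof bits);
    if ((from < to) == (from < 0.0))
        bits--;                     /* toward zero: magnitude shrinks */
    else
        bits++;                     /* away from zero: magnitude grows */
    memcpy(&from, &bits, sizeof from);
    return from;
}
```

The increment works across exponent boundaries because the binary64 encoding orders magnitudes the same way as the integers that represent them, so stepping up from the largest finite value yields infinity.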
The simple implementation I showed in my initial post improved the
throughput in my benchmark (on AMD64) by an order of magnitude.
In Szabolcs Nagy's benchmark, which measures latency, it took 0.04ns/call
longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
used to select between the integer bit-twiddling code and FP-preferring
code during compilation?
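What I have in mind is something like the following sketch. PREFER_FP_IMPLEMENTATION is the hypothetical macro from the question above (I do not know of such an option in glibc), is_finite_sketch is a made-up name, and IEEE 754 binary64 is assumed.

```c
#include <float.h>
#include <stdint.h>
#include <string.h>

/* PREFER_FP_IMPLEMENTATION is hypothetical; it would be set per
   architecture, e.g. with -DPREFER_FP_IMPLEMENTATION on the command line. */
#ifdef PREFER_FP_IMPLEMENTATION

/* FP-preferring variant: two comparisons, no register transfer.
   Both comparisons are false for NaN. */
static int is_finite_sketch(double x)
{
    return x >= -DBL_MAX && x <= DBL_MAX;
}

#else

/* Integer variant: the exponent field is all ones for infinities and NaNs. */
static int is_finite_sketch(double x)
{
    uint64_t u;
    memcpy(&u, &x, sizeof u);
    return ((u >> 52) & 0x7ff) != 0x7ff;
}

#endif
```

Both variants return the same results, so the choice could be made purely on measured performance for each target.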
> and let the compiler optimize it (which unfortunately gcc is not that
> smart in all the cases).
I know, and I just learned that GCC does NOT perform quite a few
optimisations I expect from a mature compiler.
Quoting Jakub Jelinek on gcc@gcc.gnu.org:
| GCC doesn't do value range propagation of floating point values, not
| even the special ones like NaNs, infinities, +/- zeros etc., and without
| that the earlier ifs aren't taken into account for the earlier code.
The code I used to demonstrate this deficiency is TOMS 722...
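For readers who do not have that code at hand, here is a much smaller, made-up illustration of the kind of redundancy meant. It is not the TOMS 722 code, and the function name is hypothetical. Whether a particular compiler release removes the second test has to be checked in the generated assembly; per the quote above, GCC did not track such ranges at the time of writing.

```c
/* Hypothetical example of a test that floating-point value range
   propagation could remove. */
double double_nonnegative_sketch(double x)
{
    if (!(x >= 0.0))    /* rejects negative values and NaN */
        return 0.0;

    /* From here on, x is known to lie in [0, +infinity]. */

    if (x < -1.0)       /* can never be true given the range above */
        return -1.0;    /* dead code, but only provably so with FP ranges */

    return x * 2.0;
}
```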
Stefan