Twiddling with 64-bit values as 2 ints;

Mon Aug 23 13:18:50 GMT 2021

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> On 21/08/2021 10:34, Stefan Kanthak wrote:
>> 
>> (Heretic.-) questions:
>> - why does glibc still employ such ugly code?
>> - Why doesn't glibc take advantage of 64-bit integers in such code?
> 
> Because no one cared to adjust the implementation.  Recently Wilco
> has removed a lot of old code that still uses 32-bit instead of 64-bit
> to do bit twiddling in floating-point implementations (check caa884dda7
> and 9e97f239eae1f2).

That's good to hear.
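
To make the difference concrete, here is a minimal sketch of the same
test (is a double +0.0 or -0.0?) written both ways. The function names
are made up for illustration and are not taken from glibc; the sketch
assumes IEEE 754 binary64 doubles.

```c
#include <stdint.h>
#include <string.h>

/* Old style: treat the double as two 32-bit words (high and low).
   Hypothetical helper, not glibc code. */
int is_zero_two_words(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);        /* well-defined type pun */
    uint32_t hi = (uint32_t)(bits >> 32);  /* sign, exponent, top of mantissa */
    uint32_t lo = (uint32_t)bits;          /* low 32 mantissa bits */
    return ((hi & 0x7fffffffu) | lo) == 0; /* ignore the sign bit */
}

/* 64-bit style: one word, one shift to drop the sign bit. */
int is_zero_one_word(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits << 1) == 0;
}
```

Both return 1 for 0.0 and -0.0 and 0 for everything else; the second
needs no splitting and recombining of halves.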

> I think we should move to use a simplest code assuming 64-bit CPU

D'accord.
And there's a second direction where you might move: almost all CPUs
have separate general purpose registers and floating-point registers.
Bit-twiddling generally needs extra (and sometimes slow) transfers
between them.
In a 32-bit environment, where arguments are typically passed on the
stack, it makes no difference whether an argument is loaded from the
stack into a GPR or an FPR.
In a 64-bit environment, where arguments are passed in registers, they
should be operated on in those registers.

So: why not implement routines like nextafter() without bit-twiddling,
using floating-point as far as possible for architectures where this
gives better results?
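
A minimal sketch of what "floating-point as far as possible" could look
like for nextafter(), assuming IEEE 754 binary64 doubles: all special
cases are decided with FP comparisons, and only the final step to the
neighbouring value touches the integer representation. The name
nextafter_sketch is hypothetical, and the sketch does not raise the
overflow/underflow exceptions the standard asks for.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration, not a drop-in replacement. */
double nextafter_sketch(double from, double to)
{
    if (from != from || to != to)   /* either argument is NaN */
        return from + to;           /* propagate a NaN */
    if (from == to)                 /* also covers +0.0 vs. -0.0 */
        return to;
    if (from == 0.0)                /* step to the smallest subnormal */
        return to < 0.0 ? -0x1p-1074 : 0x1p-1074;

    uint64_t bits;
    memcpy(&bits, &from, sizeof bits);
    /* Moving towards zero decrements the representation, moving away
       from zero increments it, regardless of the sign of 'from'. */
    if ((from < to) == (from < 0.0))
        bits--;
    else
        bits++;
    memcpy(&from, &bits, sizeof from);
    return from;
}
```

For example, nextafter_sketch(1.0, 2.0) yields 1.0 + 0x1p-52, and
nextafter_sketch(0.0, -1.0) yields -0x1p-1074.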

The simple implementation I showed in my initial post improved the
throughput in my benchmark (on AMD64) by an order of magnitude.
In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.

Does GLIBC offer a macro like "PREFER_FP_IMPLEMENTATION" that can be
used to select between the integer bit-twiddling code and FP-preferring
code during compilation?
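
What I have in mind would look roughly like the following sketch.
PREFER_FP_IMPLEMENTATION is only the name proposed above, not a macro
I know to exist in glibc, and isnan_sketch is a made-up function; IEEE
754 binary64 doubles are assumed.

```c
#include <stdint.h>
#include <string.h>

/* Build with -DPREFER_FP_IMPLEMENTATION to select the FP variant.
   The macro is hypothetical. */
int isnan_sketch(double x)
{
#ifdef PREFER_FP_IMPLEMENTATION
    /* Stays in the floating-point register: NaN is the only value
       that compares unequal to itself. */
    return x != x;
#else
    /* Integer variant: after shifting out the sign bit, NaNs are
       exactly the patterns above that of infinity. */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return (bits << 1) > UINT64_C(0xffe0000000000000);
#endif
}
```

Both variants return the same results; which one is faster depends on
the cost of moving values between register files on the target.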

> and let the compiler optimize it (which unfortunately gcc is not that
> smart in all the cases).

I know, and I just learned that GCC does NOT perform quite a few
optimisations I expect from a mature compiler.
Quoting Jakub Jelinek on gcc@gcc.gnu.org:

| GCC doesn't do value range propagation of floating point values, not
| even the special ones like NaNs, infinities, +/- zeros etc., and without
| that the earlier ifs aren't taken into account for the earlier code.

The code I used to demonstrate this deficiency is TOMS 722...
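
A far smaller, hypothetical illustration of the pattern (not taken from
TOMS 722): the last two tests below repeat conditions that the first
two have already excluded. A compiler without value range propagation
for floating-point values cannot use the earlier ifs to remove them.

```c
/* Hypothetical example; classify_sketch is a made-up name. */
int classify_sketch(double x)
{
    if (x != x)
        return -1;              /* NaN */
    if (x == 0.0)
        return 0;               /* +0.0 or -0.0 */

    /* From here on x is neither NaN nor zero, so the next two
       branches are dead code and could be dropped entirely. */
    if (x != x)
        return -2;              /* never reached */
    if (x == 0.0)
        return -3;              /* never reached */

    return x < 0.0 ? 1 : 2;     /* 1: negative, 2: positive */
}
```

Comparing the generated assembly with and without the two redundant
tests shows whether a given compiler version eliminates them.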

Stefan