Twiddling with 64-bit values as 2 ints;

Mon Aug 23 17:32:11 GMT 2021

Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

> On 23/08/2021 12:37, Stefan Kanthak wrote:
>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:
>> 
>>> On 23/08/2021 10:18, Stefan Kanthak wrote:
>>>> Adhemerval Zanella <adhemerval.zanella@linaro.org> wrote:

>>>> The simple implementation I showed in my initial post improved the
>>>> throughput in my benchmark (on AMD64) by an order of magnitude.
>>>> In Szabolcs Nagy's benchmark measuring latency it took 0.04ns/call
>>>> longer (5.72ns vs. 5.68ns) -- despite the POOR job GCC does on FP.
>>>
>>> Your implementation triggered a lot of regressions,
>> 
>> The initial, FP-preferring code was a demonstration, not a patch.
> 
> Right, but it does not make much sense to compare performance numbers with
> an implementation that introduces a lot of regressions.

This argument also holds for a correct FP-preferring implementation, because
of the POOR code GCC currently generates: the 4 superfluous FP comparisons
plus conditional branches GCC emits cost more runtime than the still-missing
code to handle fenv/underflow/overflow/errno would.

[...]

>> Having dedicated implementations for different architectures is even more
>> costly!
>> My intention/proposal is to have at most two different generic implementations,
>> one using integer bit-twiddling wherever possible, thus supporting soft-fp well,
>> the second using floating-point wherever possible, thus supporting modern
>> hardware well.
> 
> The only reservation I have about such an approach is that it would add some
> more maintenance and testing.

Insert "wherever needed" before or after "wherever possible" in my proposal.
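For illustration only, a minimal sketch of what the two generic variants proposed above might look like, using C99 trunc() as the example function (the thread does not name one here; trunc_int and trunc_fp are hypothetical names, not glibc code). The integer variant handles the double as a single 64-bit integer rather than as two 32-bit words. The FP variant assumes hardware FP with plain double evaluation (FLT_EVAL_METHOD == 0, as with SSE2 on AMD64) and a compiler that does not reassociate FP math (no -ffast-math).

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Variant 1: integer bit-twiddling, no FP arithmetic (soft-fp friendly).
   Sketch only: the double is handled as ONE 64-bit integer,
   not as two 32-bit words. */
double trunc_int(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);                 /* reinterpret, no conversion */
    int e = (int)((bits >> 52) & 0x7ff) - 1023;     /* unbiased exponent */

    if (e >= 52)            /* |x| >= 2^52 (already integral), Inf or NaN */
        return x;
    if (e < 0)              /* |x| < 1, incl. zeros and subnormals: keep only the sign */
        bits &= UINT64_C(0x8000000000000000);
    else                    /* clear the fraction bits below the binary point */
        bits &= ~(UINT64_C(0x000fffffffffffff) >> e);

    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Variant 2: floating-point, for targets with hardware FP.
   Assumes FLT_EVAL_METHOD == 0 and no -ffast-math. */
double trunc_fp(double x)
{
    const double two52 = 4503599627370496.0;        /* 2^52 */
    double ax = fabs(x);

    if (!isless(ax, two52)) /* |x| >= 2^52 (already integral), Inf or NaN;
                               isless() is a quiet comparison */
        return x;

    /* Adding and subtracting 2^52 rounds ax to a neighbouring integer
       (floor or ceil, depending on the rounding mode); step back by one
       if it rounded up. */
    double r = (ax + two52) - two52;
    if (r > ax)
        r -= 1.0;

    return copysign(r, x);  /* restore the sign, including for -0.0 */
}
```

Note that the addition in the FP variant can raise the inexact exception flag, which is exactly the kind of fenv detail a production libm would still have to address; the integer variant leaves the FP environment untouched.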

Stefan