[PATCH v2] x86_64: Optimize ffsll function code size.

Tue Aug 1 13:46:58 GMT 2023

On 31/07/23 20:44, Sunil Pandey wrote:
>
> 
> It is not going to fix the problem. Random 20% variation will continue even with
> builtin patch in benchmarking.
> 
> I do not see anything wrong using architecture features, if it helps
> people save their valuable debugging time. After all, it is a valuable
> glibc feature to override generic implementation with architecture specific
> code.
>
Because it is a synthetic benchmark exercising code that should not be
stressed when built with the default compiler flags.  And the problem with
arch-specific code is that, when the compiler already has support for
generating optimized code, it only adds complexity and maintenance burden.

On my system, ffsll is only used by a single program (/usr/bin/nvidia-persistenced),
and I am not sure how it is compiled because it is closed source.  ffs is
used by a few more (gdb, lvm, and some mesa drivers), but again it is far from
being a hotspot.

And as Andreas has said, the best course of action here is to fix the compiler
if it is not generating good enough code.  Fixing gcc means that glibc gets
the optimization for free (by using the builtin), and so does any code that
actually uses ffs/ffsl/ffsll.
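
To make the builtin approach concrete, here is a minimal sketch of what I
mean.  The names generic_ffsll and reference_ffsll are hypothetical and this
is not the actual glibc source; it only assumes a compiler that provides
__builtin_ffsll (gcc and clang do).  The loop version is there just to
cross-check the result.

```c
/* Hypothetical generic implementation: leave instruction selection to the
   compiler instead of carrying per-architecture assembly.  */
int
generic_ffsll (long long int x)
{
  /* Returns one plus the index of the least significant set bit,
     or 0 if x is zero.  */
  return __builtin_ffsll (x);
}

/* Portable bit-by-bit reference, used only to cross-check the builtin.  */
int
reference_ffsll (long long int x)
{
  unsigned long long int u = (unsigned long long int) x;
  int pos = 1;

  if (u == 0)
    return 0;

  /* Shift right until the lowest bit is set, counting positions.  */
  while ((u & 1) == 0)
    {
      u >>= 1;
      pos++;
    }
  return pos;
}
```

With something like this, any improvement in how the compiler expands the
builtin should reach both the libc symbol and callers that use the builtin
directly, without touching glibc again.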