[PATCHv4 2/2] powerpc64le: ifunc (almost) all *f128 routines in multiarch mode

Wed Jun 24 22:42:40 GMT 2020

On 6/24/20 3:41 PM, Adhemerval Zanella wrote:
> On 22/06/2020 20:04, Paul E Murphy wrote:
>> On 6/22/20 11:57 AM, Adhemerval Zanella via Libc-alpha wrote:

>>> What I would expect in real-world cases is that a workload which
>>> really uses float128 extensively is built with -mcpu=power9 and/or
>>> -mfloat128/-mfloat128-hardware.  That should cover most of the
>>> required hotspots, and glibc can focus on providing a specialized
>>> ifunc variant only where it does make sense (as x86_64 does in
>>> sysdeps/x86_64/fpu/multiarch/mp*, for instance).
>>>
>>> Also, if an optimized float128 glibc build is paramount, a much
>>> simpler solution would be to just provide one built with -mcpu=power9.
>>
>> That kicks the can to the distros.  I think few ship such libraries.
>> The whole value of multiarch is to expose these benefits without
>> having to make the end user jump through such hurdles.  I don't think
>> the x86 comparison holds.  Adding a couple of helpful instructions is
>> tame compared to going from soft to hard fp.
> 
> My main issue with this approach is twofold: it basically tries to
> provide a soft and a hard fp variant of libm in the same library
> (adding build complexity, code bloat, and extra maintenance burden),
> and it relies heavily on ifunc (which has its own issues that bite
> us now and then).

The design intentionally keeps all of the complexity hidden in one
place, without changes to common code.  Doing each function
individually is a fool's errand, even for a small set of functions.
ifunc is the de facto standard mechanism for multiarch.  Renaming a few
redirects is a trivial amount of work, easily done with grep.  Likewise,
adding 200 KB to one library is better than shipping a second 1+ MB
library.
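To make the "complexity in one place" point concrete, here is a minimal
sketch in C of ifunc-style dispatch.  It is not the patch's code: it
emulates with a function pointer what GNU ifunc does in the dynamic
linker, so it builds on any host.  All names (square_f128,
HYPO_HWCAP2_IEEE128, bind_square_f128) are hypothetical, and f128_t is
a long double stand-in for a real binary128 type.

```c
#include <stdint.h>

/* Hypothetical stand-in for the AT_HWCAP2 bit advertising hardware
   IEEE 128-bit float support; real code takes it from platform headers.  */
#define HYPO_HWCAP2_IEEE128 (UINT64_C (1) << 22)

/* Stand-in for a binary128 type so the sketch builds on any host.  */
typedef long double f128_t;

/* Two hypothetical builds of one routine.  In a multiarch scheme both
   bodies come from the same source, compiled once for soft-fp and once
   for POWER9 hardware float128.  */
static f128_t square_f128_soft (f128_t x) { return x * x; }
static f128_t square_f128_hw (f128_t x) { return x * x; }

typedef f128_t (*square_f128_fn) (f128_t);

/* Plays the role of an ifunc resolver: inspects the capability mask
   and returns the implementation to bind.  */
static square_f128_fn
square_f128_resolver (uint64_t hwcap2)
{
  return (hwcap2 & HYPO_HWCAP2_IEEE128) ? square_f128_hw : square_f128_soft;
}

/* Current binding; defaults to the soft-fp build.  */
static square_f128_fn square_f128_impl = square_f128_soft;

/* Stands in for the relocation-time step where the dynamic linker
   calls the resolver and binds the symbol.  */
void
bind_square_f128 (uint64_t hwcap2)
{
  square_f128_impl = square_f128_resolver (hwcap2);
}

/* The single public entry point; callers never see the variants.  */
f128_t
square_f128 (f128_t x)
{
  return square_f128_impl (x);
}
```

Callers only ever use square_f128; which body runs is decided in the
resolver alone.  With real GNU ifunc, the resolver is attached to the
symbol and run by the dynamic linker, so no call site changes at all.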

> 
> The x86 comparison is sound, because we could do something similar
> and start providing libm variants for AVX, AVX256, etc. in the same
> manner.  Instead, the approach taken was to profile and provide
> specific ifunc variants for hotspots.

Again, those are incremental changes to an existing scalar ISA.

> That's why I suggested providing hardware-float128-optimized variants
> when real-world use cases give us feedback that this might be a gain.
> Besides the currently limited float128 usage, I also expect that in
> most scenarios symbols the compiler implements as builtins (such as
> sqrt) won't be called at all.  Even among the more complex math
> functions, most likely only a subset will be used extensively, and
> those are the ones I think we should focus on, instead of just
> reaching for the bigger hammer and optimizing everything (which would
> be simpler anyway by providing a specific libm).
> 

I disagree.  There is an obvious, massive performance gap for all
transcendental functions.  It's our responsibility to solve this
proactively and transparently for our users, who are far more
restricted in which glibc they get to use.  Doubly so as ppc64le starts
transitioning to the new long double ABI.

How about slightly changing the makefile to an opt-in model, whereby
only the transcendental functions (and others that map to a single
instruction, like sqrt) are run through the auto-float128-ifunc
machinery?
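The opt-in model amounts to an explicit allow-list.  In the proposal it
would live in the makefile; the C X-macro below only illustrates the
shape of such a list, assuming the function names shown (examples, not
taken from the patch) and a hypothetical helper f128_uses_ifunc.  One
list is the single place that decides which routines get a second,
hardware build plus dispatch; everything else keeps the default build.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical opt-in list: only the routines named here would be built
   twice and routed through the ifunc machinery.  */
#define F128_IFUNC_OPT_IN(X) \
  X (expf128)                \
  X (logf128)                \
  X (sinf128)                \
  X (cosf128)                \
  X (powf128)                \
  X (sqrtf128)

/* Expand the list into an array of name strings.  */
#define AS_STRING(name) #name,
static const char *const f128_ifunc_list[] = { F128_IFUNC_OPT_IN (AS_STRING) };

/* Returns true if NAME is on the opt-in list.  */
static bool
f128_uses_ifunc (const char *name)
{
  size_t n = sizeof f128_ifunc_list / sizeof f128_ifunc_list[0];
  for (size_t i = 0; i < n; i++)
    if (strcmp (f128_ifunc_list[i], name) == 0)
      return true;
  return false;
}
```

Anything not on the list, such as fabsf128, stays a single soft-fp
build, which keeps the size cost proportional to the functions that
actually benefit.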