[PATCHv4 2/2] powerpc64le: ifunc (almost) all *f128 routines in multiarch mode
Paul E Murphy
Tue Jun 23 16:19:05 GMT 2020
On 6/22/20 6:04 PM, Paul E Murphy via Libc-alpha wrote:
> On 6/22/20 11:57 AM, Adhemerval Zanella via Libc-alpha wrote:
>> On 15/06/2020 17:59, Paul E. Murphy via Libc-alpha wrote:
>>> See the Makefile changes for high level design/commentary.
>>> V4 changes -
>>> * Drop patch to add libm_alias_exclusive_ldouble. After
>>> recent refactoring of fmaf128, it showed some unfixable
>>> flaws. Instead, use macro renaming for nextafterf128 to
>>> generate the needed symbols, and rework.
>>> V3 changes -
>>> * Cleanup comments.
>>> * Rebase against fmaf128 cleanup
>>> * Use Makeconfig trick to set var in le/power9 sysdep dir to
>>> determine if ifunc support is necessary. This works with
>>> the upcoming CPU detection patch.
>>> * fmaf128 patch is no longer needed.
>>> V2 changes -
>>> * move duplicate redirect macros into
>>> * replace subshell usage with command sequencing
>>> * Add more instructive documentation in Makefile about how all
>>> these ugly pieces work together
>>> * Minor comment cleanup throughout
>>> * Improve inline documentation/commentary throughout
>>> Programmatically generate simple wrappers for most libm *f128
>>> objects and a set of ifunc objects to unify them.
>>> A second set of implementation files are generated which simply
>>> include the first implementation encountered along the search
>>> path. This usually works, excepting when a wrapper is overridden
>>> and makefile search order slightly diverges from include order.
>>> A set of additional headers are included which primarily rely
>>> on asm redirects to rename, and less frequently macro renames
>>> where an asm redirect is not possible. These intercept several
>>> common headers to install redirect and disable macros at specific
>>> times. This works surprisingly well. Notably, some ugliness
>>> occurs when header inclusion must be coerced at certain times
>>> before turning off aliasing and plt bypass wrappers.
>>> Notably, the only special case is s_significandf128.c. It is
>>> doubly special as it exists to support ldouble redirects, and
>>> exposes a subtle difference between makefile rules and search path
>>> orders. Commentary is inlined.
>>> Admittedly, this makes shared maintenance a tiny bit more
>>> difficult, but lays groundwork for supporting more optimized
>>> float128 routines which very overtly assume a soft-fp runtime.
>>> Changes to internal float128 API should fail at compile time,
>>> thus build-many-glibcs.py should readily catch any divergence.
>>> Finally, don't build this support if requested CPU is newer
>>> than power8.
> This is refactoring noise, and while not wrong is not meant to be
> in the final commit message.
>> I am trying to digest the requirements to add such complexity on the
>> powerpc64le build rules, especially the internal Makefile hackery
> This is addressed in the notes. Mildly speaking, soft-fp code
> generation on P8 is quite limited. This is pretty easy to identify in
> any non-trivial binary128 function. e.g. expf128 is almost 1/3 the
> size on P9. Likewise many complex functions are almost 1/2 the size.
> Anything soft-fp touches massively increases code size and impedes
> instruction scheduling.
> I can get some more concrete numbers, but my hope is this enables us
> to make even more meaningful improvements to common code when hardware
> support is available.
I did a quick test for expf128. It's around a 2.5x speedup on the fast
path (building a table of 1M small values). The speedup is this large
because the soft-fp build makes an expensive PLT call for every FP
operation and cannot use FMA. That hurts, since quite a bit of libm
centers around series approximation like expf128.
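
To make the per-operation cost concrete, here is a rough sketch. It is
not glibc code. It uses double instead of _Float128 so it builds
anywhere, and soft_mul, soft_add, helper_calls, horner_soft and
exp_coeffs are hypothetical names that only stand in for out-of-line
soft-fp helpers. It counts how many helper calls a Horner evaluation of
a short exp() Taylor series needs.

```c
#include <assert.h>
#include <stddef.h>

/* Counts calls into the stand-in helpers below.  In a real soft-fp
   build each such call would go out of line; here we only count.  */
unsigned long helper_calls;

/* Hypothetical stand-ins for soft-fp multiply and add.  */
double soft_mul (double a, double b) { helper_calls++; return a * b; }
double soft_add (double a, double b) { helper_calls++; return a + b; }

/* Horner evaluation of c[0] + x*(c[1] + x*(... + x*c[n-1])).
   Each step is one multiply and one add, i.e. two helper calls.  */
double
horner_soft (const double *c, size_t n, double x)
{
  assert (n > 0);
  double acc = c[n - 1];
  for (size_t i = n - 1; i-- > 0; )
    acc = soft_add (soft_mul (acc, x), c[i]);
  return acc;
}

/* Taylor coefficients 1/k! for exp, k = 0..8 (illustrative only,
   not the polynomial libm actually uses).  */
const double exp_coeffs[] = {
  1.0, 1.0, 1.0 / 2, 1.0 / 6, 1.0 / 24,
  1.0 / 120, 1.0 / 720, 1.0 / 5040, 1.0 / 40320
};
#define EXP_NCOEFFS (sizeof exp_coeffs / sizeof exp_coeffs[0])
```

A polynomial with n coefficients costs 2*(n-1) helper calls this way.
With hardware binary128 and FMA, each Horner step can instead be a
single fused multiply-add with no call at all, which is consistent
with the kind of gap measured above.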