This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Sparc exp(), expf() performance improvement
From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
Date: Mon, 31 Jul 2017 16:06:44 -0500
> Sparc has a significant performance issue with RAW (read after write).
> That is, if a value is stored to a particular address and then read from
> that address before the store has reached L2 cache, a pipeline hiccup
> occurs and a 30+ cycle delay is seen. Most commonly this issue is seen in
> the case of register spill/fills, but it also occurs when a value in an
> integer register is stored to a temporary in memory and then loaded to a
> floating point register.
> The int to fp and fp to int operations are common in exp() algorithms due
> to cracking the exponent from the mantissa to determine which special
> case to use in handling particular input data ranges.
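
For readers unfamiliar with "cracking the exponent", here is a minimal sketch in C of the kind of fp-to-int move being discussed. The helper name crack_exponent is hypothetical and is not taken from glibc or the Solaris/Studio libm; it assumes an IEEE 754 binary64 double and a finite, normal input. On a CPU without a direct fp-to-int register move, the copy below would typically be compiled as a store followed by a load from the same address, which is the read-after-write pattern described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, for illustration only: return the unbiased binary
   exponent of a finite, normal IEEE 754 double. */
static int crack_exponent(double x)
{
    uint64_t bits;

    /* Reinterpret the double's bit pattern as an integer. Without a direct
       fp-to-int move instruction this usually goes through memory. */
    memcpy(&bits, &x, sizeof bits);

    /* Bits 52..62 hold the biased exponent; the bias is 1023. */
    return (int)((bits >> 52) & 0x7ff) - 1023;
}
```

An exp() implementation might use a value like this to pick a code path for a given input range; the real range checks in either implementation are not shown here.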
>
> Starting with Niagara4 (T4), direct int to fp and fp to int transfer
> instructions were added, avoiding this performance issue. If we compile
> for any Sparc platform instead of T4 and later, we can't use the direct
> transfers. Note that T4 was first introduced in 2011, meaning most current
> Sparc/Linux platforms will have this support.
>
> For comparison, recent x86 chips from Intel have thrown enough HW at
> the RAW issue to not have any delays when a read-after-write occurs.
>
> The new algorithm is significantly different from the existing
> sysdeps/ieee754 algorithm.
> The new algorithm matches the one used by the Solaris/Studio libm
> exp(), expf() code.
> My effort was involved in porting (with Oracle corporate permission),
> not algorithm construction.
>
> It seems likely that this code could be faster on other CPUs, but I've
> only tested it on Sparc, as those are the machines I have ready access
> to. The advantage may be much less on other platforms.
You miss my point.
You are doing two _completely_ different things here.
First, you could simply build the existing exp() and expf() C code in
glibc with niagara4. In fact, if this float<-->int move instruction
helps so much, you probably want to build the entire math library
this way with appropriate ifunc hooks. Not just exp/expf.
Second, you could then introduce the new C code implementation of exp
and expf functions and:
1) See if it is faster on other sparc cpus.
2) Ask other glibc developers to test whether it is faster on
non-sparc cpus as well.
Making both changes and only targeting post-niagara4 cpus is
completely the wrong way to go about this.
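
To illustrate the first suggestion (one C source, built more than once, with the variant picked at run time), here is a rough sketch in C. It uses a plain function pointer as a portable stand-in for glibc's ifunc mechanism, not the ifunc machinery itself. All names (exp_fn, exp_generic, exp_niagara4, select_exp) and the boolean capability flag are hypothetical, and the function bodies are placeholders for the same exp() source compiled with different CPU flags.

```c
#include <stdbool.h>

typedef double (*exp_fn)(double);

/* Hypothetical placeholders: in a real build both would be the SAME C
   source, compiled once for generic sparc and once for niagara4. */
static double exp_generic(double x)  { return x; /* placeholder body */ }
static double exp_niagara4(double x) { return x; /* placeholder body */ }

/* Hypothetical resolver: choose a variant from a CPU capability flag.
   An ifunc resolver makes the same kind of choice once, at symbol
   resolution time. */
static exp_fn select_exp(bool cpu_has_direct_moves)
{
    return cpu_has_direct_moves ? exp_niagara4 : exp_generic;
}
```

The point of the sketch is that the algorithm and the CPU-specific build are independent choices: the resolver selects between builds of one implementation, so a new algorithm can be evaluated separately on every CPU.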