This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Sparc exp(), expf() performance improvement
- From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
- To: David Miller <davem at davemloft dot net>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 1 Aug 2017 11:06:39 -0500
- Subject: Re: [PATCH] Sparc exp(), expf() performance improvement
- Authentication-results: sourceware.org; auth=none
- References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com>
On 7/31/2017 4:21 PM, David Miller wrote:
From: Patrick McGehearty <firstname.lastname@example.org>
Date: Mon, 31 Jul 2017 16:06:44 -0500
Sparc has a significant performance issue with RAW (read after write).
That is, if a value is stored to a particular address and then read
from that address before the store has reached L2 cache, a pipeline hiccup
occurs and a 30+ cycle delay is seen. Most commonly this issue is seen in the
context of register spill/fills, but it also occurs when a value in an integer
register is stored to a temporary in memory and then loaded to a floating point
register. The int to fp and fp to int operations are common in exp() algorithms
due to cracking the exponent from the mantissa to determine which special
case to use in handling particular input data ranges.
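To make the int/fp traffic concrete, here is a minimal sketch of exponent
cracking in C. The function names crack_exponent and pow2_int are hypothetical
and are not taken from the glibc or Solaris sources. The sketch assumes IEEE 754
binary64 doubles and finite, normal inputs. How each memcpy is lowered depends
on the compiler and target: without direct register moves it typically becomes
a store to a stack temporary followed by a load, which is the read-after-write
pattern described above.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unbiased binary exponent of a finite, normal double.
   Assumes IEEE 754 binary64 layout (1 sign, 11 exponent, 52 mantissa bits). */
static int crack_exponent(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);          /* fp -> int transfer */
    return (int)((bits >> 52) & 0x7ff) - 1023;
}

/* Hypothetical helper: build 2^k directly from its bit pattern.
   Only valid for k in the normal range, -1022 <= k <= 1023. */
static double pow2_int(int k)
{
    uint64_t bits = (uint64_t)(k + 1023) << 52;
    double r;
    memcpy(&r, &bits, sizeof r);             /* int -> fp transfer */
    return r;
}
```

For example, crack_exponent(8.0) yields 3 and pow2_int(10) yields 1024.0. An
exp() implementation typically does both kinds of transfer: one to classify
the input range, one to scale the result by a power of two.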
Starting with Niagara4 (T4), direct int to fp and fp to int transfers
were added, avoiding this performance issue. If we compile for any
platform instead of T4 and later, we can't use the direct transfers.
Note that T4 was first introduced in 2011, meaning most current
Sparc/Linux platforms will have this support.
For comparison, recent x86 chips from Intel have thrown enough HW at
the RAW issue to not have any delays when a read-after-write occurs.
The new algorithm is significantly different from the existing one.
The new algorithm matches the one used by the Solaris/Studio libm
exp(), expf() code.
My effort was involved in porting (with Oracle corporate permission),
It seems likely that this code could be faster on other CPUs, but I've
only tested it on Sparc, as those are the machines I have ready access
to. The advantage may be much less on other platforms.
You miss my point.
You are doing two _completely_ different things here.
First, you could simply build the existing exp() and expf() C code in
glibc with niagara4. In fact, if this float<-->int move instruction
helps so much, you probably want to build the entire math library
this way with appropriate ifunc hooks. Not just exp/expf.
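As a rough illustration of the dispatch idea, the sketch below selects between
two builds of the same routine at run time through a function pointer. This is
a portable stand-in, not glibc's ifunc mechanism, which performs the equivalent
selection once at dynamic-link time. Every name here (exp_generic,
exp_niagara4, cpu_has_direct_moves, select_exp, exp_dispatch) is hypothetical,
and the series body is only a placeholder for the real exp() source.

```c
#include <stdbool.h>

typedef double (*exp_fn)(double);

/* Placeholder body standing in for the shared C source of exp();
   a short Taylor series, adequate only for small |x|. */
static double exp_series(double x)
{
    double term = 1.0, sum = 1.0;
    for (int n = 1; n <= 20; n++) {
        term *= x / n;
        sum += term;
    }
    return sum;
}

/* Hypothetical: the same source built for a generic Sparc target
   and for -mcpu=niagara4. Here both simply call the placeholder. */
static double exp_generic(double x)  { return exp_series(x); }
static double exp_niagara4(double x) { return exp_series(x); }

/* Hypothetical capability probe; a real resolver would query the
   hardware capabilities reported to the process at startup. */
static bool cpu_has_direct_moves(void) { return false; }

/* Pick the variant that matches the running CPU. */
static exp_fn select_exp(bool direct_moves)
{
    return direct_moves ? exp_niagara4 : exp_generic;
}

/* Resolve once on first call, then reuse the chosen variant. */
static double exp_dispatch(double x)
{
    static exp_fn impl;
    if (!impl)
        impl = select_exp(cpu_has_direct_moves());
    return impl(x);
}
```

The point of the separation is that the algorithm and the target-specific
build are independent choices: the same C source can be compiled twice and
selected per CPU, without changing the algorithm at all.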
Second, you could then introduce the new C code implementation of exp
and expf functions and:
1) See if it is faster on other sparc cpus.
2) Ask other glibc developers to test whether it is faster on
non-sparc cpus as well.
Making both changes and only targeting post-niagara4 cpus is
completely the wrong way to go about this.
I'm preparing to do a trial build of glibc with -mcpu=niagara4.
I'll report back on any interesting differences in make bench results
with and without -mcpu=niagara4 for the current sourceware tree.
I will note that, from my point of view, this project is focused only
on exp() and expf(), as Sparc/Solaris/Studio showed dramatically
better performance on those specific functions. There are a few
other functions which run faster on Sparc/Solaris/Studio, but
nothing like the performance difference for exp() and expf().