This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Sparc exp(), expf() performance improvement

From: David Miller <davem at davemloft dot net>
To: patrick dot mcgehearty at oracle dot com
Cc: libc-alpha at sourceware dot org
Date: Mon, 31 Jul 2017 14:21:37 -0700 (PDT)
Subject: Re: [PATCH] Sparc exp(), expf() performance improvement
Authentication-results: sourceware.org; auth=none
References: <1501529969-96949-1-git-send-email-patrick.mcgehearty@oracle.com> <20170731.124719.1163288220939988504.davem@davemloft.net> <18ef0698-02a5-eb2d-fc87-ce234ab70ac6@oracle.com>

From: Patrick McGehearty <patrick.mcgehearty@oracle.com>
Date: Mon, 31 Jul 2017 16:06:44 -0500

> Sparc has a significant performance issue with RAW (read after write).
> That is, if a value is stored to a particular address and then read
> from that address before the store has reached L2 cache, a pipeline
> hiccup occurs and a 30+ cycle delay is seen. Most commonly this issue
> is seen in the case of register spill/fills, but it also occurs when a
> value in an integer register is stored to a temporary in memory and
> then loaded to a floating point register.
> The int to fp and fp to int operations are common in exp() algorithms
> due to cracking the exponent from the mantissa to determine which
> special case to use in handling particular input data ranges.
> 
> Starting with Niagara4 (T4), direct int to fp and fp to int transfer
> instructions were added, avoiding this performance issue. If we compile
> for any Sparc platform instead of T4 and later, we can't use the direct
> transfers. Note that T4 was first introduced in 2011, meaning most
> current Sparc/Linux platforms will have this support.
> 
> For comparison, recent x86 chips from Intel have thrown enough HW at
> the RAW issue to not have any delays when a read-after-write occurs.
> 
> The new algorithm is significantly different from the existing
> sysdeps/ieee754 algorithm.
> The new algorithm matches the one used by the Solaris/Studio libm
> exp(), expf() code.
> My effort was involved in porting (with Oracle corporate permission),
> not algorithm construction.
> 
> It seems likely that this code could be faster on other CPUs, but I've
> only tested it on Sparc, as those are the machines I have ready access
> to. The advantage may be much less on other platforms.
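
For readers following the quoted explanation, here is a minimal sketch
of what "cracking the exponent" involves in C. The function names are
hypothetical and this is not the code from the patch. The memcpy calls
are the portable way to move bits between the floating point and integer
domains; whether the compiler emits a store/load pair through memory or
a direct register transfer depends on the CPU it is told to target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Return the unbiased binary exponent of a finite, normal double.
   The memcpy reinterprets the bits (an fp -> int transfer). */
static int
unbiased_exponent (double x)
{
  uint64_t bits;
  memcpy (&bits, &x, sizeof bits);
  int biased = (int) ((bits >> 52) & 0x7ff);  /* 11 exponent bits */
  assert (biased != 0 && biased != 0x7ff);    /* normal numbers only */
  return biased - 1023;                       /* remove IEEE 754 bias */
}

/* Build 2^k directly from its bit pattern (an int -> fp transfer). */
static double
pow2_int (int k)
{
  assert (k >= -1022 && k <= 1023);           /* normal range only */
  uint64_t bits = (uint64_t) (k + 1023) << 52;
  double r;
  memcpy (&r, &bits, sizeof r);
  return r;
}
```

Both directions appear in typical exp() range reduction: the input's
exponent is inspected to pick a special case, and the final result is
scaled by a power of two assembled from integer bits.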

You miss my point.

You are doing two _completely_ different things here.

First, you could simply build the existing exp() and expf() C code in
glibc with niagara4.  In fact, if this float<-->int move instruction
helps so much, you probably want to build the entire math library
this way with appropriate ifunc hooks.  Not just exp/expf.
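
The runtime selection that ifunc hooks provide can be pictured with a
plain function pointer. The sketch below is a portable stand-in, not GNU
ifunc itself: a real ifunc resolver runs when the dynamic loader binds
the symbol, while this version chooses on the first call. All names
(impl_generic, impl_niagara4, cpu_has_direct_moves, dispatch_call) are
hypothetical, and the two implementations are trivial placeholders for
the same C source built twice with different CPU flags.

```c
#include <assert.h>
#include <stddef.h>

typedef double (*unary_fn) (double);

/* Hypothetical stand-ins: in a real build these would be the same C
   source compiled once generically and once for niagara4. */
static double impl_generic (double x) { return x * 2.0; }
static double impl_niagara4 (double x) { return x + x; }

/* Hypothetical CPU probe; a real resolver would consult the hardware
   capability bits supplied by the dynamic loader. */
static int cpu_has_direct_moves (void) { return 0; }

/* Pick an implementation from the capability flag. */
static unary_fn
resolve_impl (int has_direct_moves)
{
  return has_direct_moves ? impl_niagara4 : impl_generic;
}

static unary_fn selected;

/* Public entry point: resolve once on first use, then call through. */
static double
dispatch_call (double x)
{
  if (selected == NULL)
    selected = resolve_impl (cpu_has_direct_moves ());
  assert (selected != NULL);
  return selected (x);
}
```

The point of the design is that callers see one symbol, and older CPUs
keep working because the generic build is always available as the
fallback.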

Second, you could then introduce the new C code implementation of exp
and expf functions and:

1) See if it is faster on other sparc cpus.

2) Ask other glibc developers to test whether it is faster on
   non-sparc cpus as well.
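
Comparing two implementations on a given CPU needs only a small timing
loop. The following is a rough sketch using the standard clock()
function; candidate_a and candidate_b are hypothetical placeholders for
the existing and the new exp(), and the input sweep is arbitrary. glibc's
benchtests directory is the project's usual place for such measurements,
so treat this only as an illustration of the idea.

```c
#include <assert.h>
#include <time.h>

typedef double (*unary_fn) (double);

/* Hypothetical candidates; substitute the existing and new exp(). */
static double candidate_a (double x) { return x * 0.5 + 1.0; }
static double candidate_b (double x) { return 1.0 + x / 2.0; }

/* Time `iters` calls of fn over a sweep of inputs and return the CPU
   seconds used.  The running sum is written to *sink so the compiler
   cannot discard the calls. */
static double
time_calls (unary_fn fn, long iters, volatile double *sink)
{
  double sum = 0.0;
  clock_t start = clock ();
  for (long i = 0; i < iters; i++)
    sum += fn ((double) (i % 1024) * 0.01);
  clock_t stop = clock ();
  *sink = sum;
  return (double) (stop - start) / CLOCKS_PER_SEC;
}
```

Running time_calls once per candidate with the same iteration count, on
each CPU of interest, gives the per-platform comparison asked for in
points 1) and 2).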

Making both changes and only targeting post-niagara4 cpus is
completely the wrong way to go about this.
