This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.

From: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
To: Joseph Myers <joseph at codesourcery dot com>
Cc: libc-alpha at sourceware dot org
Date: Thu, 30 Nov 2017 18:47:52 -0600
Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Authentication-results: sourceware.org; auth=none
References: <1510028685-65660-1-git-send-email-patrick.mcgehearty@oracle.com> <alpine.DEB.2.20.1711232107300.28121@digraph.polyomino.org.uk>

Thank you for the continued detailed reviews.
Prompted by your comments about TBL[2*j] and TBL[2*j+1],
I computed exp(x) for 10 million algorithmically generated
values of x using both the TBL values from the Solaris/Studio
version of exp() and the TBL values you suggested.
exp(x) did not differ in any case.
I also computed the TBL values in quad precision and
got the same values you recommend. That got me thinking
some more, and I realized that changing from 32 table entries
to 64 table entries was really not that difficult.
The values for TBL are generated as you recommend.

My next patch submission (coming shortly) will use
j/64 with 64 TBL entries for TBL[2*j] and TBL[2*j+1].
That approach gives the same performance with fewer
ulp errors. On the same 10-million-value test,
I'm seeing roughly 16 differences per 10,000 values
instead of 29 differences per 10,000 values with
the 32-entry TBL version. In addition, we see only
one difference in test-double-exp.out instead of three.
That difference is still a single ulp.
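
A minimal sketch of how "differences per 10,000 values" and the size of
each difference in ulps might be tallied, assuming IEEE 754 binary64
doubles. The helper names (ordered_key, ulp_distance, count_differing)
are illustrative only and are not taken from the patch or from the glibc
test suite; the two result arrays stand in for the outputs of whichever
two exp() implementations are being compared.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Map a double to an integer key that increases monotonically with the
   value, so adjacent doubles have adjacent keys.  +0.0 and -0.0 map to
   the same key.  Assumes IEEE 754 binary64; NaNs are not handled.  */
static uint64_t
ordered_key (double d)
{
  const uint64_t sign = UINT64_C (1) << 63;
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return (u & sign) ? sign - (u & ~sign) : sign + u;
}

/* Number of representable doubles between A and B (0 if equal).  */
static uint64_t
ulp_distance (double a, double b)
{
  uint64_t ka = ordered_key (a);
  uint64_t kb = ordered_key (b);
  return ka > kb ? ka - kb : kb - ka;
}

/* Count the positions where the two result arrays differ, and record
   the largest ulp distance seen in *WORST.  */
static size_t
count_differing (const double *a, const double *b, size_t n,
                 uint64_t *worst)
{
  size_t count = 0;
  *worst = 0;
  for (size_t i = 0; i < n; i++)
    {
      uint64_t d = ulp_distance (a[i], b[i]);
      if (d != 0)
        count++;
      if (d > *worst)
        *worst = d;
    }
  return count;
}
```

A rate per 10,000 values then follows as 10000.0 * count / n, and a
worst-case value of 1 corresponds to the "single ulp" case described
above.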

I've tested the new version on SPARC and x86.

- patrick



On 11/23/2017 3:19 PM, Joseph Myers wrote:

On Mon, 6 Nov 2017, Patrick McGehearty wrote:

@@ -561,8 +561,10 @@ math-CPPFLAGS += -D__NO_MATH_INLINES -D__LIBC_INTERNAL_MATH_INLINES
  ifneq ($(long-double-fcts),yes)
  # The `double' and `long double' types are the same on this machine.
  # We won't compile the `long double' code at all.  Tell the `double' code
-# to define aliases for the `FUNCl' names.
-math-CPPFLAGS += -DNO_LONG_DOUBLE
+# to define aliases for the `FUNCl' names.  To avoid type conflicts in
+# defining those aliases, tell <math.h> to declare the `FUNCl' names with
+# `double' instead of `long double'.
+math-CPPFLAGS += -DNO_LONG_DOUBLE -D_Mlong_double_=double
  endif

# These files quiet sNaNs in a way that is optimized away without

This diff hunk is bogus (reverting a recent change I made) and should not
be included in this patch.

+	      if (hx < 0x3e300000)
+		{
+		  retval = one + xx.x;
+		  return (retval);

No parentheses around return value.

+		}
+	      retval = one + xx.x * (one + half * xx.x);
+	      return (retval);

Likewise.

+	      yy.y = xx.x + (t * (half + xx.x * t2) +
+			     (t * t) * (t3 + xx.x * t4 + t * t5));

Split lines before an operator, not after.

+	      yy.y = xx.x + (t * (half + xx.x * t2) +
+			     (t * t) * (t3 + xx.x * t4 + t * t5));

Likewise.

+	  yy.y = z + (t * (half + (z * t2)) +
+		      (t * t) * (t3 + z * t4 + t * t5));

Likewise.

+	  yy.y = z + (t * (half + (z * t2)) +
+		      (t * t) * (t3 + z * t4 + t * t5));

Likewise.

+      return (retval);

Avoid parentheses around return value.

+	  if (ix == 0xfff00000 && xx.i_part[LOW_HALF] == 0)
+	    return (zero);	/* exp(-inf) = 0.  */

Likewise.

+	  return (xx.x * xx.x);	/* exp(nan/inf) is nan or inf.  */

Likewise.

+      yy.y = z + (t * (half + z * t2) +
+		  (t * t) * (t3 + z * t4 + t * t5));

Split line before operator.

+      yy.y = z + (t * (half + z * t2) +
+		  (t * t) * (t3 + z * t4 + t * t5));

Likewise.

+  return (yy.y);

Remove parentheses.
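
The style points raised above (no parentheses around a return value, and
line breaks placed before a binary operator rather than after it) can be
illustrated with a small standalone function. This is only a sketch:
poly_tail and its coefficient array c are hypothetical stand-ins for the
patch's half and t2 ... t5 constants, whose actual values are not shown
in this message.

```c
/* Hypothetical stand-in for the polynomial expression in the patch.
   C[0] ... C[3] play the roles of t2 ... t5; their real values are not
   given here.  */
static double
poly_tail (double x, double t, const double c[4])
{
  const double half = 0.5;

  /* The continuation line starts with the operator.  */
  double y = x + (t * (half + x * c[0])
                  + (t * t) * (c[1] + x * c[2] + t * c[3]));

  /* No parentheses around the returned expression.  */
  return y;
}
```

Moving the "+" to the start of the continuation line changes only the
layout; the expression and its evaluation order are the same as in the
quoted hunks.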

/* EXP function tables - for use in ocmputing double precisoin exponential

s/ocmputing/computing/

s/precisoin/precision/

+/* TBL[2*j] and TBL[2*j+1] are double precision numbers used to
+   approximate exp(x) using the formula given in the comments
+   for e_exp.c.  */

I believe the correct semantics to describe are: TBL[2*j] is 2**(j/32),
rounded to nearest; TBL[2*j+1] is 2**(j/32) - TBL[2*j], rounded to
nearest.  Now if that's the case, three of the low parts should be
adjusted by 1ulp because the current values aren't actually rounded to
nearest (unless you have some concrete reason why the present values, that
aren't rounded to nearest, are optimal):

+    0x1.0b5586cf9890fp+0,  0x1.8a62e4adc610ap-54,

0x1.8a62e4adc610ap-54 should be 0x1.8a62e4adc610bp-54.

+    0x1.5342b569d4f82p+0, -0x1.07abe1db13cacp-55,

-0x1.07abe1db13cacp-55 should be -0x1.07abe1db13cadp-55.

+    0x1.d5818dcfba487p+0,  0x1.2ed02d75b3706p-55,

0x1.2ed02d75b3706p-55 should be 0x1.2ed02d75b3707p-55.
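
Each of the three corrections above moves the low word to the adjacent
double. A minimal sketch of how that could be checked, assuming IEEE 754
binary64 doubles and C99 hexadecimal floating constants; the helper names
bits_of, one_ulp_apart and low_word_vanishes are illustrative and not
part of the patch. It does not recompute 2**(j/32), which would need
arithmetic wider than double.

```c
#include <stdint.h>
#include <string.h>

/* Return the raw bit pattern of D (assumes a 64-bit IEEE 754 double).  */
static uint64_t
bits_of (double d)
{
  uint64_t u;
  memcpy (&u, &d, sizeof u);
  return u;
}

/* Nonzero if A and B are finite, have the same sign, and are
   neighbouring doubles, i.e. differ by exactly one ulp.  */
static int
one_ulp_apart (double a, double b)
{
  uint64_t ua = bits_of (a);
  uint64_t ub = bits_of (b);
  if ((ua >> 63) != (ub >> 63))
    return 0;
  /* Unsigned subtraction wraps, so exactly one of these can be 1.  */
  return ua - ub == 1 || ub - ua == 1;
}

/* Nonzero if adding LO to HI in double precision gives back HI.  For a
   pair where HI is the value rounded to nearest and LO is the residual,
   LO is below half an ulp of HI, so it only matters when it is added
   into a smaller intermediate term.  The volatile forces the sum to be
   rounded to double on targets with excess precision.  */
static int
low_word_vanishes (double hi, double lo)
{
  volatile double sum = hi + lo;
  return sum == hi;
}
```

For example, one_ulp_apart (0x1.8a62e4adc610ap-54, 0x1.8a62e4adc610bp-54)
is nonzero, as it is for the other two pairs listed above.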
