This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.

From: Joseph Myers <joseph at codesourcery dot com>
To: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
Cc: <libc-alpha at sourceware dot org>
Date: Wed, 18 Oct 2017 17:22:03 +0000
Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
Authentication-results: sourceware.org; auth=none
References: <1508172962-97543-1-git-send-email-patrick.mcgehearty@oracle.com>

On Mon, 16 Oct 2017, Patrick McGehearty wrote:

> diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
> index 6757a14..555a47f 100644
> --- a/sysdeps/ieee754/dbl-64/e_exp.c
> +++ b/sysdeps/ieee754/dbl-64/e_exp.c
> @@ -1,238 +1,452 @@
> +/* EXP function - Compute double precision exponential
> +   Copyright (C) 2017 Free Software Foundation, Inc.
> +   This file is part of the GNU C Library.
> +
> +   The GNU C Library is free software; you can redistribute it and/or
> +   modify it under the terms of the GNU Lesser General Public
> +   License as published by the Free Software Foundation; either
> +   version 2.1 of the License, or (at your option) any later version.
> +
> +   The GNU C Library is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +   Lesser General Public License for more details.
> +
> +   You should have received a copy of the GNU Lesser General Public
> +   License along with the GNU C Library; if not, see
> +   <http://www.gnu.org/licenses/>.  */
> +
>  /*
> - * IBM Accurate Mathematical Library
> - * written by International Business Machines Corp.
> - * Copyright (C) 2001-2017 Free Software Foundation, Inc.

The file still contains __exp1, used by pow.  Thus, I do not think 
removing the existing copyright dates is appropriate unless as a 
preliminary patch you move __exp1 to another file (e_pow.c being the 
obvious one; watch out for the sysdeps/x86_64/fpu/multiarch/e_exp-*.c with 
their own macro definitions of __exp1 that would no longer be appropriate 
after such a move).

Does this patch remove all calls to __slowexp?  If so, I'd expect 
slowexp.c to be removed as part of the patch (including 
architecture-specific / multiarch variants, and Makefile references to 
special options etc. for building slowexp.c).

> +extern double __ieee754_exp (double);
> +
> +
> +static const double TBL[] = {
> +  1.00000000000000000000e+00, 0.00000000000000000000e+00,
> +  1.02189714865411662714e+00, 5.10922502897344389359e-17,
> +  1.04427378242741375480e+00, 8.55188970553796365958e-17,

As per previous comments, this sort of table of precomputed values should 
use hex float constants.  You can use before-and-after comparison of 
object files to make sure the hex floats do represent the same values as 
the decimal constants.

> +static const double
> +  half		=0.5,
> +/* Following three values used to scale x to primary range */
> +  invln2_32	=4.61662413084468283841e+01,	/* 0x40471547, 0x652b82fe */
> +  ln2_32hi	=2.16608493865351192653e-02,	/* 0x3f962e42, 0xfee00000 */
> +  ln2_32lo	=5.96317165397058656257e-12,	/* 0x3d9a39ef, 0x35793c76 */
> +/* t2-t5 terms used for polynomial computation */
> +  t2		=1.6666666666526086527e-1,	/* 3fc5555555548f7c */
> +  t3		=4.1666666666226079285e-2,	/* 3fa5555555545d4e */
> +  t4		=8.3333679843421958056e-3,	/* 3f811115b7aa905e */
> +  t5		=1.3888949086377719040e-3,	/* 3f56c1728d739765 */
> +  one		=1.0,
> +/* maximum value for x to not overflow */
> +  threshold1	=7.09782712893383973096e+02,	/* 0x40862E42, 0xFEFA39EF */
> +/* maximum value for -x to not underflow */
> +  threshold2	=7.45133219101941108420e+02,	/* 0x40874910, 0xD52D3051 */
> +/* scaling factor used when result near zero*/
> +  twom54	=5.55111512312578270212e-17,	/* 0x3c900000, 0x00000000 */
> +/* value used to force inexact condition */
> +  small		=1.0e-100;

Likewise hex floats for these values (other than half / one / small).  
Also, the formatting is far from GNU standard; I'd expect e.g.:

/* Scaling factor used when result near zero.  */
static const double twom54 = 0x1p-54;

(single space either side of =, comment before each variable definition 
starting with a capital letter and ending with ".  ").

> +	  /* 
> +	     Use FE_TONEAREST rounding mode for computing yy.y 
> +	     Avoid set/reset of rounding mode if already in FE_TONEAREST mode
> +	  */
> +	  fe_val = __fegetround();
> +	  if (fe_val == FE_TONEAREST) {
> +	    t = xx.x * xx.x;
> +	    yy.y = xx.x + (t * (half + xx.x * t2) +
> +			   (t * t) * (t3 + xx.x * t4 + t * t5));
> +	    retval = one + yy.y;
> +	  } else {
> +	    __fesetround(FE_TONEAREST);
> +	    t = xx.x * xx.x;
> +	    yy.y = xx.x + (t * (half + xx.x * t2) +
> +			   (t * t) * (t3 + xx.x * t4 + t * t5));
> +	    retval = one + yy.y;
> +	    __fesetround(fe_val);

The formatting here is off, missing space before '(' in function calls.  
And you should be using the optimized SET_RESTORE_ROUND (FE_TONEAREST), 
which in addition to avoiding setting the rounding mode to a value it 
already has, also e.g. arranges on x86_64 to set only the SSE rounding 
mode and not the x87 mode because only the SSE mode needs setting for 
types other than long double.

(I'm not clear why you actually need to set the rounding mode here.  Does 
excessive inaccuracy result in this particular case if you don't set it?)

> +      fe_val = __fegetround();

Likewise again, and subsequently.

> +      if (fe_val == FE_TONEAREST) {
> +	z = xx.x - TBL2[j];
> +	t = z * z;
> +	/* the "small" term below guarantees inexact will be raised */

Guaranteeing "inexact" is not part of the goals for most libm functions, 
so I expect you can remove that term.

> +      if (-xx.x > threshold2)
> +	{			/* set underflow error condition */
> +	  double force_underflow = tiny * tiny;
> +	  math_force_eval (force_underflow);
> +	  retval = zero;
> +	  return retval;

I'd expect the force_underflow value to be returned in this case (so the 
return value is appropriate in round-upward mode).

-- 
Joseph S. Myers
joseph@codesourcery.com

Follow-Ups:
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Joseph Myers
- Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Patrick McGehearty

References:
- [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
  - From: Patrick McGehearty

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]