This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
- From: Joseph Myers <joseph at codesourcery dot com>
- To: Patrick McGehearty <patrick dot mcgehearty at oracle dot com>
- Cc: <libc-alpha at sourceware dot org>
- Date: Wed, 18 Oct 2017 17:22:03 +0000
- Subject: Re: [PATCH] Improves __ieee754_exp() performance by greater than 5x on sparc/x86.
- Authentication-results: sourceware.org; auth=none
- References: <1508172962-97543-1-git-send-email-patrick.mcgehearty@oracle.com>
On Mon, 16 Oct 2017, Patrick McGehearty wrote:
> diff --git a/sysdeps/ieee754/dbl-64/e_exp.c b/sysdeps/ieee754/dbl-64/e_exp.c
> index 6757a14..555a47f 100644
> --- a/sysdeps/ieee754/dbl-64/e_exp.c
> +++ b/sysdeps/ieee754/dbl-64/e_exp.c
> @@ -1,238 +1,452 @@
> +/* EXP function - Compute double precision exponential
> + Copyright (C) 2017 Free Software Foundation, Inc.
> + This file is part of the GNU C Library.
> +
> + The GNU C Library is free software; you can redistribute it and/or
> + modify it under the terms of the GNU Lesser General Public
> + License as published by the Free Software Foundation; either
> + version 2.1 of the License, or (at your option) any later version.
> +
> + The GNU C Library is distributed in the hope that it will be useful,
> + but WITHOUT ANY WARRANTY; without even the implied warranty of
> + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
> + Lesser General Public License for more details.
> +
> + You should have received a copy of the GNU Lesser General Public
> + License along with the GNU C Library; if not, see
> + <http://www.gnu.org/licenses/>. */
> +
> /*
> - * IBM Accurate Mathematical Library
> - * written by International Business Machines Corp.
> - * Copyright (C) 2001-2017 Free Software Foundation, Inc.
The file still contains __exp1, used by pow. Thus, I do not think
removing the existing copyright dates is appropriate unless as a
preliminary patch you move __exp1 to another file (e_pow.c being the
obvious one; watch out for the sysdeps/x86_64/fpu/multiarch/e_exp-*.c with
their own macro definitions of __exp1 that would no longer be appropriate
after such a move).
Does this patch remove all calls to __slowexp? If so, I'd expect
slowexp.c to be removed as part of the patch (including
architecture-specific / multiarch variants, and Makefile references to
special options etc. for building slowexp.c).
> +extern double __ieee754_exp (double);
> +
> +
> +static const double TBL[] = {
> + 1.00000000000000000000e+00, 0.00000000000000000000e+00,
> + 1.02189714865411662714e+00, 5.10922502897344389359e-17,
> + 1.04427378242741375480e+00, 8.55188970553796365958e-17,
As per previous comments, this sort of table of precomputed values should
use hex float constants. You can use before-and-after comparison of
object files to make sure the hex floats do represent the same values as
the decimal constants.
> +static const double
> + half =0.5,
> +/* Following three values used to scale x to primary range */
> + invln2_32 =4.61662413084468283841e+01, /* 0x40471547, 0x652b82fe */
> + ln2_32hi =2.16608493865351192653e-02, /* 0x3f962e42, 0xfee00000 */
> + ln2_32lo =5.96317165397058656257e-12, /* 0x3d9a39ef, 0x35793c76 */
> +/* t2-t5 terms used for polynomial computation */
> + t2 =1.6666666666526086527e-1, /* 3fc5555555548f7c */
> + t3 =4.1666666666226079285e-2, /* 3fa5555555545d4e */
> + t4 =8.3333679843421958056e-3, /* 3f811115b7aa905e */
> + t5 =1.3888949086377719040e-3, /* 3f56c1728d739765 */
> + one =1.0,
> +/* maximum value for x to not overflow */
> + threshold1 =7.09782712893383973096e+02, /* 0x40862E42, 0xFEFA39EF */
> +/* maximum value for -x to not underflow */
> + threshold2 =7.45133219101941108420e+02, /* 0x40874910, 0xD52D3051 */
> +/* scaling factor used when result near zero*/
> + twom54 =5.55111512312578270212e-17, /* 0x3c900000, 0x00000000 */
> +/* value used to force inexact condition */
> + small =1.0e-100;
Likewise hex floats for these values (other than half / one / small).
Also, the formatting is far from GNU standard; I'd expect e.g.:
/* Scaling factor used when result near zero. */
static const double twom54 = 0x1p-54;
(single space either side of =, comment before each variable definition
starting with a capital letter and ending with ". ").
> + /*
> + Use FE_TONEAREST rounding mode for computing yy.y
> + Avoid set/reset of rounding mode if already in FE_TONEAREST mode
> + */
> + fe_val = __fegetround();
> + if (fe_val == FE_TONEAREST) {
> + t = xx.x * xx.x;
> + yy.y = xx.x + (t * (half + xx.x * t2) +
> + (t * t) * (t3 + xx.x * t4 + t * t5));
> + retval = one + yy.y;
> + } else {
> + __fesetround(FE_TONEAREST);
> + t = xx.x * xx.x;
> + yy.y = xx.x + (t * (half + xx.x * t2) +
> + (t * t) * (t3 + xx.x * t4 + t * t5));
> + retval = one + yy.y;
> + __fesetround(fe_val);
The formatting here is off, missing space before '(' in function calls.
And you should be using the optimized SET_RESTORE_ROUND (FE_TONEAREST),
which in addition to avoiding setting the rounding mode to a value it
already has, also e.g. arranges on x86_64 to set only the SSE rounding
mode and not the x87 mode because only the SSE mode needs setting for
types other than long double.
(I'm not clear why you actually need to set the rounding mode here. Does
excessive inaccuracy result in this particular case if you don't set it?)
> + fe_val = __fegetround();
Likewise again, and subsequently.
> + if (fe_val == FE_TONEAREST) {
> + z = xx.x - TBL2[j];
> + t = z * z;
> + /* the "small" term below guarantees inexact will be raised */
Guaranteeing "inexact" is not part of the goals for most libm functions,
so I expect you can remove that term.
> + if (-xx.x > threshold2)
> + { /* set underflow error condition */
> + double force_underflow = tiny * tiny;
> + math_force_eval (force_underflow);
> + retval = zero;
> + return retval;
I'd expect the force_underflow value to be returned in this case (so the
return value is appropriate in round-upward mode).
--
Joseph S. Myers
joseph@codesourcery.com