This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v2] improve cexp performance for imaginary inputs
- From: "Carlos O'Donell" <carlos at redhat dot com>
- To: Julian Taylor <jtaylor dot debian at googlemail dot com>, libc-alpha at sourceware dot org
- Date: Thu, 05 Mar 2015 15:28:41 -0500
- References: <1425500809-5569-1-git-send-email-jtaylor dot debian at googlemail dot com>
On 03/04/2015 03:26 PM, Julian Taylor wrote:
> cexp(x) computes exp(x_r) * (cos(x_i) + i sin(x_i)); the exp call can be
> skipped when the real part of the input is zero, since exp(0) = 1.
> This input is common enough to be worth optimizing, e.g. twiddle factors in
> fast Fourier transforms.
>
> Even though the exp function has a fast path for zero input, it still
> imposes about 15% overhead on the computation of cexp on amd64.
> For 10000 uniform imaginary inputs from -pi to pi, performance improved
> from 350 cycles to 300 cycles per call on an AMD Phenom II X4.
>
> For inputs with both a nonzero real part and an imaginary part, it adds
> only one branch to the 6 existing branches and the sincos computation, so
> it has no measurable negative effect on amd64 and likely on most other CPUs.
What benchmark did you use?
Could you please consider adding a microbenchmark to glibc/benchtests
that exercises cexp and shows a performance gain? That way the community
can maintain the microbenchmark and prevent future maintainers from
removing your changes as a premature optimization (or can conditionalize
them if it proves a bad choice for other compilers or machine architectures).
Cheers,
Carlos.
> ---
> math/s_cexp.c | 6 +++++-
> math/s_cexpf.c | 6 +++++-
> math/s_cexpl.c | 6 +++++-
> 3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/math/s_cexp.c b/math/s_cexp.c
> index 9116e2b..2472708 100644
> --- a/math/s_cexp.c
> +++ b/math/s_cexp.c
> @@ -70,7 +70,11 @@ __cexp (__complex__ double x)
> }
> else
> {
> - double exp_val = __ieee754_exp (__real__ x);
> + double exp_val = 1.;
> + if (__real__ x != 0.)
> + {
> + exp_val= __ieee754_exp (__real__ x);
> + }
> __real__ retval = exp_val * cosix;
> __imag__ retval = exp_val * sinix;
> }
> diff --git a/math/s_cexpf.c b/math/s_cexpf.c
> index fac1a17..84b7b56 100644
> --- a/math/s_cexpf.c
> +++ b/math/s_cexpf.c
> @@ -70,7 +70,11 @@ __cexpf (__complex__ float x)
> }
> else
> {
> - float exp_val = __ieee754_expf (__real__ x);
> + float exp_val = 1.;
> + if (__real__ x != 0.)
> + {
> + exp_val= __ieee754_expf (__real__ x);
> + }
> __real__ retval = exp_val * cosix;
> __imag__ retval = exp_val * sinix;
> }
> diff --git a/math/s_cexpl.c b/math/s_cexpl.c
> index 9309b1f..0e9d515 100644
> --- a/math/s_cexpl.c
> +++ b/math/s_cexpl.c
> @@ -70,7 +70,11 @@ __cexpl (__complex__ long double x)
> }
> else
> {
> - long double exp_val = __ieee754_expl (__real__ x);
> + long double exp_val = 1.;
> + if (__real__ x != 0.)
> + {
> + exp_val= __ieee754_expl (__real__ x);
> + }
> __real__ retval = exp_val * cosix;
> __imag__ retval = exp_val * sinix;
> }
>