Bug 13932 - dbl-64 pow unexpectedly slow for some inputs
Summary: dbl-64 pow unexpectedly slow for some inputs
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: math (show other bugs)
Version: unspecified
: P2 enhancement
Target Milestone: 2.28
Assignee: Siddhesh Poyarekar
URL:
Keywords:
: 16898 (view as bug list)
Depends on:
Blocks:
 
Reported: 2012-03-30 21:28 UTC by Paul Pluzhnikov
Modified: 2020-05-25 13:19 UTC (History)
9 users (show)

See Also:
Host: x86_64-linux-gnu
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
test case (687 bytes, text/x-csrc)
2012-03-30 21:28 UTC, Paul Pluzhnikov
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Paul Pluzhnikov 2012-03-30 21:28:42 UTC
Created attachment 6310 [details]
test case

Attached test case shows that pow is over 10,000 slower for some inputs
than for other (nearby) inputs.

This was first noticed with eglibc-2.11, and reproduced with current
git trunk.

Results:

32-bit build:

./elf/ld.so --library-path .:math:dlfcn ./a.out
pow(0.562500, 1.5): 13.644814 calls/usec
pow(0.563250, 1.5): 13.647048 calls/usec
ratio:      1

64-bit build:

./elf/ld.so --library-path .:math:dlfcn ./a.out
pow(0.562500, 1.5): 0.000982 calls/usec
pow(0.563250, 1.5): 13.574397 calls/usec
ratio:  13828


perf shows:

# Events: 15K cycles
#
# Overhead  Command      Shared Object                               Symbol
# ........  .......  .................  ...................................
#
    63.31%    ld.so  libm.so            [.] __mul
    18.44%    ld.so  libm.so            [.] __ieee754_pow_sse2
    13.81%    ld.so  libm.so            [.] __exp1
     1.20%    ld.so  libm.so            [.] __pow
     1.09%    ld.so  libm.so            [.] sub_magnitudes
     0.43%    ld.so  a.out              [.] main
     0.38%    ld.so  libm.so            [.] __cpy
     0.21%    ld.so  libm.so            [.] add_magnitudes
     0.19%    ld.so  libm.so            [.] @plt
Comment 1 Manuel López-Ibáñez 2012-12-13 16:44:21 UTC
I can still reproduce it with 2.12

GNU C Library stable release version 2.12, by Roland McGrath et al.
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.4.6 20120305 (Red Hat 4.4.6-4).
Compiled on a Linux 2.6.32 system on 2012-07-18.
Available extensions:
        The C stubs add-on version 2.1.2.
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
        RT using linux kernel aio
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

A testcase:

#include <math.h>
#ifndef POW
#define POW pow
#endif
#define N 1500
int total[N][N];
double alpha = 2.81;
int n = N;

int main(int argc, char *argv[])
{
    int x = 30976397;
    x += 0;
    double trail = 1. / ( (double) N * (double) x) ;
    long i,j;
    for ( i = 0 ; i < N ; i++ ) {
        for ( j = 0 ; j < i ; j++ ) {
            total[i][j] = POW(trail, alpha);
        }
    }
    return 0;
}

$ gcc -o slowpow slowpow.c -lm
$ /usr/bin/time ./slowpow
213.75user 0.04system 3:34.27elapsed 99%CPU (0avgtext+0avgdata 12784maxresident)k
0inputs+0outputs (0major+838minor)pagefaults 0swaps

(Interrupted, it was running forever)

$ gcc -DPOW=powl  -o slowpow slowpow.c -lm
$ /usr/bin/time ./slowpow 
0.20user 0.01system 0:00.22elapsed 98%CPU (0avgtext+0avgdata 35040maxresident)k
0inputs+0outputs (0major+2229minor)pagefaults 0swaps

$ gcc -DPOW=powf  -o slowpow slowpow.c -lm
$ /usr/bin/time ./slowpow 
0.23user 0.01system 0:00.25elapsed 95%CPU (0avgtext+0avgdata 35088maxresident)k
160inputs+0outputs (1major+2231minor)pagefaults 0swaps


At least, it would be nice if the man page mentioned this issue and what kind of inputs may lead to it. It causes a lot of frustration: http://entropymine.com/imageworsener/slowpow/
Comment 2 Paul Pluzhnikov 2012-12-13 18:57:52 UTC
(In reply to comment #1)
> I can still reproduce it with 2.12

2.12 is ancient history. Current git trunk:

# i686:
Could not get the test to work. Something appears to be broken with my toolchain, or with trunk in i686 mode.

# x86_64:
./elf/ld.so --library-path .:math:dlfcn /tmp/test_pow64
pow(0.562500, 1.5): 0.000990 calls/usec
pow(0.563250, 1.5): 12.753271 calls/usec
ratio:  12887
Comment 3 Siddhesh Poyarekar 2013-01-04 11:33:13 UTC
Assigning this to me since I'm working on cleaning up the mp code, which should make this better.
Comment 4 Manuel López-Ibáñez 2013-03-08 14:07:29 UTC
(In reply to comment #3)
> Assigning this to me since I'm working on cleaning up the mp code, which should
> make this better.

Thanks for working on this. As I said above, it would be nice if for the time being, a note was added to the man page of pow warning about this issue. If you happen to know what kind of inputs trigger it, that would be interesting information.
Comment 5 Siddhesh Poyarekar 2013-03-11 06:29:10 UTC
If we did, we wouldn't have had to do all those iterations at different precision levels and the result would have been much faster :)

Nevertheless, I've added patches to master that ought to improve performance by at least 4x in the worst case, bringing it close to mpfr performance (still about 2x slower though).  There's still scope to make it faster, which is why I haven't closed this bug yet.

The man pages are not part of glibc, they're a separate project by themselves.  There could be a case for adding a note explaining the multiprecision fallback (why it's there, overview, impact, etc.) in the glibc manual though.
Comment 6 Siddhesh Poyarekar 2013-03-11 06:35:44 UTC
Created bug 15267 to track the documentation fix.
Comment 7 Manuel López-Ibáñez 2013-03-11 09:07:17 UTC
(In reply to comment #5)
> Nevertheless, I've added patches to master that ought to improve performance by
> at least 4x in the worst case, bringing it close to mpfr performance (still
> about 2x slower though).  There's still scope to make it faster, which is why I
> haven't closed this bug yet.


Hi, thanks for your work. Personally, I would be interested to have a pow version that is perhaps less precise but that does not have  such nasty corner-cases. A 4x improvement is nice, but given that the worst-case may be 12,000x slower, it is not a definitive solution.

There are some implementations on the net that perhaps could be used when -ffast-math is given?

http://jrfonseca.blogspot.be/2008/09/fast-sse2-pow-tables-or-polynomials.html

http://www.dctsystems.co.uk/Software/power.html

http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/

I am no expert in IEEE floating-point arithmetic, so it may well be that those implementations produce extremely bad approximations for some inputs. Is that the case?
Comment 8 Siddhesh Poyarekar 2013-03-11 10:19:19 UTC
(In reply to comment #7)
> Hi, thanks for your work. Personally, I would be interested to have a pow
> version that is perhaps less precise but that does not have  such nasty
> corner-cases. A 4x improvement is nice, but given that the worst-case may be
> 12,000x slower, it is not a definitive solution.
> 
> There are some implementations on the net that perhaps could be used when
> -ffast-math is given?
> 
> http://jrfonseca.blogspot.be/2008/09/fast-sse2-pow-tables-or-polynomials.html
> 
> http://www.dctsystems.co.uk/Software/power.html
> 
> http://martin.ankerl.com/2012/01/25/optimized-approximative-pow-in-c-and-cpp/
> 
> I am no expert in IEEE floating-point arithmetic, so it may well be that those
> implementations produce extremely bad approximations for some inputs. Is that
> the case?

They're all in a way subsets of what the fast phase of libm does, so we don't really need to reimplement libm to take advantage of those ideas.

That said, there is indeed a strong case for fast and (for some inputs) imprecise computations of transcendentals.  That probably won't be the default in glibc, but I won't rule out the possibility of having an alternate library or configuration, provided there is consensus in the glibc community for it.
Comment 9 Joseph Myers 2014-05-07 15:42:04 UTC
*** Bug 16898 has been marked as a duplicate of this bug. ***
Comment 10 Vincent Lefèvre 2015-04-16 14:00:18 UTC
(In reply to Paul Pluzhnikov from comment #2)
> # x86_64:
> ./elf/ld.so --library-path .:math:dlfcn /tmp/test_pow64
> pow(0.562500, 1.5): 0.000990 calls/usec
> pow(0.563250, 1.5): 12.753271 calls/usec
> ratio:  12887

Note that mathematically x^1.5 = sqrt(x)^3.

Now, 0.562500 is a perfect square:

? sqrt(0.562500)
%1 = 0.75000000000000000000000000000000000000

This is probably related to the slowness. Unless the implementation tries to provide the correct inexact flag (which I doubt), I don't see a theoretical reason for which it would be slow in rounding to nearest. It would be very different in a directed rounding mode since this is an exact case. Still, this could be a corner case for an algorithm that doesn't expect exact results or something like that.
Comment 11 Sourceware Commits 2018-02-12 10:48:33 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  c3d466cba1692708a19c6ff829d0386c83a0c6e5 (commit)
      from  7bb087bd7bfe3616c4c0974a3f7352b593353ea5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=c3d466cba1692708a19c6ff829d0386c83a0c6e5

commit c3d466cba1692708a19c6ff829d0386c83a0c6e5
Author: Wilco Dijkstra <wdijkstr@arm.com>
Date:   Mon Feb 12 10:42:42 2018 +0000

    Remove slow paths from pow
    
    Remove the slow paths from pow.  Like several other double precision math
    functions, pow is exactly rounded.  This is not required from math functions
    and causes major overheads as it requires multiple fallbacks using higher
    precision arithmetic if a result is close to 0.5ULP.  Ridiculous slowdowns
    of up to 100000x have been reported when the highest precision path triggers.
    
    All GLIBC math tests pass on AArch64 and x64 (with ULP of pow set to 1).
    The worst case error is ~0.506ULP.  A simple test over a few hundred million
    values shows pow is 10% faster on average.  This fixes BZ #13932.
    
    	[BZ #13932]
    	* sysdeps/ieee754/dbl-64/uexp.h (err_1): Remove.
    	* benchtests/pow-inputs: Update comment for slow path cases.
    	* manual/probes.texi (slowpow_p10): Delete removed probe.
    	(slowpow_p10): Likewise.
    	* math/Makefile: Remove halfulp.c and slowpow.c.
    	* sysdeps/aarch64/libm-test-ulps: Set ULP of pow to 1.
    	* sysdeps/generic/math_private.h (__exp1): Remove error argument.
    	(__halfulp): Remove.
    	(__slowpow): Remove.
    	* sysdeps/i386/fpu/halfulp.c: Delete file.
    	* sysdeps/i386/fpu/slowpow.c: Likewise.
    	* sysdeps/ia64/fpu/halfulp.c: Likewise.
    	* sysdeps/ia64/fpu/slowpow.c: Likewise.
    	* sysdeps/ieee754/dbl-64/e_exp.c (__exp1): Remove error argument,
    	improve comments and add error analysis.
    	* sysdeps/ieee754/dbl-64/e_pow.c (__ieee754_pow): Add error analysis.
    	(power1): Remove function:
    	(log1): Remove error argument, add error analysis.
    	(my_log2): Remove function.
    	* sysdeps/ieee754/dbl-64/halfulp.c: Delete file.
    	* sysdeps/ieee754/dbl-64/slowpow.c: Likewise.
    	* sysdeps/m68k/m680x0/fpu/halfulp.c: Likewise.
    	* sysdeps/m68k/m680x0/fpu/slowpow.c: Likewise.
    	* sysdeps/powerpc/power4/fpu/Makefile: Remove CPPFLAGS-slowpow.c.
    	* sysdeps/x86_64/fpu/libm-test-ulps: Set ULP of pow to 1.
    	* sysdeps/x86_64/fpu/multiarch/Makefile: Remove slowpow-fma.c,
    	slowpow-fma4.c, halfulp-fma.c, halfulp-fma4.c.
    	* sysdeps/x86_64/fpu/multiarch/e_pow-fma.c (__slowpow): Remove define.
    	* sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c (__slowpow): Likewise.
    	* sysdeps/x86_64/fpu/multiarch/halfulp-fma.c: Delete file.
    	* sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/slowpow-fma.c: Likewise.
    	* sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c: Likewise.

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                   |   37 ++++++
 benchtests/pow-inputs                       |    6 +-
 manual/probes.texi                          |   16 ---
 math/Makefile                               |    4 +-
 sysdeps/aarch64/libm-test-ulps              |    2 +
 sysdeps/generic/math_private.h              |    4 +-
 sysdeps/i386/fpu/halfulp.c                  |    1 -
 sysdeps/i386/fpu/slowpow.c                  |    1 -
 sysdeps/ia64/fpu/halfulp.c                  |    1 -
 sysdeps/ia64/fpu/slowpow.c                  |    1 -
 sysdeps/ieee754/dbl-64/e_exp.c              |   42 +++----
 sysdeps/ieee754/dbl-64/e_pow.c              |  172 ++++-----------------------
 sysdeps/ieee754/dbl-64/halfulp.c            |  152 -----------------------
 sysdeps/ieee754/dbl-64/slowpow.c            |  125 -------------------
 sysdeps/ieee754/dbl-64/uexp.h               |    2 +-
 sysdeps/m68k/m680x0/fpu/halfulp.c           |    1 -
 sysdeps/m68k/m680x0/fpu/slowpow.c           |    1 -
 sysdeps/powerpc/power4/fpu/Makefile         |    1 -
 sysdeps/x86_64/fpu/libm-test-ulps           |    2 +
 sysdeps/x86_64/fpu/multiarch/Makefile       |   12 +--
 sysdeps/x86_64/fpu/multiarch/e_pow-fma.c    |    1 -
 sysdeps/x86_64/fpu/multiarch/e_pow-fma4.c   |    1 -
 sysdeps/x86_64/fpu/multiarch/halfulp-fma.c  |    4 -
 sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c |    4 -
 sysdeps/x86_64/fpu/multiarch/slowpow-fma.c  |   11 --
 sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c |   11 --
 26 files changed, 90 insertions(+), 525 deletions(-)
 delete mode 100644 sysdeps/i386/fpu/halfulp.c
 delete mode 100644 sysdeps/i386/fpu/slowpow.c
 delete mode 100644 sysdeps/ia64/fpu/halfulp.c
 delete mode 100644 sysdeps/ia64/fpu/slowpow.c
 delete mode 100644 sysdeps/ieee754/dbl-64/halfulp.c
 delete mode 100644 sysdeps/ieee754/dbl-64/slowpow.c
 delete mode 100644 sysdeps/m68k/m680x0/fpu/halfulp.c
 delete mode 100644 sysdeps/m68k/m680x0/fpu/slowpow.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/halfulp-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/halfulp-fma4.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowpow-fma.c
 delete mode 100644 sysdeps/x86_64/fpu/multiarch/slowpow-fma4.c
Comment 12 Vincent Lefèvre 2018-02-12 11:29:21 UTC
(In reply to cvs-commit@gcc.gnu.org from comment #11)
>     Remove slow paths from pow

Instead of removing the slow paths, wouldn't have been better to fix the bug (see comment #10, where the slowness cannot be explained by the correct rounding, since this is an easy case), then decide?
Comment 13 Vincent Lefèvre 2018-02-12 12:08:47 UTC
From the bug report, it is __ieee754_pow_sse2 that was used, and if I understand correctly, the code is in "sysdeps/x86_64/fpu/multiarch/e_pow.c", which uses "sysdeps/ieee754/dbl-64/e_pow.c". The slow path would imply that __exp1 returns a non-positive value. __exp1 is defined in "sysdeps/ieee754/dbl-64/e_exp.c". There, the rounding test is:

      if (res == (res + cor * (1.0 + error + err_1)))

For the example of slow case pow(0.562500, 1.5), the exact result is 0.421875 = 27/64, so that cor should be much smaller than ulp(res) and a positive value should be returned, which doesn't explain the behavior.
Comment 14 Vincent Lefèvre 2018-02-12 12:22:11 UTC
Moreover, the code is quite horrible. mydefs.h defines:

typedef int int4;
typedef union { int4 i[2]; double x; } mynumber;

and in e_exp.c, it does bitwise operations on these signed int:

        binexp.i[HIGH_HALF] = (junk1.i[LOW_HALF] + 1023) << 20;

        i = ((junk2.i[LOW_HALF] >> 8) & 0xfffffffe) + 356;
        j = (junk2.i[LOW_HALF] & 511) << 1;

which yield undefined or implementation-defined behavior. So I wonder whether the generated code does what is expected.
Comment 15 Wilco 2018-02-12 12:28:14 UTC
(In reply to Vincent Lefèvre from comment #12)
> (In reply to cvs-commit@gcc.gnu.org from comment #11)
> >     Remove slow paths from pow
> 
> Instead of removing the slow paths, wouldn't have been better to fix the bug
> (see comment #10, where the slowness cannot be explained by the correct
> rounding, since this is an easy case), then decide?

Removing buggy code is far better than fixing it...

(In reply to Vincent Lefèvre from comment #13)
> From the bug report, it is __ieee754_pow_sse2 that was used, and if I
> understand correctly, the code is in "sysdeps/x86_64/fpu/multiarch/e_pow.c",
> which uses "sysdeps/ieee754/dbl-64/e_pow.c". The slow path would imply that
> __exp1 returns a non-positive value. __exp1 is defined in
> "sysdeps/ieee754/dbl-64/e_exp.c". There, the rounding test is:
> 
>       if (res == (res + cor * (1.0 + error + err_1)))
> 
> For the example of slow case pow(0.562500, 1.5), the exact result is
> 0.421875 = 27/64, so that cor should be much smaller than ulp(res) and a
> positive value should be returned, which doesn't explain the behavior.

Whenever you're close to 0.5 ULP, cor will be just below 0.5ULP, eg. 0.4999999999, so any value of error + err_1 that is larger than 2^-53ULP will cause it to round up, thus falling into the slow path.
Comment 16 Vincent Lefèvre 2018-02-12 12:43:39 UTC
(In reply to Wilco from comment #15)
> Removing buggy code is far better than fixing it...

The buggy code is in the fast path. You didn't remove it.

> (In reply to Vincent Lefèvre from comment #13)
> > For the example of slow case pow(0.562500, 1.5), the exact result is
> > 0.421875 = 27/64, so that cor should be much smaller than ulp(res) and a
> > positive value should be returned, which doesn't explain the behavior.
> 
> Whenever you're close to 0.5 ULP,

It is not close to 0.5 ulp. This is an *exact* case (27/64), so that the correction term should be very small, much smaller than 0.5 ulp.

The correction term is close to 0.5 ulp (possibly above due to approximation errors) when the exact result is a midpoint or close to a midpoint (cases where the Table Maker's Dilemma occurs).
Comment 17 Vincent Lefèvre 2018-02-12 12:57:05 UTC
(In reply to Vincent Lefèvre from comment #16)
> > (In reply to Vincent Lefèvre from comment #13)
> > > For the example of slow case pow(0.562500, 1.5), the exact result is
> > > 0.421875 = 27/64, so that cor should be much smaller than ulp(res) and a
> > > positive value should be returned, which doesn't explain the behavior.
> > 
> > Whenever you're close to 0.5 ULP,
> 
> It is not close to 0.5 ulp. This is an *exact* case (27/64), so that the
> correction term should be very small, much smaller than 0.5 ulp.

If it were close to 0.5 ulp, it would mean that there would be around 53 bits of accuracy, which does not match your error analysis: "Maximum relative error before rounding is 8.8e-22 (69.9 bits)."

So how can you explain the switch to the slow path with the previous code on this exact case with the 69.9 bits of accuracy for (al,rem)?
Comment 18 Wilco 2018-02-12 13:10:23 UTC
(In reply to Vincent Lefèvre from comment #17)
> (In reply to Vincent Lefèvre from comment #16)
> > > (In reply to Vincent Lefèvre from comment #13)
> > > > For the example of slow case pow(0.562500, 1.5), the exact result is
> > > > 0.421875 = 27/64, so that cor should be much smaller than ulp(res) and a
> > > > positive value should be returned, which doesn't explain the behavior.
> > > 
> > > Whenever you're close to 0.5 ULP,
> > 
> > It is not close to 0.5 ulp. This is an *exact* case (27/64), so that the
> > correction term should be very small, much smaller than 0.5 ulp.
> 
> If it were close to 0.5 ulp, it would mean that there would be around 53
> bits of accuracy, which does not match your error analysis: "Maximum
> relative error before rounding is 8.8e-22 (69.9 bits)."
> 
> So how can you explain the switch to the slow path with the previous code on
> this exact case with the 69.9 bits of accuracy for (al,rem)?

I don't think pow(0.562500, 1.5) was ever slow. The case I tested as slow was pow(0.999999999999999889, 1.5).

Either way the calculation of the error term was incorrect, for example in pow.c there was:

	error = error * fabs (y);
	t = __exp1 (a1, a2, 1.9e16 * error);	/* return -10 or 0 if wasn't computed exactly */

The error for pow scales with log(x) * y which is bounded by 710.
Comment 19 Vincent Lefèvre 2018-02-12 14:14:02 UTC
(In reply to Wilco from comment #18)
> I don't think pow(0.562500, 1.5) was ever slow.

That's from the bug report:

------------------------------------------------------------
./elf/ld.so --library-path .:math:dlfcn ./a.out
pow(0.562500, 1.5): 0.000982 calls/usec
pow(0.563250, 1.5): 13.574397 calls/usec
ratio:  13828
------------------------------------------------------------

But indeed I cannot reproduce this slowness with 0.562500 on glibc 2.26.

> The case I tested as slow was pow(0.999999999999999889, 1.5).

Indeed this one is very slow (mainly due to the use of 768 bits in the slow path, which is much more than actually necessary).

> Either way the calculation of the error term was incorrect,

I hadn't checked that.

I suppose that a full rewrite would be needed for a crpow() function. This could be useful: https://hal.inria.fr/inria-00583988
Comment 20 Vincent Lefèvre 2018-02-12 14:29:29 UTC
(In reply to Vincent Lefèvre from comment #19)
> (In reply to Wilco from comment #18)
> > I don't think pow(0.562500, 1.5) was ever slow.
> 
> That's from the bug report:
> 
> ------------------------------------------------------------
> ./elf/ld.so --library-path .:math:dlfcn ./a.out
> pow(0.562500, 1.5): 0.000982 calls/usec
> pow(0.563250, 1.5): 13.574397 calls/usec
> ratio:  13828
> ------------------------------------------------------------

The bug report was inaccurate. By outputting values with more precision:

pow(0.56249999999999978, 1.5): 0.004758 calls/usec
pow(0.5632499999999997, 1.5): 10.506687 calls/usec

The exact value for 0.56249999999999978 is 0.5624999999999997779553950749686919152736663818359375, and the significand of its power to 1.5 is: 7599824371187707.50000000000000044... This is a hard-to-round case, which eventually explains the slow path.
Comment 21 jsm-csl@polyomino.org.uk 2018-02-12 18:02:03 UTC
If the commit fixes the bug, it should be resolved as FIXED with target 
milestone set.
Comment 22 Wilco 2018-02-12 18:54:57 UTC
(In reply to joseph@codesourcery.com from comment #21)
> If the commit fixes the bug, it should be resolved as FIXED with target 
> milestone set.

Hmm, I don't seem to be able to make any changes from this account...
Comment 23 jsm-csl@polyomino.org.uk 2018-02-12 20:38:25 UTC
If you use an @gcc.gnu.org / @sourceware.org account you should 
automatically have editbugs permission.
Comment 24 Paul Pluzhnikov 2018-02-12 23:18:14 UTC
Fixed by c3d466cba1692708a19c6ff829d0386c83a0c6e5
Comment 25 Michael Kerrisk 2020-05-23 19:32:29 UTC
Fix documented for man-pages-5.07.

diff --git a/man3/pow.3 b/man3/pow.3
index 2f8b340bb..91b9a9562 100644
--- a/man3/pow.3
+++ b/man3/pow.3
@@ -329,9 +329,10 @@ The variant returning
 also conforms to
 SVr4, 4.3BSD, C89.
 .SH BUGS
-On 64-bits,
+Before glibc 2.28,
 .\"
 .\" https://sourceware.org/bugzilla/show_bug.cgi?id=13932
+on some architectures (e.g., x86-64)
 .BR pow ()
 may be more than 10,000 times slower for some (rare) inputs
 than for other nearby inputs.
@@ -341,6 +342,9 @@ and not
 .BR powf ()
 nor
 .BR powl ().
+This problem was fixed
+.\" commit c3d466cba1692708a19c6ff829d0386c83a0c6e5
+in glibc 2.28.
 .PP
 In glibc 2.9 and earlier,
 .\"
Comment 26 Vincent Lefèvre 2020-05-23 21:31:52 UTC
(In reply to Michael Kerrisk from comment #25)
> Fix documented for man-pages-5.07.
[...]
> -On 64-bits,
> +Before glibc 2.28,
>  .\"
>  .\" https://sourceware.org/bugzilla/show_bug.cgi?id=13932
> +on some architectures (e.g., x86-64)
>  .BR pow ()
>  may be more than 10,000 times slower for some (rare) inputs
>  than for other nearby inputs.
[...]

The problematic values are uncommon, but not so rare, in the sense that they are close to simple values, i.e. are likely to occur in practice. An example given above: pow(0.999999999999999889, 1.5)

1 and 1.5 are very simple values, which are more likely to occur in practice than some fixed random value. Then it suffices to have a small rounding error on 1...

For instance, this is very different from hard-to-round cases of exp, which are also very slow IMHO, but unless one writes a specific program for them, no-one should notice the slowness because such a case would typically occur only once among billions (I don't remember the accuracy before the slowest path in this library).
Comment 27 Michael Kerrisk 2020-05-25 11:38:11 UTC
(In reply to Vincent Lefèvre from comment #26)
> (In reply to Michael Kerrisk from comment #25)
> > Fix documented for man-pages-5.07.
> [...]
> > -On 64-bits,
> > +Before glibc 2.28,
> >  .\"
> >  .\" https://sourceware.org/bugzilla/show_bug.cgi?id=13932
> > +on some architectures (e.g., x86-64)
> >  .BR pow ()
> >  may be more than 10,000 times slower for some (rare) inputs
> >  than for other nearby inputs.
> [...]
> 
> The problematic values are uncommon, but not so rare, in the sense that they
> are close to simple values, i.e. are likely to occur in practice. An example
> given above: pow(0.999999999999999889, 1.5)
> 
> 1 and 1.5 are very simple values, which are more likely to occur in practice
> than some fixed random value. Then it suffices to have a small rounding
> error on 1...
> 
> For instance, this is very different from hard-to-round cases of exp, which
> are also very slow IMHO, but unless one writes a specific program for them,
> no-one should notice the slowness because such a case would typically occur
> only once among billions (I don't remember the accuracy before the slowest
> path in this library).

Vincent

Thanks for your reply.

So, I should just remove "(rare)", do you think?

Thanks,

Michael
Comment 28 Vincent Lefèvre 2020-05-25 12:08:24 UTC
(In reply to Michael Kerrisk from comment #27)
> So, I should just remove "(rare)", do you think?

Yes, I think that "rare" can be misleading. The sentence "[...] more than 10,000 times slower for some inputs than for other nearby inputs." would be OK, as "some" implies that these are particular inputs, which may affect some programs. This is exactly what we had before glibc 2.28.
Comment 29 Michael Kerrisk 2020-05-25 13:19:18 UTC
(In reply to Vincent Lefèvre from comment #28)
> (In reply to Michael Kerrisk from comment #27)
> > So, I should just remove "(rare)", do you think?
> 
> Yes, I think that "rare" can be misleading. 

Okay. Done. Thanks for your input!