This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: Fix x86 sqrt rounding (bug 14032)
- From: Rich Felker <dalias at aerifal dot cx>
- To: "Joseph S. Myers" <joseph at codesourcery dot com>
- Cc: Richard Henderson <rth at twiddle dot net>, libc-alpha at sourceware dot org
- Date: Wed, 27 Nov 2013 20:13:16 -0500
- Subject: Re: Fix x86 sqrt rounding (bug 14032)
- References: <Pine dot LNX dot 4 dot 64 dot 1311271803540 dot 7837 at digraph dot polyomino dot org dot uk> <52966555 dot 20603 at twiddle dot net> <20131127232338 dot GP24286 at brightrain dot aerifal dot cx> <Pine dot LNX dot 4 dot 64 dot 1311280049090 dot 5433 at digraph dot polyomino dot org dot uk>
On Thu, Nov 28, 2013 at 12:56:10AM +0000, Joseph S. Myers wrote:
> On Wed, 27 Nov 2013, Rich Felker wrote:
>
> > Setting the rounding precision is likely to be much slower, and has
> > issues with signals (signal handlers could be invoked with wrong
> > rounding precision, resulting in completely wrong results). Also, in
>
> I've no idea about speed (we don't currently have sqrt benchmarks checked
> in), but setting precision is certainly simpler, and we have various
> existing functions that do so in order to work correctly (IBM libm
> functions that rely on 53-bit precision for their internal operations).
> The signal handlers issue is as discussed lately for AS-safety: as per
> C11, you shouldn't rely on any particular floating-point environment in a
> signal handler and so should do fegetenv (&old_env) / fesetenv
> (FE_DFL_ENV) / (rest of signal handler) / fesetenv (&old_env) if doing
> floating point in a signal handler. Failure of FE_DFL_ENV to set rounding
> precision to extended is (part of) bug 16068.

Well then, pending resolution of bug 16068, this would be something of
a regression. It's also unfortunate that POSIX does not define the
fenv functions as AS-safe, so a conforming POSIX program *cannot* do
what C11 recommends.

Anyway, I think the proper next step is comparing performance. My
intuition is that changing the control register is going to be a lot
slower than the typical path in the first patch proposed, which
essentially adds just an ld80 store and double store/load pair. Note
that a benchmark should not use the testcase values (which are
numerically rare and intentionally chosen to hit the double-rounding
issue) unless the intent is to optimize worst-case rather than
average runtime.
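A minimal sketch of what such an average-case measurement might look like follows. It is not an existing glibc benchmark: fill_inputs, identity, seconds_per_call and the input range (0, 1e6] are all assumptions chosen for illustration, and clock() is a coarse timer, so real numbers would need many rounds.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

#define N_INPUTS 4096

static double inputs[N_INPUTS];

/* Fill the table with pseudo-random doubles in (0, 1e6] using a
   fixed-seed xorshift64 generator, so runs are repeatable and the
   inputs are "typical" rather than the double-rounding testcases. */
void fill_inputs(void)
{
    uint64_t s = UINT64_C(0x9E3779B97F4A7C15);

    for (size_t i = 0; i < N_INPUTS; i++) {
        s ^= s << 13;
        s ^= s >> 7;
        s ^= s << 17;
        /* Top 53 bits plus one gives an integer in [1, 2^53]. */
        inputs[i] = ((double)(s >> 11) + 1.0) / 9007199254740992.0 * 1e6;
    }
}

/* Baseline with the same signature as sqrt, to estimate loop overhead. */
double identity(double x)
{
    return x;
}

/* Average CPU seconds per call of fn over rounds passes of the table.
   rounds must be greater than zero.  The checksum is returned so the
   compiler cannot discard the calls. */
double seconds_per_call(double (*fn)(double), size_t rounds, double *checksum)
{
    double sum = 0.0;
    clock_t start = clock();

    for (size_t r = 0; r < rounds; r++)
        for (size_t i = 0; i < N_INPUTS; i++)
            sum += fn(inputs[i]);

    clock_t end = clock();
    *checksum = sum;
    return (double)(end - start) / CLOCKS_PER_SEC
           / ((double)rounds * N_INPUTS);
}
```

To compare the two approaches one would call fill_inputs() once, then pass sqrt from <math.h> to seconds_per_call against each libm build and subtract the identity baseline. Calling through a function pointer adds a small constant cost to both measurements.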

Rich