This is the mail archive of the libc-alpha@sourceware.org
mailing list for the glibc project.
Re: Fix x86 sqrt rounding (bug 14032)
- From: "Joseph S. Myers" <joseph at codesourcery dot com>
- To: Rich Felker <dalias at aerifal dot cx>
- Cc: Richard Henderson <rth at twiddle dot net>, <libc-alpha at sourceware dot org>
- Date: Thu, 28 Nov 2013 02:02:08 +0000
- Subject: Re: Fix x86 sqrt rounding (bug 14032)
- Authentication-results: sourceware.org; auth=none
- References: <Pine dot LNX dot 4 dot 64 dot 1311271803540 dot 7837 at digraph dot polyomino dot org dot uk> <52966555 dot 20603 at twiddle dot net> <20131127232338 dot GP24286 at brightrain dot aerifal dot cx> <Pine dot LNX dot 4 dot 64 dot 1311280049090 dot 5433 at digraph dot polyomino dot org dot uk> <20131128011316 dot GR24286 at brightrain dot aerifal dot cx>
On Wed, 27 Nov 2013, Rich Felker wrote:
> Well then pending resolution of bug 16068, this would be something of
> a regression.
As I said, various functions already set precision temporarily; it's not
new that libm does this.
> It's also unfortunate that POSIX does not define the
> fenv functions as AS-safe, so a conforming POSIX program *cannot* do
> what C11 recommends.
I.e., POSIX programs can't use floating point in signal handlers.
Naturally I think we should document an intent that the fenv functions are
AS-safe provided you restore the original environment before leaving the
signal handler.
> Anyway, I think the proper next step is comparing performance. My
> intuition is that changing the control register is going to be a lot
> slower than the typical path in the first patch proposed, which
> essentially adds just an ld80 store and double store/load pair. Note
> that a benchmark should not use the testcase values (which are
> numerically rare and intentionally chosen to hit the double-rounding
> issue) unless the intent is to optimize worst-case rather than
> average runtime.
With inputs from
<https://sourceware.org/ml/libc-alpha/2013-10/msg00382.html> and testing
on a Sandy Bridge Xeon (unpatched baseline first):
sqrt(): ITERS:1.83965e+09: TOTAL:31880.5Mcy, MAX:111.358cy, MIN:8.524cy, 57704.6 calls/Mcy
First patch (adjustment using C1 bit):
sqrt(): ITERS:1.84168e+09: TOTAL:31880.6Mcy, MAX:142.333cy, MIN:8.871cy, 57768.1 calls/Mcy
Second patch (temporarily changing precision):
sqrt(): ITERS:1.84008e+09: TOTAL:31880.4Mcy, MAX:125.583cy, MIN:8.488cy, 57718.3 calls/Mcy
I interpret this as meaning there is no significant performance difference
between the approaches and no significant performance loss from these
patches.
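
For readers unfamiliar with the "temporarily changing precision" approach,
here is a minimal sketch of the general idea, assuming GCC-style inline asm
on x86. It is not the patch itself, and sqrt_pc53 is a hypothetical name.
The x87 control word is saved, its precision-control field (bits 8-9) is set
to 53-bit, fsqrt is executed so the result is rounded once to double
precision, and the original control word is reloaded. On other targets the
sketch just falls back to sqrt().

```c
#include <math.h>

/* Hypothetical helper: square root rounded once to a 53-bit significand
   by temporarily switching the x87 precision-control field. */
double sqrt_pc53(double x)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned short old_cw, new_cw;
    double result;

    /* Save the current x87 control word. */
    __asm__ volatile ("fnstcw %0" : "=m" (old_cw));

    /* Precision control is bits 8-9; binary 10 selects a 53-bit
       significand, binary 11 the default 64-bit extended precision. */
    new_cw = (unsigned short) ((old_cw & ~0x0300) | 0x0200);
    __asm__ volatile ("fldcw %0" : : "m" (new_cw));

    /* fsqrt operates on the top of the x87 stack ("t" constraint). */
    __asm__ volatile ("fsqrt" : "=t" (result) : "0" (x));

    /* Restore the caller's control word before returning. */
    __asm__ volatile ("fldcw %0" : : "m" (old_cw));
    return result;
#else
    /* Assumption: no x87 double-rounding problem on other targets. */
    return sqrt(x);
#endif
}
```

The two fldcw instructions are the extra cost that the quoted message
expected to be slow, and which the numbers above suggest is not significant
for these inputs.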
Joseph S. Myers