This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Fix x86 sqrt rounding (bug 14032)

On Thu, Nov 28, 2013 at 02:02:08AM +0000, Joseph S. Myers wrote:
> On Wed, 27 Nov 2013, Rich Felker wrote:
> > Well then pending resolution of bug 16068, this would be something of
> > a regression.
> 
> As I said, various functions already set precision temporarily; it's not
> new that libm does this.

I see. Then I agree, the issues are separate.

> > It's also unfortunate that POSIX does not define the
> > fenv functions as AS-safe, so a conforming POSIX program *cannot* do
> > what C11 recommends.
> 
> I.e., POSIX programs can't use floating point in signal handlers.

This may be correct, or it may be that implementations which clobber the
FPU state are not valid under C99/POSIX unless the signal-handling
machinery restores the state before the signal handler is invoked. I
haven't read the relevant text of the standards closely enough to know
which is the case, so apologies if you or somebody else already knows
and I'm just re-raising a question with a known answer.

> Naturally I think we should document an intent that the fenv functions are 
> AS-safe provided you restore the original environment before leaving the 
> handler.

Agreed. And I assume the next issue of POSIX will be aligned with C11
and will also document this.

> > Anyway, I think the proper next step is comparing performance. My
> > intuition is that changing the control register is going to be a lot
> > slower than the typical path in the first patch proposed, which
> > essentially adds just an ld80 store and double store/load pair. Note
> > that a benchmark should not use the testcase values (which are
> > numerically rare and intentionally chosen to hit the double-rounding
> > issue) unless the intent is to optimize worst-case rather than
> average runtime.
> With inputs from 
> <> and testing 
> on a Sandy Bridge Xeon:
> Unmodified glibc:
> sqrt(): ITERS:1.83965e+09: TOTAL:31880.5Mcy, MAX:111.358cy, MIN:8.524cy, 57704.6 calls/Mcy
> First patch (adjustment using C1 bit):
> sqrt(): ITERS:1.84168e+09: TOTAL:31880.6Mcy, MAX:142.333cy, MIN:8.871cy, 57768.1 calls/Mcy
> Second patch (temporarily changing precision):
> sqrt(): ITERS:1.84008e+09: TOTAL:31880.4Mcy, MAX:125.583cy, MIN:8.488cy, 57718.3 calls/Mcy
> I interpret this as meaning there is no significant performance difference 
> between the approaches and no significant performance loss from these 
> changes.

My inclination is to agree, but the numbers seem a bit odd. In
particular, the calls/Mcy values seem inconsistent with the MAX/MIN
cycle counts. A lower cycle count should yield more calls/Mcy, not
fewer, no? Or is this just a measurement precision error?

