This is the mail archive of the cygwin mailing list for the Cygwin project.


Re: possible compiler optimization error

This doesn't have anything to do with cygwin, but it can be an important point.
Some compilers or applications (Intel's, I think, IIRC) can figure out which processor you have
at run time and pick which code to run. Obviously the exe size gets larger, but
if you need speed it can be helpful. I've thrown in assembly code that needs
certain FPUs, and it's great as long as you have a fallback or diagnostics and don't
fail with an unexplained invalid instruction or "core dump."
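
To make the run-time selection idea concrete, here is a minimal sketch in C. It assumes GCC (or a compatible compiler) on x86, where the `__builtin_cpu_init` and `__builtin_cpu_supports` builtins are available; those builtins are not mentioned in this thread. The names `sum_generic`, `sum_sse2` and `pick_sum` are made up for illustration, and `sum_sse2` is only a placeholder for a routine that would really use SSE2 instructions.

```c
/* Hypothetical kernels: the names are illustrative, not from the thread. */
typedef double (*sum_fn)(const double *, int);

/* Portable fallback that runs on any CPU. */
double sum_generic(const double *v, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Stand-in for a routine that would use SSE2 instructions or assembly.
 * Here it simply reuses the generic code so the sketch stays portable. */
double sum_sse2(const double *v, int n)
{
    return sum_generic(v, n);
}

/* Pick an implementation at run time, always keeping a fallback so the
 * program never dies with an unexplained invalid instruction. */
sum_fn pick_sum(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();                 /* populate the CPU feature data */
    if (__builtin_cpu_supports("sse2"))
        return sum_sse2;
#endif
    return sum_generic;
}
```

A caller would do `sum_fn f = pick_sum();` once at startup and then call `f(v, n)`. The point of the design is that the fast path is only ever entered after the CPU has been checked.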

Close only counts in horseshoes, handgrenades, and floating point.

From: Brian Dessent <>
Subject: Re: possible compiler optimization error
Date: Thu, 28 Jun 2007 12:01:31 -0700

"Frederich, Eric P21322" wrote:

> I do realize that they may in fact differ way out there beyond 15
> decimal places.
> What I don't understand is how two numbers pass a ==, then fail a >=,
> then pass a >= unless (after compiler optimizations) the second and
> third comparisons are actually comparing copies of these numbers which
> aren't "bit-exact" copies.
> Is this what you're saying might be happening and what -ffloat-store is
> supposed to resolve?
> If so, that makes sense and I can accept that.

I think Dave already explained it but in case it's not clear, on the
i387, all floating point math happens in 80-bit registers, even if the
underlying values are actually 32 bit (float) or 64 bit (double)
quantities.  This means there can be extra bits of precision in the
register if the value has not been written to memory yet.  -ffloat-store
is kind of a hacky workaround to this problem that tells the compiler to
try harder to write values to memory and read them back in whenever
possible.  It's not a guaranteed fix, and it has a negative performance
impact.
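
As a rough illustration of what "write values to memory and read them back" means, the sketch below forces a value through a `volatile double` before comparing it. This is a manual approximation of the effect, not how GCC implements -ffloat-store. The helper names `store_as_double`, `eq_after_store` and `ge_after_store` are hypothetical.

```c
/* Round v to a true 64-bit double by forcing it through memory.
 * The volatile qualifier keeps the compiler from holding the value
 * in a (possibly 80-bit) register across the store and reload. */
double store_as_double(double v)
{
    volatile double slot = v;
    return slot;
}

/* Compare only after both operands have been rounded to double, so an
 * == test and a >= test on the same pair of values cannot disagree
 * because one side still carries extra bits. */
int eq_after_store(double a, double b)
{
    return store_as_double(a) == store_as_double(b);
}

int ge_after_store(double a, double b)
{
    return store_as_double(a) >= store_as_double(b);
}
```

On a target that already does double math in 64-bit registers the helpers change nothing. They only matter where intermediate results can carry excess precision, and they cost an extra store and load per value.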

The real problem is not in the compiler, it's the crappy design of the
i387.  The best workaround is not to use the 387 unit at all if
possible.  This is what -mfpmath=sse does, as the sse unit was designed
much more sanely so that it doesn't have this excess precision problem.
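
One way to see which behaviour a given build has is the C99 macro `FLT_EVAL_METHOD` from `<float.h>`. A small sketch follows; the function names `eval_method` and `eval_method_name` are made up for illustration. With GCC, 387 math typically reports 2 and SSE2 math typically reports 0, but treat that as something to check on your own toolchain rather than a guarantee.

```c
#include <float.h>

/* Return the compiler's FLT_EVAL_METHOD, or -1 if it is not defined. */
int eval_method(void)
{
#ifdef FLT_EVAL_METHOD
    return FLT_EVAL_METHOD;
#else
    return -1;
#endif
}

/* Describe the values defined by C99. */
const char *eval_method_name(int m)
{
    switch (m) {
    case 0:  return "operations evaluated in their own type";
    case 1:  return "float and double evaluated as double";
    case 2:  return "everything evaluated as long double";
    default: return "indeterminate or implementation-defined";
    }
}
```

A value of 2 is the excess precision case described above: float and double expressions are evaluated in the wider long double format until they are stored.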

Note that sse only has support for 32 bit floating point types; you need
sse2 for 64 bit double types.  And -march=i686 does not enable sse2
because not all i686 class machines have sse2.  So that is why I said
"if you have a sse2 machine and set -march appropriately", meaning e.g.
-march=pentium4 or -march=k8.  That is why using "-march=i686" or
"-march=i686 -msse" both fail, because neither implies sse2.

Using "-march=i686 -msse2" doesn't make a lot of sense to me, because it
generates code that will cause invalid instruction faults on i686
machines without sse2 (e.g.  ppro, celeron, pentium3, k7/athlon.)  By
giving -msse2 you're already limiting the architecture to pentium4/k8
anyway, so you might as well just use the correct -march.
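
To check what a particular set of flags actually enabled, you can test GCC's predefined macros. The sketch below assumes GCC's `__SSE2__` (SSE2 instructions allowed) and `__SSE2_MATH__` (scalar floating point math done in SSE2 registers). The function names `built_with_sse2` and `math_uses_sse2` are hypothetical.

```c
/* 1 if the compiler was allowed to emit SSE2 instructions, e.g. via
 * -msse2 or a -march value that implies it (pentium4, k8, ...). */
int built_with_sse2(void)
{
#if defined(__SSE2__)
    return 1;
#else
    return 0;
#endif
}

/* 1 if float/double math is done in SSE2 registers rather than on the
 * 387, e.g. -mfpmath=sse together with SSE2 being enabled. */
int math_uses_sse2(void)
{
#if defined(__SSE2_MATH__)
    return 1;
#else
    return 0;
#endif
}
```

With "-march=i686" or "-march=i686 -msse" one would expect both functions to return 0, which matches the failures described above. With "-march=pentium4 -mfpmath=sse" one would expect both to return 1.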

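This is all thankfully moot on x86_64, because every x86_64 CPU has sse2
and compilers use it for float and double math by default, so the 387 is
essentially obsolete there (it is still used for long double).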

