From: Brian Dessent <firstname.lastname@example.org>
Subject: Re: possible compiler optimization error
Date: Thu, 28 Jun 2007 12:01:31 -0700
"Frederich, Eric P21322" wrote:
> I do realize that they may in fact differ way out there beyond 15
> decimal places.
> What I don't understand is how two numbers pass a ==, then fail a >=,
> then pass a >= unless (after compiler optimizations) the second and
> third comparisons are actually comparing copies of these numbers which
> aren't "bit-exact" copies.
> Is this what you're saying might be happening and what -ffloat-store is
> supposed to resolve?
> If so, that makes sense and I can accept that.

I think Dave already explained it, but in case it's not clear: on the
i387, all floating point math happens in 80-bit registers, even if the
underlying values are actually 32-bit (float) or 64-bit (double)
quantities. This means there can be extra bits of precision in the
register if the value has not been written to memory yet. So two
comparisons of the "same" number can disagree if one of them sees the
80-bit register copy and the other sees the copy that was rounded on its
way to memory. -ffloat-store is kind of a hacky workaround for this
problem that tells the compiler to try harder to write values to memory
and read them back in whenever possible. It's not a guaranteed fix, and
it has a negative performance impact.
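
To make the effect concrete, here is a minimal C sketch (not from the
original thread; the function names are made up for illustration). It
assumes that a store to a volatile double forces the value to be rounded
to 64 bits, which is roughly what -ffloat-store tries to arrange for
ordinary variables.

```c
#include <stdio.h>

/* Hypothetical helper: round x to a real 64-bit double by pushing it
   through memory. The volatile store keeps the compiler from leaving
   the value in an 80-bit x87 register. */
double force_double(double x)
{
    volatile double tmp = x;
    return tmp;
}

/* Hypothetical helper: compare a quotient against a copy of itself that
   was stored to memory. With x87 math and optimization enabled this MAY
   return 0, because q can still carry extended precision while the
   stored copy has been rounded. With SSE2 math both sides are plain
   doubles. */
int quotient_matches_stored_copy(double a, double b)
{
    double q = a / b;                 /* may stay in a register */
    volatile double stored = a / b;   /* always rounded to 64 bits */
    return q == stored;
}

/* Hypothetical helper: a >= test that rounds both operands first, so
   repeated comparisons see the same 64-bit values every time. */
int ge_after_store(double a, double b)
{
    return force_double(a) >= force_double(b);
}
```

Calling quotient_matches_stored_copy(1.0, 3.0) from a small main() and
building once with -O2 -mfpmath=387 and once with -O2 -msse2 -mfpmath=sse
is one way to compare the two units. What the 387 build returns depends
on what the optimizer happens to keep in registers, so treat it as an
experiment rather than a guaranteed reproduction.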

The real problem is not in the compiler, it's the crappy design of the
i387. The best workaround is not to use the 387 unit at all if
possible. This is what -mfpmath=sse does, as the sse unit was designed
much more sanely so that it doesn't have this excess precision problem.
Note that sse only has support for 32-bit floating point types; you need
sse2 for 64-bit double types. And -march=i686 does not enable sse2,
because not all i686-class machines have sse2. That is why I said
"if you have a sse2 machine and set -march appropriately", meaning e.g.
-march=pentium4 or -march=k8. It is also why "-march=i686" and
"-march=i686 -msse" both fail: neither implies sse2.

Using "-march=i686 -msse2" doesn't make a lot of sense to me, because it
generates code that will cause illegal instruction faults on i686
machines without sse2 (e.g. ppro, celeron, pentium3, k7/athlon). By
giving -msse2 you're already limiting the architecture to pentium4/k8
anyway, so you might as well just use the correct -march.
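
One way to check which of these settings a given build actually ended up
with is to look at the macros the compiler predefines. The sketch below
assumes GCC (or a compatible compiler) for __SSE2__ and __SSE2_MATH__;
FLT_EVAL_METHOD is standard C99 from <float.h>. The function names are
hypothetical.

```c
#include <float.h>
#include <stdio.h>

/* 1 if the compiler is allowed to emit SSE2 instructions (e.g. -msse2,
   or an -march value that implies it), else 0. */
int sse2_insns_enabled(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if float/double arithmetic is done in SSE2 registers
   (-mfpmath=sse with SSE2 available), else 0. */
int sse2_math_enabled(void)
{
#ifdef __SSE2_MATH__
    return 1;
#else
    return 0;
#endif
}

/* Print the floating point configuration this file was compiled with. */
void report_fp_config(void)
{
    printf("SSE2 instructions: %d\n", sse2_insns_enabled());
    printf("SSE2 math:         %d\n", sse2_math_enabled());
    /* C99: 0 = evaluate in the declared type, 2 = evaluate everything
       in long double (typical for x87), -1 = indeterminable. */
    printf("FLT_EVAL_METHOD:   %d\n", (int)FLT_EVAL_METHOD);
}
```

Calling report_fp_config() from main() in builds made with different
-march and -mfpmath settings should show whether double math is really
going through sse2 or is still on the 387.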

This is all thankfully moot on x86_64, because sse2 is always available
there and the compiler uses it for float and double math by default, so
the 387 is essentially out of the picture.