Bug 3268

Summary: fma for all targets without hardware fma instructions is incorrect.
Product: glibc Reporter: Steven Munroe <sjmunroe>
Component: mathAssignee: Steven Munroe <sjmunroe>
Status: RESOLVED DUPLICATE    
Severity: normal CC: glibc-bugs, jakub, jsm-csl, nbowler, vincent-srcware, zimmerma+gcc
Priority: P2 Flags: fweimer: security-
Version: 2.4   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Bug Depends on: 2749, 13304    
Bug Blocks:    
Attachments: Initial patch for discussion.
Initial fma patch for PPC32 soft-fp for discussion
updated Ports patch for powerpc32 soft-fp FMA
Updated FMA soft-fp implementation
New fma_test to verify the intermediate multiply product has full percision.
Updated patch to ports to add fma for PPC32 soft-fp
Updated patch to add single and double fma to soft-fp
Updated fma_test to verify that the intermediate multiply product has full percision
Updated patch to ports to add fma for PPC32 soft-fp
Updated fma_test to verify that the intermediate multiply product has full percision
Updated ports soft-fp fma patch for PPC32
updated patch to add float/double fma to soft-fp
Updated fma_test for current CVS
Example demonstrating fma failure on x86_64

Description Steven Munroe 2006-09-26 21:22:34 UTC
The current ./math/s_fma[fl].c implementation simply implements:

double
__fma (double x, double y, double z)
{
  return (x * y) + z;
}

Which does mot meet the requirement of 754r

"fused multiply-add: The operation fma(x,y,z) computes (x ×y )+z as if with
unbounded range and precision, rounding only once to the destination format; see
subclause 5.1."

This implies (for example) that for double that the 106 bit result of the
multiply be passed unrounded to the add (53 bit) and then final result found to
53 bits.
Comment 1 Steven Munroe 2006-09-26 21:23:52 UTC
Created attachment 1326 [details]
Initial patch for discussion.
Comment 2 Steven Munroe 2006-09-26 21:24:50 UTC
Created attachment 1327 [details]
Initial fma patch for PPC32 soft-fp for discussion
Comment 3 Steven Munroe 2006-09-26 21:38:52 UTC
These patches create fmasf4.c and fmadf4.c in soft-fp. These function have to
built with soft-fp and exported from libc. Otherwise they can not access the
simulated FPU events, masks and rounding modes which are defined in libc but not
exports.

The soft-fp implementation of s_fma.c and s_fmaf.c reside in libm and are
exported from there. Currently there does not seem to be a good way to override
the generic (and incorrect) ./math/s_fms[f].c from the soft-fp directory. So for
now I placed the overrides in ports/sysdeps/powerpc/nofpu for testing.

I have not implemented a s_fmal.c/fmatf4.c for now. This would imply a soft-fp
(256-bit) extented implementation not supported by the current soft-fp. 
Comment 4 Steven Munroe 2006-10-04 18:27:19 UTC
Corrected title per Joseph Myer comment
Comment 5 Steven Munroe 2006-10-05 19:35:04 UTC
Created attachment 1348 [details]
updated Ports patch for powerpc32 soft-fp FMA

Updated patch that resolves search patch  build problem.
Comment 6 Steven Munroe 2006-10-05 19:36:22 UTC
Created attachment 1349 [details]
Updated FMA soft-fp implementation

Cleaned up source and updated comments
Comment 7 Steven Munroe 2006-10-05 19:47:29 UTC
Created attachment 1350 [details]
New fma_test to verify the intermediate multiply product has full percision.

Add a new fma_test that verifies that:

"The operation fma(x,y,z) computes (x ×y )+z as if with unbounded range and
precision, rounding only once to the destination format"

This implies that for double that the 106 bit result of the multiply be passed
unrounded to the add (53 bit) and then final result found to 53 bits.
Comment 8 Steven Munroe 2007-01-22 23:54:30 UTC
Created attachment 1511 [details]
Updated patch to ports to add fma for PPC32 soft-fp
Comment 9 Steven Munroe 2007-01-23 00:01:50 UTC
Created attachment 1512 [details]
Updated patch to add single and double fma to soft-fp

This adds macros to double.h and quad.h to copy internal soft-fp values between
the RAW, SEMIRAW, and CANNONICAL formats without using any float types.
Specifically this patch allows fmasf4.c and fmadf4.c to be implemented without
requiring TF type support.
Comment 10 Steven Munroe 2007-01-23 00:03:37 UTC
Created attachment 1513 [details]
Updated fma_test to verify that the intermediate multiply product has full percision
Comment 11 Steven Munroe 2007-01-23 00:07:00 UTC
Created attachment 1514 [details]
Updated patch to ports to add fma for PPC32 soft-fp
Comment 12 Steven Munroe 2007-01-23 00:14:03 UTC
Created attachment 1515 [details]
Updated fma_test to verify that the intermediate multiply product has full percision

This one includes the change log entry.
Comment 13 Jakub Jelinek 2007-02-05 20:50:04 UTC
Why do you need a soft-fp fmaf other than generic
float fmaf (float x, float y, float z)
{
  return ((double) x * y) + z;
}
?

As DFmode has more than twice as wide mantissa as SFmode and bigger exponent
range as well, (double) x * (double) y will be IMHO always precise, so no
rounding will ever happen there.
Similarly for fma and IEEE quad long double.
Only fmal needs soft-fp or gmp implementation, on all architectures that don't
have it in hardware, and fma for IEEE extended long double or IBM 2x double long
double.
Comment 14 Steven Munroe 2007-02-06 16:30:03 UTC
Yes what you suggest should work for the generic s_fmaf.c, but that is not the
current implmentation. Also fmasf4.c should be a little faster as it avoids some
of the more expensive intermediate conversions.

s_fma.c would be more problematic because it would require full IEEE quad and TF
type support. I believe that you youself pointed out that not all platforms
support IEEE quad/TF type, but those platforms still need fma() to work correctly. 

That is why I reworked the patch to avoid internal use of TF typs and support
direct RAW/SEMIRAW/CANNONICAL conversion for the intermediate steps of the
soft-fp fma. Again fmadf4.c should be faster then the generic.

Yes in the long run we will need to address soft-fp fmal but that requires the
invention of "oct or long long double" format. Most platforms that need soft-fp
don't implement long double 128-bit so we stage that function. 

But we have a fix for soft-fp float and double now. I don't see any reason to delay.
Comment 15 Steven Munroe 2007-02-06 16:31:23 UTC
No one else is working on this so, I'll assign it to my self.
Comment 16 Steven Munroe 2007-02-06 16:32:47 UTC
I think this bug is fixed. 

If you want I can open a separate bug for soft-fp fmal support.

Anyone disagree?
Comment 17 Andreas Jaeger 2007-02-18 13:51:15 UTC
following comment #15.
Comment 18 Steven Munroe 2007-04-03 18:58:38 UTC
Created attachment 1664 [details]
Updated ports soft-fp fma patch for PPC32

Verified with CVS from April 3rd.
Comment 19 jsm-csl@polyomino.org.uk 2007-04-03 18:58:53 UTC
Subject: Re:  fma for all targets without hardware fma instructions is incorrect.

Thank you for your message.

I am away from my email from 29 March to 2 April.  Messages sent
before 29 March will be read before I go away; messages sent during
that period will be read after I return.  If you need a response
before my return, please contact Nathan Sidwell
<nathan@codesourcery.com>.

Comment 20 Steven Munroe 2007-04-03 19:00:29 UTC
Created attachment 1665 [details]
updated patch to add float/double fma to soft-fp

verified with glibc cvs from April 3rd.
Comment 21 Steven Munroe 2007-04-03 20:39:22 UTC
Created attachment 1666 [details]
Updated fma_test for current CVS

Verfied on CVS from April 3rd
Comment 22 Nick Bowler 2010-02-02 21:56:11 UTC
Created attachment 4569 [details]
Example demonstrating fma failure on x86_64

The fma function is still incorrect on x86_64 (with glibc version 2.11) when
the multiplication overflows but the final result would not: the result is an
infinity instead of the rounded true result.

See the attached C program for a demonstration.
Comment 23 Jakub Jelinek 2011-10-17 20:26:20 UTC
glibc 2.14 later should have correct fma{,f,l} implementation, except for targets which don't support exceptions properly and thus round-to-odd trick doesn't work there, and except for PowerPC long double (where it is even unclear what actually should fmal do because of the floating point format weird properties).
Comment 24 Joseph Myers 2012-02-26 17:47:58 UTC
Bug 13304 is the bug with the most recent discussion of generic fma implementations for systems without exceptions / rounding modes, so marking this one as a duplicate; I don't think we need both open.

*** This bug has been marked as a duplicate of bug 13304 ***
Comment 25 Jackie Rosen 2014-02-16 19:42:09 UTC Comment hidden (spam)