Bug 21035 - longjmp() resets FPU exception mask
Summary: longjmp() resets FPU exception mask
Status: RESOLVED INVALID
Alias: None
Product: glibc
Classification: Unclassified
Component: math
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2017-01-10 00:32 UTC by Nadav Har'El
Modified: 2017-01-17 17:00 UTC

fweimer: security-


Attachments
Code demonstrating this problem (C++) (486 bytes, text/x-c)
2017-01-10 00:32 UTC, Nadav Har'El

Description Nadav Har'El 2017-01-10 00:32:22 UTC
Created attachment 9745
Code demonstrating this problem (C++)

I wrote a small test for feenableexcept() which enables an FPU exception (e.g., FE_DIVBYZERO), makes it happen (dividing 1.0 by 0.0), catches the resulting signal, and longjmp()s to return from the test. See code below.

However, it turns out that longjmp(), very unexpectedly, resets the exception mask back to zero! So after a call to feenableexcept() we get one exception and one longjmp(), but a second division by zero no longer generates a signal.

Code demonstrating this problem attached.
Comment 1 jsm-csl@polyomino.org.uk 2017-01-10 01:03:50 UTC
What architecture and glibc version are you using?
Comment 2 Nadav Har'El 2017-01-12 10:10:46 UTC
x86_64, libc 2.24 (on Fedora 25).

Maybe this is related to the similar issue #12420?
Comment 3 jsm-csl@polyomino.org.uk 2017-01-12 14:13:07 UTC
Note that it's only glibc's job to ensure that longjmp doesn't change the 
mask - that is, that when setjmp returns for the second time, the 
floating-point environment is the same as it was on entry to longjmp.  If 
it's the kernel or processor changing the mask on entry to a signal 
handler, that's not glibc's problem.
Comment 4 Nadav Har'El 2017-01-12 14:54:25 UTC
You are correct. I missed that at the time the signal handler is called, the mask is already reset, and I wrongly suspected the longjmp was doing that. Sorry about that.
Comment 5 Nadav Har'El 2017-01-12 15:13:26 UTC
Hmm, on second thought, I think this actually *is* a sigsetjmp()/siglongjmp() problem:

If I understand correctly, when the kernel starts a signal handler, it saves the FPU state (including the control word) and then sets some "sane default" so the handler can run without having to guess what kind of messed up state the application was in when the signal occurred. I'm guessing this "sane default" probably has all the FPU exceptions masked, because this is what FNINIT does by default.

If this is what happens, then sigsetjmp/siglongjmp need to save the FPU state, including the control word, as it was at the time of sigsetjmp. If they don't, then instead of restoring the correct FPU control word, we'll be left with the "sane default" that the signal handler left us with.

Apparently this issue has been known for many years; see, for example:
https://bugs.openjdk.java.net/browse/JDK-6292965
https://lists.freebsd.org/pipermail/svn-src-head/2015-March/069337.html
Comment 6 Nadav Har'El 2017-01-12 15:17:07 UTC
Oops, the second link is actually about freebsd, not glibc... But the first one is about glibc, from 12 years ago.
Comment 7 jsm-csl@polyomino.org.uk 2017-01-12 15:57:14 UTC
The floating-point state is logically like a thread-local variable.  
Indeed, on some platforms some parts of it *are* TLS variables.

It would be wrong for siglongjmp to restore the value of a thread-local 
variable to what it was when sigsetjmp was called - the user program might 
have changed it after calling sigsetjmp, and that change must remain in 
effect.  Likewise, it would be wrong for siglongjmp to restore the 
floating-point environment to what it was when sigsetjmp was called - 
changes made by the user program must remain in effect.  You want the 
state as it was before the kernel called the signal handler, but 
siglongjmp doesn't have access to that information.

Note that POSIX explicitly says about longjmp (and the siglongjmp 
differences don't matter): "All accessible objects have values, and all 
other components of the abstract machine have state (for example, 
floating-point status flags and open files), as of the time longjmp() was 
called, except that the values of objects of automatic storage duration 
are unspecified if they meet all the following conditions".
Comment 8 Joseph Myers 2017-01-17 17:00:33 UTC
As discussed, the longjmp semantics require preserving the floating-point environment when longjmp is called, *not* restoring the environment from when setjmp was called.