Created attachment 9745 [details] Code demonstrating this problem (C++)

I wrote a small test for feenableexcept(), which enables an FPU exception (e.g., FE_DIVBYZERO), triggers it (by dividing 1.0 by 0.0), catches the resulting SIGFPE, and longjmp()s out of the handler to return from the test; the code is attached. However, it turns out that longjmp(), very unexpectedly, resets the exception mask back to zero! So after feenableexcept() we get one exception and one longjmp, but a second division by zero no longer generates a signal.
What architecture and glibc version are you using?
x86_64, glibc 2.24 (on Fedora 25). Maybe this is related to the similar bug 12420?
Note that it's only glibc's job to ensure that longjmp doesn't change the mask - that is, that when setjmp returns for the second time, the floating-point environment is the same as it was on entry to longjmp. If it's the kernel or processor changing the mask on entry to a signal handler, that's not glibc's problem.
You are correct. I missed that at the time the signal handler is called, the mask is already reset, and I wrongly suspected the longjmp was doing that. Sorry about that.
Hmm, on second thought, I think this actually *is* a sigsetjmp()/siglongjmp() problem.

If I understand correctly, when the kernel starts a signal handler, it saves the FPU state (including the control word) and then installs some "sane default" so the handler can run without having to guess what kind of messed-up state the application was in when the signal occurred. I'm guessing this "sane default" has all the FPU exceptions masked, because that is what FNINIT leaves behind.

If this is what happens, then sigsetjmp()/siglongjmp() need to save and restore the FPU state - including the control word - as it was at the time of sigsetjmp(). If they don't, instead of the correct FPU control word we are left with the "sane default" that the signal handler entry gave us.

Apparently this issue has been known for many years; see for example:
https://bugs.openjdk.java.net/browse/JDK-6292965
https://lists.freebsd.org/pipermail/svn-src-head/2015-March/069337.html
Oops, the second link is actually about freebsd, not glibc... But the first one is about glibc, from 12 years ago.
The floating-point state is logically like a thread-local variable; indeed, on some platforms some parts of it *are* TLS variables. It would be wrong for siglongjmp() to restore the value of a thread-local variable to what it was when sigsetjmp() was called - the user program might have changed it after calling sigsetjmp(), and that change must remain in effect. Likewise, it would be wrong for siglongjmp() to restore the floating-point environment to what it was when sigsetjmp() was called - changes made by the user program must remain in effect. You want the state as it was before the kernel called the signal handler, but siglongjmp() doesn't have access to that information.

Note that POSIX explicitly says about longjmp() (and the siglongjmp() differences don't matter here): "All accessible objects have values, and all other components of the abstract machine have state (for example, floating-point status flags and open files), as of the time longjmp() was called, except that the values of objects of automatic storage duration are unspecified if they meet all the following conditions".
As discussed, the longjmp semantics require preserving the floating-point environment when longjmp is called, *not* restoring the environment from when setjmp was called.