When running the testsuite in parallel mode, I see several of the following messages on the console:

[ 5619.908020] rm[61446]: unhandled signal 11 at 00003fffdc83ff90 nip 00003fffa8a51f38 lr 00003fffa8a34008 code 30001
[ 6611.824811] bz5274[54850]: unhandled signal 11 at 0000000000000000 nip 0000000000000000 lr 0000000000000000 code 30001

On RHEL7 ppc64, I see around 7 of these during a full testsuite run. I don't see these errors when the testsuite is run in non-parallel mode.

I'm not 100% sure it is related, but the testsuite also hung at the end of this run, waiting on a 'loop' process (from unprivileged_probes.exp) to be killed.
For reference, here are the 7 messages:

[ 1206.152228] times[1039]: unhandled signal 11 at ffffffffffffffff nip 00003fffa18147b4 lr 0000000010000700 code 30001
[ 1568.066789] times[18596]: unhandled signal 11 at ffffffff nip 0fd970d8 lr 1000050c code 30001
[ 2194.143278] times[9533]: unhandled signal 11 at ffffffffffffffff nip 00003fff8d1347b4 lr 0000000010000700 code 30001
[ 2409.046690] times[30608]: unhandled signal 11 at ffffffff nip 0fd970d8 lr 1000050c code 30001
[ 5619.908020] rm[61446]: unhandled signal 11 at 00003fffdc83ff90 nip 00003fffa8a51f38 lr 00003fffa8a34008 code 30001
[ 6611.824811] bz5274[54850]: unhandled signal 11 at 0000000000000000 nip 0000000000000000 lr 0000000000000000 code 30001
[ 7880.699568] stap[21140]: unhandled signal 11 at 0000000000000000 nip 0000000000000000 lr 00003fff7b2cd188 code 30001

So, this happened to several different exes: times, rm, bz5274, and stap.

I'm also unsure why the kernel reported this on the console; I don't believe it normally does that.
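In case it helps with triage, here is a rough Python sketch that tallies these console messages per executable and per address width. The function names (parse_unhandled, summarize) are made up for illustration, and the regular expression is based only on the seven lines quoted in this report, so other variants of the message may not match. Treating 8-digit addresses as 32-bit processes and 16-digit addresses as 64-bit ones is an assumption on my part.

```python
import re
from collections import Counter

# Pattern derived from the console lines quoted in this report, e.g.
# "[ 5619.908020] rm[61446]: unhandled signal 11 at ... nip ... lr ... code 30001"
LINE_RE = re.compile(
    r"\[\s*(?P<time>\d+\.\d+)\]\s+"
    r"(?P<exe>[^\[\s]+)\[(?P<pid>\d+)\]: unhandled signal (?P<sig>\d+) "
    r"at (?P<addr>[0-9a-f]+) nip (?P<nip>[0-9a-f]+) "
    r"lr (?P<lr>[0-9a-f]+) code (?P<code>[0-9a-f]+)"
)


def parse_unhandled(text):
    """Return one dict of string fields per 'unhandled signal' message in text."""
    return [m.groupdict() for m in LINE_RE.finditer(text)]


def summarize(text):
    """Count messages per executable name and per fault-address width.

    The width is the number of hex digits printed for the fault address
    (8 or 16 in the lines seen so far).
    """
    records = parse_unhandled(text)
    by_exe = Counter(r["exe"] for r in records)
    by_width = Counter(len(r["addr"]) for r in records)
    return by_exe, by_width
```

Feeding the console log (or dmesg output) to summarize() gives a quick per-exe count, which makes it easier to compare parallel and non-parallel runs.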
FWIW, signal 11 is SIGSEGV.

I do seem to recall that the rlimit test triggers these on purpose. I would not expect that to affect other tests running in parallel though.
(In reply to Josh Stone from comment #2)
> FWIW, signal 11 is SIGSEGV.

Sigh. That's what I get for relying on my memory.

> I do seem to recall that the rlimit test triggers these on purpose. I would
> not expect that to affect other tests running in parallel though.

When I run the rlimit.exp test by itself, I don't see any of those messages on the console. That test will involve stap calling getrlimit/setrlimit, but that should only affect that particular stap pid (and its descendants), not other previous or future stap processes.

There is another test, bad-code.exp, that purposely sends a SIGSEGV, but when that test is run I don't see the message on the console.

I need to poke around in the kernel and see why it prints these messages, and why these SIGSEGVs are different from regular SIGSEGVs.
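For comparing against the tests mentioned above, here is a small Python sketch that makes a child process take a genuine memory fault (a read from address 0 through ctypes) and checks that it died from SIGSEGV. The helper names are hypothetical, and whether such a fault produces the console message on a given kernel is exactly the open question here, so this is only a way to poke at it; watch the console or dmesg while it runs.

```python
import signal
import subprocess
import sys

# Child snippet that reads from address 0 via ctypes. This should cause a
# real memory fault in the child, rather than a signal sent with kill().
FAULTING_CHILD = "import ctypes; ctypes.string_at(0)"


def run_faulting_child():
    """Run a child Python process that reads address 0; return its exit status."""
    proc = subprocess.run(
        [sys.executable, "-c", FAULTING_CHILD],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return proc.returncode


def died_from_sigsegv(returncode):
    """subprocess reports death by signal N as a return code of -N."""
    return returncode == -signal.SIGSEGV
```

If died_from_sigsegv(run_faulting_child()) is true but no message shows up on the console, that would suggest the message depends on something other than the fault alone.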
This exact "unhandled signal" message appears to be powerpc-only; it comes from _exception() in arch/powerpc/kernel/traps.c. It looks like a few of the calls in that file pass SIGSEGV.