Created attachment 7816
crash_testcase.exp

Running the attached simple script on a busy system (one where many processes are created and destroyed quickly) eventually causes the system to lock up. It can take a while to occur (e.g. 1-2 hours), but it always does. So far I haven't been able to determine the cause of the issue, although the backtraces might implicate utrace.
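For anyone trying to reproduce this, the "busy system" condition can be approximated with a small load generator. The following is only a sketch of one way to create rapid process churn; it is not the attached crash_testcase.exp, and the function name `churn` and the default count are made up for illustration.

```python
import subprocess
import sys


def churn(count):
    """Spawn `count` short-lived child processes one after another.

    Returns the number of children that exited with status 0.
    """
    ok = 0
    for _ in range(count):
        # Each child starts, does nothing, and exits immediately,
        # exercising the kernel's process creation and exit paths.
        result = subprocess.run([sys.executable, "-c", "pass"])
        if result.returncode == 0:
            ok += 1
    return ok


if __name__ == "__main__":
    # Hypothetical default; raise it for a longer soak run.
    n = int(sys.argv[1]) if len(sys.argv) > 1 else 1000
    print(f"{churn(n)} of {n} children exited cleanly")
```

Running several copies of this in parallel alongside the test script should give a steady stream of process creation and exit events, which is the condition under which the lockup was seen.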
Created attachment 7817
dmesg.log
Forgot to add: this happened on f20 (kernel 3.16.2-200) with git stap at least as of commit 3525152, and also with earlier versions (including ones prior to the rt patches). I will try to do a bisect.
I'd certainly suspect utrace, especially since I see utrace_free() in your dmesg output. However, I also see _raw_spin_lock, and that has me confused. We recently added some patches to support realtime kernels, but we shouldn't be using raw spinlocks on anything but realtime kernels.

The only real utrace change lately was the following:

====
commit d9d07e99777c6e7aaaa8db0049c5fd5e5a2f01b0
Author: David Smith <dsmith@redhat.com>
Date:   Fri Jul 18 15:49:39 2014 -0500

    Fixed PR17181 by making utrace handle interrupting processes better.
====
Created attachment 8312
dmesg.log

This is still an issue on the latest f20 kernel (3.19.5) with the latest git stap. Interestingly, adding debug statements in utrace_free() confirms that the crash does not happen there, but the rest of the stack is still very similar (showing a backtrace coming from exit()-related calls).
Running this test on rawhide (5.13-rc0 kernel, 4.5-rc stap), it's solid.