We have a race condition where the shutdown code may be called while the startup is still in progress. Since the shutdown thinks the probes aren't started, we end up leaving some kernel callbacks registered after the module has unloaded, which will crash the system.

See also https://bugzilla.redhat.com/show_bug.cgi?id=521610

Using para-callgraph2.stp from that bug, the problem can be reproduced with this loop:

  MOD=$(stap -v para-callgraph2.stp sys_read '*@fs/*.c' -p4) && \
  while true; do
    staprun $MOD -v -o /dev/null &
    while ! pkill -INT stapio; do true; done
    wait
  done

That will repeatedly run the script in the background and kill it as soon as possible. I find that it usually triggers the race in only a few iterations.
Created attachment 4323
Use a mutex around transport startup/shutdown

This patch has removed the race condition, as far as I can tell. I've been running that reproducer for a while now with no issues.
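For anyone following along, here is a rough userspace sketch of the idea behind the fix, not the patch itself. It assumes a single lock shared by the startup and shutdown paths, so shutdown can never observe a half-finished startup. All names here (transport_lock, probes_started, transport_start, transport_shutdown, and the callback helpers) are hypothetical stand-ins, and pthreads takes the place of the kernel mutex API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the transport state; not the real SystemTap code. */
static pthread_mutex_t transport_lock = PTHREAD_MUTEX_INITIALIZER;
static bool probes_started = false;
static int callbacks_registered = 0;

static void register_callbacks(void)   { callbacks_registered++; }
static void unregister_callbacks(void) { callbacks_registered--; }

/* Startup: registration and the "started" flag change under one lock,
 * so no other thread can see callbacks registered but the flag unset. */
void transport_start(void)
{
    pthread_mutex_lock(&transport_lock);
    if (!probes_started) {
        register_callbacks();
        probes_started = true;
    }
    pthread_mutex_unlock(&transport_lock);
}

/* Shutdown: blocks until any in-progress startup has finished, then
 * tears down whatever was registered. Safe to call more than once. */
void transport_shutdown(void)
{
    pthread_mutex_lock(&transport_lock);
    if (probes_started) {
        unregister_callbacks();
        probes_started = false;
        assert(callbacks_registered == 0);
    }
    pthread_mutex_unlock(&transport_lock);
}

/* Read the callback count under the lock, for inspection. */
int transport_callbacks_registered(void)
{
    pthread_mutex_lock(&transport_lock);
    int n = callbacks_registered;
    pthread_mutex_unlock(&transport_lock);
    return n;
}
```

Without the lock, a shutdown that runs between register_callbacks() and the flag update would see probes_started as false and skip the unregister step, which is the leftover-callback situation described above.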
Just waiting for confirmation from others that the fix works, then I'll commit and close this bug.
Created attachment 4327
Revised patch with RHEL4 compatibility

RHEL4 needs an explicit #include <linux/mutex.h>, and the #ifdef DEFINE_MUTEX around the inode field accesses also needed revising. I turned the latter into a kernel version check instead.
a1995fef PR10854: Use a mutex around transport startup/shutdown
d117a23e PR10854 cont'd: Add a testcase for the reproducer