10854 – Race between script startup and abnormal shutdown

Status:	RESOLVED FIXED

Product:	systemtap
Classification:	Unclassified
Component:	runtime
Version:	unspecified

Importance:	P2 normal
Target Milestone:	---
Assignee:	Josh Stone

Reported:	2009-10-27 18:15 UTC by Josh Stone
Modified:	2009-10-27 21:00 UTC
CC List:	2 users

Attachments
Use a mutex around transport startup/shutdown (785 bytes, patch) 2009-10-27 18:17 UTC, Josh Stone
Revised patch with RHEL4 compatibility (971 bytes, patch) 2009-10-27 19:08 UTC, Josh Stone

Description Josh Stone 2009-10-27 18:15:04 UTC

We have a race condition where the shutdown code may be called while the startup
is still in progress.  Since the shutdown thinks the probes aren't started, we
end up leaving some kernel callbacks registered after the module has unloaded,
which will crash the system.
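
To make the failing interleaving concrete, here is a minimal single-threaded
model in C.  It is only an illustration: the struct, the function names and
the count of three callbacks are hypothetical and are not taken from the
SystemTap runtime.  It replays the order of events described above, with the
shutdown arriving after the callbacks are registered but before startup has
marked the probes as started.

```c
#include <stdbool.h>

/* Hypothetical model of the module state; names are illustrative only. */
struct module_state {
    bool probes_started;       /* set only when startup has fully finished */
    int  callbacks_registered; /* callbacks currently known to the "kernel" */
};

/* First half of startup: register the callbacks. */
static void module_startup_register(struct module_state *m)
{
    m->callbacks_registered = 3;
}

/* Second half of startup: mark the probes as started. */
static void module_startup_finish(struct module_state *m)
{
    m->probes_started = true;
}

/* Shutdown only cleans up if it believes the probes were started. */
static void module_shutdown(struct module_state *m)
{
    if (m->probes_started) {
        m->callbacks_registered = 0;
        m->probes_started = false;
    }
}

/* Shutdown lands in the middle of startup: callbacks are left behind. */
static int racy_interleaving(void)
{
    struct module_state m = { false, 0 };
    module_startup_register(&m);
    module_shutdown(&m);        /* sees probes_started == false, skips cleanup */
    module_startup_finish(&m);
    return m.callbacks_registered;
}

/* Shutdown runs only after startup has completed: everything is cleaned up. */
static int ordered_sequence(void)
{
    struct module_state m = { false, 0 };
    module_startup_register(&m);
    module_startup_finish(&m);
    module_shutdown(&m);
    return m.callbacks_registered;
}
```

In this model racy_interleaving() leaves the callbacks registered while
ordered_sequence() leaves none, which is the difference between the crash and
a clean unload.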

See also https://bugzilla.redhat.com/show_bug.cgi?id=521610

Using para-callgraph2.stp from that bug, the problem can be reproduced with this
loop:

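# Compile the script once; -p4 stops after pass 4 and prints the module path.
MOD=$(stap -v para-callgraph2.stp sys_read '*@fs/*.c' -p4) && \
while true; do
    # Start the module in the background, then interrupt stapio as soon as it exists.
    staprun "$MOD" -v -o /dev/null &
    while ! pkill -INT stapio; do true; done
    wait
done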

That will repeatedly run the script in the background and kill it as soon as
possible.  I find that it usually triggers the race in only a few iterations.

Comment 1 Josh Stone 2009-10-27 18:17:33 UTC

Created attachment 4323
Use a mutex around transport startup/shutdown

This patch has removed the race condition, as far as I can tell.  I've been
running that reproducer for a while now with no issues.
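
For readers without the attachment at hand, the general idea of serializing
startup and shutdown with one mutex can be sketched in userspace C with
pthreads.  This is not the attached patch: transport_lock, the two flags and
the callback count are hypothetical names, and the shutdown_requested flag is
an assumption added so that the sketch also behaves when the shutdown wins the
race for the lock.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the transport state; not SystemTap's real names. */
static pthread_mutex_t transport_lock = PTHREAD_MUTEX_INITIALIZER;
static bool probes_started;      /* set once startup has fully completed */
static bool shutdown_requested;  /* set once shutdown has run */
static int  callbacks_registered;

static void *transport_startup(void *unused)
{
    (void)unused;
    pthread_mutex_lock(&transport_lock);
    if (!shutdown_requested) {
        callbacks_registered = 3;  /* pretend to register kernel callbacks */
        sched_yield();             /* give a concurrent shutdown a chance to run */
        probes_started = true;
    }
    pthread_mutex_unlock(&transport_lock);
    return NULL;
}

static void *transport_shutdown(void *unused)
{
    (void)unused;
    /* Blocks here while a startup is still in progress. */
    pthread_mutex_lock(&transport_lock);
    shutdown_requested = true;
    if (probes_started) {
        callbacks_registered = 0;
        probes_started = false;
    }
    /* Invariant: nothing stays registered once shutdown has finished. */
    assert(callbacks_registered == 0);
    pthread_mutex_unlock(&transport_lock);
    return NULL;
}

/* Run one startup/shutdown pair concurrently; returns callbacks left behind. */
static int run_once(void)
{
    pthread_t up, down;

    probes_started = false;
    shutdown_requested = false;
    callbacks_registered = 0;

    pthread_create(&up, NULL, transport_startup, NULL);
    pthread_create(&down, NULL, transport_shutdown, NULL);
    pthread_join(up, NULL);
    pthread_join(down, NULL);
    return callbacks_registered;
}
```

Whichever thread takes the lock first, the shutdown can no longer observe a
half-finished startup, so run_once() should leave no callbacks registered.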

Comment 2 Josh Stone 2009-10-27 18:18:46 UTC

Just waiting for confirmation from others that the fix works, then I'll commit
and close this bug.

Comment 3 Josh Stone 2009-10-27 19:08:07 UTC

Created attachment 4327
Revised patch with RHEL4 compatibility

RHEL4 needs an explicit #include <linux/mutex.h>, and the #ifdef DEFINE_MUTEX
guard around accessing inode fields also has to be revised.  I turned the
latter into a kernel version check instead.

Comment 4 Josh Stone 2009-10-27 21:00:35 UTC

a1995fef PR10854: Use a mutex around transport startup/shutdown
d117a23e PR10854 cont'd: Add a testcase for the reproducer