This is the mail archive of the
mailing list for the systemtap project.
Re: [Patch] Fix bug of deadlock for staprun
- From: Martin Hunt <hunt at redhat dot com>
- To: Lai Jiangshan <laijs at cn dot fujitsu dot com>
- Cc: systemtap at sourceware dot org
- Date: Mon, 15 Oct 2007 13:52:18 -0400
- Subject: Re: [Patch] Fix bug of deadlock for staprun
- Organization: Red Hat Inc.
- References: <4712FEF8.email@example.com>
On Mon, 2007-10-15 at 14:47 +0900, Lai Jiangshan wrote:
> First bug: handle_symbols() opened control_channel
> and exited without closing this fd when errors occur. So if
> any error occurred in handle_symbols(), it will result in
> The most easy way to reproduce this bug:
> $ ulimit -HSn 4
> # make staprun open control_channel successfully,
> # but fail to open one more.
> $ staprun -L stap_XXXXXX.ko
> # deadlock
> Just add a line "close_ctl_channel();" in cleanup()
> can fix this bug.
I never figured out how to create this, but I did notice the possibility
that it could happen and fixed it last week as part of a rewrite of the
control channel code.
> Second bug: staprun runs as privileged process, but
> it trusts an unprivileged process stapio's return value. And
> the owner of files belong to stap_XXXXXX.ko is the user who
> runs staprun.
> So evil_usr(he is just a stapuser) can cause a
> deadlock in this way:
> step command
> 1 $ staprun -L stap_XXXXXX.ko
> 2 $ exec 3<>/sys/kernel/debug/systemtap/stap_XXXXXX/cmd
> # open one of the files belong to stap_XXXXXX.ko,
> # fd 3 will be inherited by staprun, so staprun
> # holds this file.
> 3 $ staprun -A stap_XXXXXX.ko &
> 4 # (very quickly) evil_usr uses syscall ptrace to modify
> # argument of syscall exit in stapio, make stapio exit
> # with exit_code=3.
> Thus staprun will get stapio's exit_code=3, so it will
> delete stap_XXXXXX.ko which lead to a deadlock.
> In this condition, staprun gets too little information
> to prevent evil_usr from doing such things. And staprun has no
> enough information to return back all resource belong to
> stap_XXXXXX.ko before deleting it. The best way for staprun is
> that staprun cannot "Uninterruptible sleep", just left root
> cleanup the mess.
> So I just added a flag O_NONBLOCK when calling
> delete_module. I think this flag should be added even if the
> bug does not exist.
I guess I don't see the problem here. The user, with a bit of
creativity, can cause staprun to not exit until some resource he
has managed to grab is freed. I did this intentionally because I wanted
it to be obvious when this was happening. What harm does this cause and
why would it be better to have modules silently hanging around pending
deletion? Wouldn't this cause more confusion when the user tries to run
the same module later and it won't load because it still exists?