This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: sleeping, locks and debug kernels


Oleg, could you take a quick look at the second issue below? I suspect
it is a bug in systemtap's utrace usage, but maybe there is also a
utrace issue (the bug report shows some memory corruption).

On Tue, 2012-01-31 at 23:23 +0100, Mark Wielaard wrote:
> On Mon, Dec 12, 2011 at 05:58:51PM +0100, Oleg Nesterov wrote:
> > Yes, it is very simple to add UTRACE_ATTACH_CREATE_ATOMIC.
> 
> Since newer utrace now have that flag I made a patch to take
> advantage of it. This makes us lockdep free (at least on recent
> rawhide kernels for the systemtap.base make installcheck subset).
> 
> It is on the mjw/create_atomic branch.

I found a couple of small locking issues with this that I also fixed.
Now I have merged it over to master. With this (and a very recent
kernel; I have tested against 3.3.0-0.rc1.git6.1.fc17.x86_64) we seem to
have a lockdep-warning-free installcheck testsuite (hurray!) except for
the following two issues:

pfiles.stp does something nasty
http://sourceware.org/bugzilla/show_bug.cgi?id=13641
This example contains a ton of native guru code. It holds the rcu lock
while calling something that allocates and so can sleep, which is a
no-no. I haven't yet found exactly which call it is.

More serious, though, is the following:
Bad interaction between itrace and stap_stop_task_finder
http://sourceware.org/bugzilla/show_bug.cgi?id=13639
This doesn't seem like it is a new issue (Frank said he saw it before).
But it is much more observable now with all other lockdep issues gone.
There is some race between the itrace utrace engine/task removal and the
stap_stop_task_finder utrace engine/task removal. stap_stop_task_finder
calls stap_utrace_detach_ops which just goes over all tasks and tries to
remove the utrace engine from them, ignoring any errors. But during this
it looks like utrace_barrier sees an -ESRCH and does a
schedule_timeout_interruptible while we are holding the rcu_read_lock, so
that is also a no-no... Maybe this can be "fixed" (aka papered over)
inside utrace by changing the error path to not do the interruptible
bit, but I suspect we have some kind of utrace task-detaching race
between itrace and task_finder.
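
To make the sleeping-under-rcu_read_lock pattern concrete, here is a
small userspace model in C. It is only a sketch: none of it is real
kernel, utrace or systemtap code, and every name in it
(model_read_lock, model_blocking_detach, detach_all_under_lock,
detach_all_after_unlock, MAX_TASKS) is made up for illustration. The
first function walks all tasks and makes a possibly-blocking call while
inside the read-side section, which is the shape described above. The
second takes a snapshot under the lock and only blocks after dropping
it, which is one common way to avoid the warning.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins (hypothetical, not kernel or utrace APIs). */
static int read_lock_depth;    /* models rcu_read_lock() nesting depth */
static int sleeps_under_lock;  /* counts "sleep while atomic" violations */

static void model_read_lock(void)
{
    read_lock_depth++;
}

static void model_read_unlock(void)
{
    assert(read_lock_depth > 0);
    read_lock_depth--;
}

/* Models a call that may block, e.g. a barrier that waits and retries. */
static void model_blocking_detach(int task_id)
{
    (void)task_id;
    if (read_lock_depth > 0)
        sleeps_under_lock++;   /* the case lockdep would warn about */
}

/* Problem shape: blocking call made inside the read-side section. */
static int detach_all_under_lock(const int *tasks, size_t n)
{
    sleeps_under_lock = 0;
    model_read_lock();
    for (size_t i = 0; i < n; i++)
        model_blocking_detach(tasks[i]);
    model_read_unlock();
    return sleeps_under_lock;
}

#define MAX_TASKS 16

/* Alternative: snapshot under the lock, block only after dropping it. */
static int detach_all_after_unlock(const int *tasks, size_t n)
{
    int snapshot[MAX_TASKS];
    size_t count = 0;

    sleeps_under_lock = 0;
    model_read_lock();
    for (size_t i = 0; i < n && count < MAX_TASKS; i++)
        snapshot[count++] = tasks[i];
    model_read_unlock();

    for (size_t i = 0; i < count; i++)
        model_blocking_detach(snapshot[i]);
    return sleeps_under_lock;
}
```

In this model the first function reports one violation per task and
the second reports none. In a real kernel the snapshot step would also
have to pin each task (for example with get_task_struct) so it cannot
go away once the lock is dropped, and this does nothing about the
suspected detach race itself.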

Cheers,

Mark

