This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: task_finder2 help


On 08/23/2011 09:23 AM, David Smith wrote:
> I've been working on replacing the task_finder's use of utrace.  You can
> see this work on the dsmith/task_finder2 branch.  How does this work?
> I'm using 4 tracepoints and one ftrace function hook to monitor task
> fork/exit/exec and syscall entry & syscall exit.
> 
> On F15 (kernel 2.6.40-4.fc15.x86_64.debug) what's checked in works
> great, all utrace tests pass.  Note that the code isn't finished, there
> is a known memory leak for example, but it seems to work.
> 
> However, on rawhide (3.1.0-0.rc1.git6.1.fc17.x86_64), the kernel
> complains mightily about sleeping functions called from invalid contexts.

This is probably just because rawhide has more CONFIG_DEBUG flags on.

> There are a couple of classes of these bugs (after working through some
> semi-minor issues):
> 
> - Use of get_task_mm()/mmput().  There are several places in
> runtime/task_finder2.c where we call get_task_mm() to get a task's mm,
> we do something with it (like get the task's pathname), then call
> mmput() to release the mm.  (A couple of places we just check to see if
> the task has a mm, in that case calling get_task_mm()/mmput() is
> probably overkill.)
> 
> Looking at the source to get_task_mm() (in kernel/fork.c), you'll see
> that it locks the task, increments an atomic variable, then unlocks the
> task.  The problem is in mmput().  mmput() calls might_sleep()
> unconditionally.  Why?  Because when the atomic reaches 0, all the mm's
> data is freed.
> 
> So, how to work around this?  For now I've been just locking the task,
> hoping that nothing will change the mm.  If the task is a single
> threaded task, that assumption might be reasonable.  The task is stopped
> anyway at the various points I probe (fork/exec/exit/syscalls).
> However, if the task is part of a multi-threaded program, that
> assumption probably isn't true.  Yes, this task is stopped, but another
> running task in the same group could possibly change the mm.

The get/put are only protecting from having the mm released.  So long as
you're always doing this within the context of a task that owns this mm,
then it seems safe to say that mm_users will always be >0 -- and so the
get/put are really redundant.  You can see many examples in mmap.c  that
don't bother with get/put on current->mm.

I don't think any of this prevents a *change* in the mm though.  If
that's a concern, I believe you need to down_read mmap_sem.

> - Memory allocation.  All memory allocation in the task_finder is done
> using GFP_KERNEL (with perhaps some additional other flags).  Most of
> the allocation is done to store PATH_MAX buffers, to be used when we're
> looking at the pathname of a task.  Even as far back as the 2.6.9
> kernel, GFP_KERNEL has been defined as (__GFP_WAIT | __GFP_IO |
> __GFP_FS).  If __GFP_WAIT is set in the flags passed to kmalloc(),
> kmalloc() calls might_sleep().
> 
> So, the obvious solution is to change the flags we pass from GFP_KERNEL
> to (GFP_KERNEL & ~__GFP_WAIT).  That seems to avoid the BUG, but I'm
> unsure of the ramifications (other than the obvious one that more memory
> allocations could fail).  It could be that we should have been turning
> off __GFP_WAIT even before now.
> 
> Does anyone know of pitfalls of turning off __GFP_WAIT?

The obvious pitfall is that the allocation is more likely to fail, so
you have to be ready to deal with that.

I'm not sure what the full ramifications are for keeping __GFP_IO and
__GFP_IO, but I think GFP_ATOMIC is for exactly these unsleepable
allocations.  Or possibly GFP_NOWAIT to avoid the emergency pool.

If possible, the best choice is to allocate from a safer context, but I
assume you've already ruled that out...

Josh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]