This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: 20090521 systemtap meeting notes
- From: Jim Keniston <jkenisto at us dot ibm dot com>
- To: David Smith <dsmith at redhat dot com>
- Cc: ananth at in dot ibm dot com, RedHat_perftools <external-perftools-list at redhat dot com>, William Cohen <wcohen at redhat dot com>, Roland McGrath <roland at redhat dot com>, systemtap <systemtap at sources dot redhat dot com>
- Date: Tue, 26 May 2009 10:02:11 -0700
- Subject: Re: 20090521 systemtap meeting notes
- References: <4A15A354.4050000@redhat.com> <20090522094036.GD5562@in.ibm.com> <20090523015651.D7354FC35D@magilla.sf.frob.com> <20090525104509.GA19797@in.ibm.com> <4A1BE332.8070302@redhat.com>
On Tue, 2009-05-26 at 07:40 -0500, David Smith wrote:
> Ananth N Mavinakayanahalli wrote:
> > On Fri, May 22, 2009 at 06:56:51PM -0700, Roland McGrath wrote:
> >> (Why is this not on systemtap@?)
> >
> > (This was a response to the Thursday MoM Will posted to perftools.
> > Should've been on systemtap@)
> >
> >>> stapio/2796 is trying to acquire lock:
> >>> (&mm->mmap_sem){++++++}, at: [<e181beab>] register_uprobe+0x24d/0x82a [uprobes]
> >>>
> >>> but task is already holding lock:
> >>> (&mm->mmap_sem){++++++}, at: [<e18bdfd6>] __stp_utrace_task_finder_target_quiesce+0x211/0x2db [stap_722fa39772a3d7da10b7105c514a76be_1462]
> >> task_finder calls ->mmap_callback with mmap_sem held for reading. But it
> >> can lead into register_uprobe, which can try to take it for either reading
> >> or writing. The lockdep complaint about taking it again for reading could
> >> be avoided by using down_read_nested. But the real problem is when
> >> register_uprobe gets into uprobe_setup_ssol_vma and tries to take it for
> >> writing.
uprobe_setup_ssol_vma() is not called from register_uprobe(), but rather
from uprobe_report_signal() the first time a breakpoint is hit.
> >>
> >> I think the task-finder callback plan just has to get more sophisticated.
> >> Callbacks with a lock like mmap_sem held are kind of dubious for any
> >> quasi-generic API, because of just this kind of complexity.
> >
> > Maybe that's something for David Smith to take a first stab at. David?
>
> Hmm. Looking back through the task_finder code, I believe the mmap_sem
> is being held so that the vma list doesn't get deleted from underneath
> the task_finder. However, I'm not sure that can really happen in the
> cases where it is done. Calling 'get_task_mm()' might be enough
> here.
>
> It looks like the task_finder runs callbacks with mmap_sem held in 2 places:
>
> 1) When initially attaching to an "interesting" thread, it gets stopped.
> In the quiesce handler, the mmap callbacks are run for vmas that
> existed before task_finder attached to it. (This is only done for the
> thread group leader.) The entire vma list is processed in this manner.
>
> Since the thread is stopped, how worried should the task_finder be that
> another thread in the same thread group might modify mm->mmap?
If you're introducing (say) 1000 probes into an existing multithreaded
app, probe #1 could get hit by one thread (thus triggering
uprobe_setup_ssol_vma()) while later probes are still being registered.
I don't see how that causes a deadlock, though. It seems like
uprobe_setup_ssol_vma() would just block (NOT holding the
uprobe_process->rwsem, BTW) until task_finder releases mmap_sem.
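The distinction drawn above is that a *different* thread waiting for the write lock is not a deadlock; it just waits for the reader to finish. A minimal user-space sketch of that, with a pthread rwlock standing in for mmap_sem and all names hypothetical: the calling thread holds the lock for reading (task_finder registering probes) while a second thread, modeling a probed thread hitting its first breakpoint, requests the lock for writing. The writer makes progress only once the reader releases the lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical user-space stand-in for mm->mmap_sem. */
static pthread_rwlock_t model_mmap_sem = PTHREAD_RWLOCK_INITIALIZER;

/* Set by the writer once it holds the write lock.  Only accessed with
 * model_mmap_sem held (or after pthread_join), so there is no data race. */
static int ssol_vma_created;

/* Models a probed thread hitting its first breakpoint and needing the
 * lock for writing (the role uprobe_setup_ssol_vma() plays). */
static void *probed_thread(void *arg)
{
    (void)arg;
    pthread_rwlock_wrlock(&model_mmap_sem);   /* waits while a reader holds it */
    ssol_vma_created = 1;
    pthread_rwlock_unlock(&model_mmap_sem);
    return NULL;
}

/* Models task_finder holding the lock for reading in another thread.
 * Returns 1 if the writer was held off while the read lock was held and
 * then completed once the lock was released. */
int writer_waits_for_reader(void)
{
    const struct timespec pause = { 0, 10L * 1000 * 1000 };   /* 10 ms */
    pthread_t tid;
    int held_off;

    int err = pthread_rwlock_rdlock(&model_mmap_sem);
    assert(err == 0);
    (void)err;

    ssol_vma_created = 0;
    if (pthread_create(&tid, NULL, probed_thread, NULL) != 0) {
        pthread_rwlock_unlock(&model_mmap_sem);
        return 0;
    }

    nanosleep(&pause, NULL);                /* give the writer time to block */
    held_off = (ssol_vma_created == 0);     /* the writer cannot have run yet */
    pthread_rwlock_unlock(&model_mmap_sem); /* "task_finder releases mmap_sem" */

    pthread_join(tid, NULL);                /* the writer proceeds: no deadlock */
    return held_off && ssol_vma_created == 1;
}
```

The sleep only makes it likely that the writer is actually blocked when the flag is checked; the result does not depend on it, because the flag cannot change while the read lock is held.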
>
> 2) At syscall exit, if the call is mmap or mmap2, the callbacks are
> called on the new vma. In this case it would be possible to hold
> mmap_sem, get the information needed out of the new vma, release
> mmap_sem, then call the callbacks.
>
Jim