This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [PATCH] stap/staprun do not terminate properly
- From: David Smith <dsmith at redhat dot com>
- To: Torsten Polle <Torsten dot Polle at gmx dot de>, systemtap at sourceware dot org
- Date: Fri, 07 Mar 2014 15:09:21 -0600
- Subject: Re: [PATCH] stap/staprun do not terminate properly
- References: <m2k3c7uiaa dot fsf at gmx dot de>
On 03/06/2014 03:30 PM, Torsten Polle wrote:
> Hi,
>
> I'm using the uprobes-inode with task_finder2.c and had two problems,
> when I wanted to terminate my probe runs.
>
> I tested the patches with uprobes-inode and the utrace based version.
>
> Kind Regards,
> Torsten
Torsten,
Thanks *so* much for the patches. I've seen a hang in stap around this
area, but I could never reproduce it.
I checked the 1st patch in as commit e695d46 and the 2nd patch (tweaked)
in as commit 9ee1bfe.
I tweaked the 2nd patch just a bit. Originally the flow went like:
====
stap_stop_task_finder()
{
	// ...
	// Note that utrace_exit() calls stp_task_work_exit()
	utrace_exit();
	__stp_tf_cancel_task_work();
}
====
Your patch changed it to this:
====
stap_stop_task_finder()
{
	// ...
	utrace_exit();
	// Note that __stp_tf_cancel_task_work() calls
	// stp_task_work_exit()
	__stp_tf_cancel_task_work();
}
====
I saw what you were doing, but that didn't "feel" quite right.
utrace_init() calls stp_task_work_init(), so it made sense for
utrace_exit() to call stp_task_work_exit().
So, instead I did this:
====
stap_stop_task_finder()
{
	// ...
	__stp_tf_cancel_task_work();
	// Note that utrace_exit() calls stp_task_work_exit()
	utrace_exit();
}
====
This cancels all outstanding task_work items before shutting down
utrace (which calls stp_task_work_exit() and frees all the memory
allocated for utrace). I think the end result is the same as your
patch, and this ordering makes a little more sense to me.
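The ordering argument can be sketched with a small user-space model. This
is not the SystemTap runtime code: struct subsystem, queue_work(),
cancel_all_work(), subsystem_exit() and stop_cancel_first() are
hypothetical stand-ins for utrace, task_work queuing,
__stp_tf_cancel_task_work(), utrace_exit() and the shutdown order
described above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model only; none of these names exist in SystemTap. */
struct work_item {
	struct work_item *next;
};

struct subsystem {
	bool active;               /* models "utrace is initialized" */
	struct work_item *pending; /* models queued task_work items */
};

static void subsystem_init(struct subsystem *s)
{
	s->active = true;
	s->pending = NULL;
}

/* Queue one work item; refused once the subsystem has been shut down. */
static bool queue_work(struct subsystem *s)
{
	struct work_item *w;

	assert(s != NULL);
	if (!s->active)
		return false;
	w = malloc(sizeof(*w));
	if (w == NULL)
		return false;
	w->next = s->pending;
	s->pending = w;
	return true;
}

/* Stand-in for __stp_tf_cancel_task_work(): drop every queued item.
 * Returns the number of items that were cancelled. */
static size_t cancel_all_work(struct subsystem *s)
{
	size_t n = 0;

	while (s->pending != NULL) {
		struct work_item *w = s->pending;
		s->pending = w->next;
		free(w);
		n++;
	}
	return n;
}

/* Stand-in for utrace_exit(): mark the subsystem as shut down.
 * Returns how many work items were still queued at that moment;
 * anything other than 0 models work that could run after teardown. */
static size_t subsystem_exit(struct subsystem *s)
{
	size_t leftover = 0;
	struct work_item *w;

	for (w = s->pending; w != NULL; w = w->next)
		leftover++;
	s->active = false;
	return leftover;
}

/* Cancel first, then shut down, so nothing is left queued at teardown. */
static size_t stop_cancel_first(struct subsystem *s)
{
	cancel_all_work(s);
	return subsystem_exit(s);
}
```

In this model, calling subsystem_exit() before cancel_all_work() reports
the still-queued items, while stop_cancel_first() always reports zero.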
If this doesn't work for you, or you see a hole in this logic, please
let me know.
BTW, if you have a good idea for a reproducer for the original problem
I'd like to see it. Perhaps I could add a test case for it.
Thanks again for the patches!
--
David Smith
dsmith@redhat.com
Red Hat
http://www.redhat.com
256.217.0141 (direct)
256.837.0057 (fax)