This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: [PATCH] stap/staprun do not terminate properly
- From: Torsten Polle <Torsten dot Polle at gmx dot de>
- To: David Smith <dsmith at redhat dot com>
- Cc: systemtap at sourceware dot org
- Date: Fri, 7 Mar 2014 23:11:12 +0100
- Subject: Re: [PATCH] stap/staprun do not terminate properly
- References: <m2k3c7uiaa dot fsf at gmx dot de> <531A3581 dot 7050009 at redhat dot com>
David Smith writes:
> On 03/06/2014 03:30 PM, Torsten Polle wrote:
>> Hi,
>>
>> I'm using uprobes-inode with task_finder2.c and ran into two problems
>> when I wanted to terminate my probe runs.
>>
>> I tested the patches with uprobes-inode and the utrace based version.
>>
>> Kind Regards,
>> Torsten
> Torsten,
> Thanks *so* much for the patches. I've seen a hang in stap around
> this area, but I could never reproduce it.
David,
I have been able to reproduce the problem 100% of the time for half a
year now, but I never found the time to track down the root cause.
> I checked the 1st patch in as commit e695d46 and the 2nd patch
> (tweaked) in as commit 9ee1bfe.
> I tweaked the 2nd patch just a bit. Originally the flow went like:
> ====
> stap_stop_task_finder()
> {
> // ...
> // Note that utrace_exit() calls stp_task_work_exit()
> utrace_exit();
> __stp_tf_cancel_task_work();
> }
> ====
> Your patch changed it to this:
> ====
> stap_stop_task_finder()
> {
> // ...
> utrace_exit();
> // Note that __stp_tf_cancel_task_work() calls
> // stp_task_work_exit()
> __stp_tf_cancel_task_work();
> }
> ====
> I saw what you were doing, but that didn't "feel" quite right.
> utrace_init() calls stp_task_work_init(), so it made sense for
> utrace_exit() to call stp_task_work_exit().
> So, instead I did this:
> ====
> stap_stop_task_finder()
> {
> // ...
> __stp_tf_cancel_task_work();
> // Note that utrace_exit() calls stp_task_work_exit()
> utrace_exit();
> }
> ====
> This moves canceling all outstanding task_work items before shutting
> down utrace (and calling stp_task_work_exit()). I think the end
> result is the same as your patch, and I think this makes a little
> more sense. This way we've canceled all the task_work items before
> shutting down utrace (and freeing all the memory allocated for
> utrace).
> If this doesn't work for you or you see a hole in this logic please
> let me know.
I can't beat your logic. It should work for me. Unfortunately, I don't
have direct access to my target for two weeks.
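As a rough illustration of the ordering argument, here is a user-space
sketch. It is not SystemTap code: every name in it is made up, and it
assumes that queued task_work items refer to state that shutting down
utrace frees. It only counts how often a queued item would be touched
after that state is gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names are SystemTap APIs. */
enum { MAX_ITEMS = 4 };

struct model {
    bool engine_alive;        /* stands in for the utrace state and memory */
    bool queued[MAX_ITEMS];   /* outstanding task_work items */
    int  touched_after_exit;  /* accesses to items after teardown */
};

void model_init(struct model *m, size_t n_queued)
{
    assert(n_queued <= MAX_ITEMS);
    m->engine_alive = true;
    m->touched_after_exit = 0;
    for (size_t i = 0; i < MAX_ITEMS; i++)
        m->queued[i] = (i < n_queued);
}

/* Plays the role of __stp_tf_cancel_task_work(): drop every queued item. */
void model_cancel_all(struct model *m)
{
    for (size_t i = 0; i < MAX_ITEMS; i++) {
        if (!m->queued[i])
            continue;
        if (!m->engine_alive)
            m->touched_after_exit++;  /* would be a use after teardown */
        m->queued[i] = false;
    }
}

/* Plays the role of utrace_exit(): tear down what the items depend on. */
void model_engine_exit(struct model *m)
{
    m->engine_alive = false;
}

/* Ordering from the committed version: cancel first, then tear down. */
int shutdown_cancel_first(size_t n_queued)
{
    struct model m;
    model_init(&m, n_queued);
    model_cancel_all(&m);
    model_engine_exit(&m);
    return m.touched_after_exit;
}

/* Reverse ordering: tear down first, then cancel. */
int shutdown_exit_first(size_t n_queued)
{
    struct model m;
    model_init(&m, n_queued);
    model_engine_exit(&m);
    model_cancel_all(&m);
    return m.touched_after_exit;
}
```

In this model, cancelling first never touches an item after teardown,
while the reverse order touches every item that was still queued.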
> BTW, if you have a good idea for a reproducer for the original
> problem I'd like to see it. Perhaps I could add a test case for it.
I simply define a process probe and cross-compile the module "foo" for
an ARM target. Then I run "staprun -o /tmp/probes.txt foo". After a
while I (try to) terminate the run with Ctrl-C.
If there is a process that is never scheduled, the task worker for the
process is never executed. Thus, staprun hangs. Usually, there are a few
processes that exhibit this behaviour on my target.
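The hang can be modelled in a few lines of plain C. This is only a
sketch under my own assumptions, not the runtime code: the struct, the
function names and the bounded "scheduler pass" loop are all invented
for illustration. Shutdown waits until no work is pending, and pending
work on a task that is never scheduled only goes away if it is
cancelled explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; the names below are illustrative, not SystemTap APIs. */
struct task {
    bool scheduled;     /* does this task ever get to run again? */
    bool work_pending;  /* is a task_work item still queued on it? */
};

/* One scheduler pass: queued work runs only on tasks that get scheduled. */
void run_scheduled_work(struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (tasks[i].scheduled)
            tasks[i].work_pending = false;
}

/* Explicit cancellation: drop queued work whether or not the task runs. */
void cancel_pending_work(struct task *tasks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tasks[i].work_pending = false;
}

size_t count_pending(const struct task *tasks, size_t n)
{
    size_t pending = 0;
    for (size_t i = 0; i < n; i++)
        if (tasks[i].work_pending)
            pending++;
    return pending;
}

/*
 * Returns true if shutdown finishes within max_passes scheduler passes.
 * A false result stands in for the hang: the wait would never end.
 */
bool shutdown_completes(struct task *tasks, size_t n,
                        bool cancel_first, int max_passes)
{
    assert(tasks != NULL);
    if (cancel_first)
        cancel_pending_work(tasks, n);
    for (int pass = 0; pass < max_passes; pass++) {
        if (count_pending(tasks, n) == 0)
            return true;
        run_scheduled_work(tasks, n);
    }
    return count_pending(tasks, n) == 0;
}
```

With one task that is never scheduled, the wait never finishes unless
the pending work is cancelled first, no matter how many passes are
allowed.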
> Thanks again for the patches!
> --
> David Smith
> dsmith@redhat.com
> Red Hat
> http://www.redhat.com
> 256.217.0141 (direct)
> 256.837.0057 (fax)
Torsten