| Summary: | utrace: taskfinder misses events when main thread does not go through at least one quiesce | | |
|---|---|---|---|
| Product: | systemtap | Reporter: | Rayson Ho <raysonlogin> |
| Component: | runtime | Assignee: | Unassigned <systemtap> |
| Status: | RESOLVED FIXED | | |
| Severity: | normal | CC: | dsmith, fche, jistone, kahing, mark |
| Priority: | P2 | | |
| Version: | unspecified | | |
| Target Milestone: | --- | | |
| Host: | | Target: | |
| Build: | | Last reconfirmed: | |

Description
Rayson Ho
2011-04-06 06:05:26 UTC
David, could this be corrected by a UTRACE_STOP sent to the main thread upon attaching to it?

Just adding a comment so people who see hotspot java probes not working against running java processes might find this bug report. The hotspot main thread does nothing except wait for all other threads to exit, which can trigger this bug. This happens, for example, on a 2.6.32-220.23.1.el6.x86_64 kernel.

Depending on how you count, we've got 3 or 4 sets of utrace-like functionality here, with different behaviors:

1) The original version of utrace, which is present in RHEL5, handled by runtime/linux/task_finder.c. In this case we do send the main thread a UTRACE_STOP, which causes it to stop and we attach correctly. Here your testcase passes.

2) Version 2 of utrace, which is present in RHEL6 (also handled by runtime/linux/task_finder.c). In this case we do send the main thread a UTRACE_STOP; however, the stop doesn't/can't interrupt a system call. Here your testcase fails.

3) On new kernels without "real" utrace, we fake it with tracepoint handlers and task_work_add(). This code is in runtime/linux/task_finder2.c and runtime/stp_utrace.c. It uses task_work_add() to run code when the task is stopped. This can't interrupt a system call. Here your testcase fails.

4) The new dyninst runtime ("--runtime=dyninst") uses the dyninst library to attach, which ends up using ptrace. Attaching to a running process isn't quite there yet, but I think it should be possible to interrupt a system call.

So, for cases 2) and 3) above, we've still got some thinking to do.

(In reply to comment #3)
> 4) The new dyninst runtime ("--runtime=dyninst") uses the dyninst library to
> attach, which ends up using ptrace. Attaching to a running process isn't quite
> there yet, but I think it should be possible to interrupt a system call.

I'd be surprised if there was a quiesce-type issue in stapdyn, but I did find an obvious omission that caused its attach to miss -- commit 02bff02.

Commit f346b8b fixes this for everything except dyninst.

Here's what was going on with this one. In the original utrace (RHEL5), UTRACE_STOP could interrupt the target task (by sending the task a fake signal). This caused the target task to stop even when it was sleeping. In the newer utrace (RHEL6), that functionality was split out into UTRACE_INTERRUPT. So, for RHEL6, we now make a pass after the systemtap session is started and send all target tasks a UTRACE_INTERRUPT.

For newer kernels without utrace, I've implemented UTRACE_INTERRUPT support, so just like for RHEL6, we now make a pass after the systemtap session is started and send all target tasks a UTRACE_INTERRUPT.

I've added a test case, called 'main_quiesce.exp', that tests this issue. This new test case passes everywhere except with dyninst. That will need more investigation.

The dyninst problem has been moved to its own bug, #14923.
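
For anyone trying to reproduce the scenario described in this report (a main thread that only waits for its other threads to exit), a minimal sketch of such a target process follows. It is a hypothetical illustration in Python, not the 'main_quiesce.exp' test case; the name `run_workload` and its parameters are assumptions made up for this example. On Linux, the main thread typically sits blocked in the kernel during `join()`, which is the state the comments above say a plain UTRACE_STOP cannot interrupt on RHEL6 and newer kernels.

```python
import threading
import time


def run_workload(num_workers=2, iterations=3, pause=0.01):
    """Start workers, then park the calling thread in join() until they exit.

    Hypothetical helper for illustration only. Returns the list of
    (worker_id, iteration) events recorded by the workers.
    """
    events = []
    lock = threading.Lock()

    def worker(worker_id):
        for i in range(iterations):
            time.sleep(pause)  # short blocking sleep between recorded events
            with lock:
                events.append((worker_id, i))

    threads = [
        threading.Thread(target=worker, args=(n,)) for n in range(num_workers)
    ]
    for t in threads:
        t.start()

    # From here on the calling (main) thread only waits, much like the
    # hotspot main thread described in this report.
    for t in threads:
        t.join()
    return events


if __name__ == "__main__":
    print(len(run_workload()))
```

With larger `iterations` and `pause` values the process stays alive long enough to attach to it while the main thread is already waiting, which is the attach-to-running-process case this bug covers.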