This is the mail archive of the mailing list for the systemtap project.


Re: Instrumenting context-switching

Perry Cheng wrote:
> Instrumenting the context switch code has long been delicate. I can't seem to find any help on this particular topic on the wiki (specifically, the examples).
>
> In the past, following some hints from old docs and the mailing list, it was enough to instrument the __switch_to function to get the prev and next tasks. One level higher is switch_to which, being a macro, is not instrumentable. I recently switched from an older i386 kernel to a new x86_64 kernel, and __switch_to can no longer be instrumented with kprobes. The higher-level context_switch does not seem probe-able either, because it is a static inline function. Even higher up, we have the call to context_switch from schedule, but instrumenting that would require a specific line number, which seems rather fragile: it relies more heavily on debugging information and is susceptible to kernel code changes.

The scheduler tapset probes context_switch() on x86_64, but that doesn't help much. context_switch() is an inline function, so its entry parameters prev and next aren't accessible via SystemTap.

What kernel version are you using? If you're using 2.6.24-rc1, then kernel markers are available. You can place a static marker (as shown in the patch below), rebuild and reboot the kernel, then access the probe point via a SystemTap script (as shown in the script below the patch). You'll need to use the latest SystemTap snapshot to get marker probe support:

Of course, this won't help if you're using earlier versions of the kernel or SystemTap.

- Mike

diff --git a/kernel/sched.c b/kernel/sched.c
index 3f6bd11..3281098 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -1937,6 +1937,8 @@ context_switch(struct rq *rq, struct task_struct *prev,
       spin_release(&rq->lock.dep_map, 1, _THIS_IP_);

+       trace_mark(sched_switch_to, "%p %p %p", rq, prev, next);
       /* Here we just switch the register state and the stack. */
       switch_to(prev, next, prev);

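probe kernel.mark("sched_switch_to")
{
       # Marker arguments, in the order given to trace_mark()
       rq = $arg1      # run queue (not used below)
       p = $arg2       # task being switched out
       n = $arg3       # task being switched in

       next_pid = task_pid(n)
       prev_pid = task_pid(p)
       next_name = task_execname(n)
       prev_name = task_execname(p)

       printf("%s (%d) switching to %s (%d)\n",
              prev_name, prev_pid, next_name, next_pid)
}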
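
Printing a line on every context switch produces a lot of output. As a rough sketch of another way to use the same marker, the script below counts how often each task is switched in and prints the ten busiest after a fixed interval. It assumes the sched_switch_to marker from the patch above is compiled into the running kernel. The global array name "switches" and the 10-second interval are arbitrary choices for illustration, not part of any tapset.

```systemtap
# Hypothetical array: number of times each (execname, pid) was switched in.
global switches

probe kernel.mark("sched_switch_to")
{
       # $arg3 is the "next" task pointer passed to trace_mark() in the patch.
       switches[task_execname($arg3), task_pid($arg3)]++
}

# Assumed reporting interval of 10 seconds; adjust as needed.
probe timer.s(10)
{
       # The trailing "-" sorts by count, highest first; limit to 10 entries.
       foreach ([name, pid] in switches- limit 10)
              printf("%-16s %6d %8d\n", name, pid, switches[name, pid])
       exit()
}
```

Aggregating in an array keeps the work done inside the probe handler small, which matters on a path as hot as the context switch.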

> So, on an x86_64 kernel, how do I instrument this method?

