This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints


Hi,

I have attached the next version of the patch. This incorporates most of
the comments given by Josh and Frank. This was tested on mainline linux
kernel 2.6.31 on ppc64 and x86_64. I have attached sample output for
ppc64.

Frank,
My full name is Kiran Prakash 

Josh,
I was not able to merge the signal and kprocess tapsets with my
scheduler tapset as my scheduler tapset probes the tracepoints mainly
and in case tracepoints are not present, it probes the kernel functions.
So I can define namespaces only for the variables exported by the
tracepoint.

Changelog
---------
 	- Included the kp and tp probes in __scheduler namespace
        - Merged the probes scheduler.migrate and scheduler.migrate_task
	- Modified the sched_switch example script to print the wakeup 
	  info in the wakeup probe itself. Hence not using any
	  variables.

Signed-off-by: Kiran Prakash <kiran@linux.vnet.ibm.com>

diff -Naur systemtap-orig/tapset/scheduler.stp systemtap/tapset/scheduler.stp
--- systemtap-orig/tapset/scheduler.stp	2009-09-19 10:27:14.000000000 -0400
+++ systemtap/tapset/scheduler.stp	2009-09-28 23:32:59.000000000 -0400
@@ -20,7 +20,7 @@
 %}
 
 
-/* probe scheduler.cpu_off
+/** probe scheduler.cpu_off
  *
  *  Fires when a process is about to stop running on a cpu.
  *
@@ -41,7 +41,7 @@
 }
 
 
-/* probe scheduler.cpu_on
+/** probe scheduler.cpu_on
  *
  *  Fires when a process is beginning execution on a cpu.
  *
@@ -76,26 +76,6 @@
     idle = __is_idle()
 }
 
-
-/* probe scheduler.migrate
- *
- *  Fires whenever a task is moved to a different cpu's runqueue.
- *
- * Context:
- *  Unknown (sometimes migration thread, sometimes cpu_to)
- *
- * Arguments:
- *  task - the process that is being migrated
- *  cpu_from - the cpu that is losing the task
- *  cpu_to - the cpu that is claiming the task
- */
-probe scheduler.migrate = kernel.function("pull_task")? {
-    task = $p
-    cpu_from = task_cpu($p) /*thread_info renamed to stack since 2.6.22*/
-    cpu_to = $this_cpu
-}
-
-
 /* probe scheduler.balance
  *
  *  Fires when a cpu attempts to find more work.
@@ -107,43 +87,314 @@
 probe scheduler.balance = kernel.function("idle_balance")? {}
 
 
-/* probe scheduler.ctxswitch 
- * 
- *  Fires when there is a context switch
-
- *  Currently systemTap can't access arguments of inline
- *  functions. So we choose to probe __switch_to instead
- *  of context_switch()       
-
- *  Arguments:
- *    prev_pid: The pid of the process to be switched out
- *    next_pid: The pid of the process to be switched in
- *    prevtsk_state: the state of the process to be switched out
+/** 
+ * probe scheduler.ctxswitch - Fires when there is a context switch. Currently
+ *			       systemTap can't access arguments of inline 
+ *			       functions. So we choose to probe __switch_to instead 
+ *			       of context_switch()       
+ * @prev_pid: The pid of the process to be switched out
+ * @next_pid: The pid of the process to be switched in
+ * @prev_tid: The tid of the process to be switched out
+ * @next_tid: The tid of the process to be switched in	 
+ * @prev_task_name: The name of the process to be switched out
+ * @next_task_name: The name of the process to be switched in
+ * @prev_priority: The priority of the process to be switched out
+ * @next_priority: The priority of the process to be switched in
+ * @prevtsk_state: the state of the process to be switched out
+ * @nexttsk_state: the state of the process to be switched in
  */
-probe scheduler.ctxswitch =
+
+probe __scheduler.ctxswitch.tp = kernel.trace("sched_switch") 
+{
+	next_pid =  $next->pid
+        next_tid = task_tid($next)
+        next_task = $next
+        next_task_name = task_execname($next)
+        nexttsk_state = $next->state
+	next_priority = $next->prio
+	prev_priority = $prev->prio
+	prev_pid = $prev->pid
+        prev_tid = task_tid($prev)
+        prev_task = $prev
+        prev_task_name = task_execname($prev)
+        prevtsk_state = $prev->state
+}
+
+probe __scheduler.ctxswitch.kp =
 %( arch != "x86_64" && arch != "ia64" %?
-	kernel.trace("sched_switch") !, kernel.function("__switch_to")
+        kernel.function("__switch_to")
 %:
-	kernel.trace("sched_switch") !, kernel.function("context_switch")
+        kernel.function("context_switch")
 %)
 {
 %( arch == "powerpc" %?
-        prev_pid = $prev->pid
-        next_pid =  $new->pid
+	prev_pid = $prev->pid
+        next_pid = $new->pid
+        prev_tid = task_tid($prev)
+        next_tid = task_tid($new)
         prev_task = $prev
         next_task = $new
+	next_priority = $new->prio
+        prev_priority = $prev->prio
+        prev_task_name = task_execname($prev)
+        next_task_name = task_execname($new)
         prevtsk_state = $prev->state
+        nexttsk_state = $new->state
+
 %: %( arch == "x86_64" || arch == "ia64" %?
         prev_pid = $prev->pid
-        next_pid =  $next->pid
+        next_pid = $next->pid
+	prev_tid = task_tid($prev)
+        next_tid = task_tid($next)
         prev_task = $prev
         next_task = $next
+	next_priority = $next->prio
+        prev_priority = $prev->prio
+	prev_task_name = task_execname($prev)
+        next_task_name = task_execname($next)
         prevtsk_state = $prev->state
+	nexttsk_state = $next->state
 %:
         prev_pid = $prev_p->pid
         next_pid = $next_p->pid
+	prev_tid = task_tid($prev_p)
+        next_tid = task_tid($next_p)
         prev_task = $prev_p
         next_task = $next_p
+	next_priority = $next_p->prio
+        prev_priority = $prev_p->prio
+	prev_task_name = task_execname($prev_p)
+        next_task_name = task_execname($next_p)
         prevtsk_state = $prev_p->state
+	nexttsk_state = $next_p->state
 %) %)
 }
+
+probe scheduler.ctxswitch
+ = __scheduler.ctxswitch.tp !, __scheduler.ctxswitch.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop - Fires when a thread created by kthread_create is stopped.
+ * @thread_pid: pid of the thread being stopped.
+ * @thread_priority: priority of the thread.
+ */
+probe __scheduler.kthread_stop.kp = kernel.function("kthread_stop")
+{
+	thread_pid = $k->pid
+	thread_priority = $k->priority
+}
+probe __scheduler.kthread_stop.tp = kernel.trace("sched_kthread_stop") 
+{
+        thread_pid = $t->pid
+        thread_priority = $t->prio
+}
+probe scheduler.kthread_stop 
+   = __scheduler.kthread_stop.tp !,
+     __scheduler.kthread_stop.kp
+{}
+
+
+/**
+ * probe scheduler.kthread_stop.return - Fires once the kthread is stopped and gets the return value
+ * @return_value: return value after stopping the thread.
+ */
+
+probe __scheduler.kthread_stop.return.kp = kernel.function("kthread_stop").return
+{
+	return_value = $k->exit_code
+}
+probe __scheduler.kthread_stop.return.tp = kernel.trace("sched_kthread_stop_ret")
+{
+        return_value = $ret
+}
+
+probe scheduler.kthread_stop.return 
+ = __scheduler.kthread_stop.return.tp !,
+   __scheduler.kthread_stop.return.kp
+{}
+
+/**
+ * probe scheduler.wait_task - Fires when waiting on a task to unschedule. 
+ *                             It waits till the task becomes inactive.
+ * @task_pid: pid of the task the scheduler is waiting on.
+ * @task_priority: priority of the task
+ */
+
+probe scheduler.wait_task
+ = kernel.trace("sched_wait_task") !,
+   kernel.function("wait_task_inactive")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+}
+
+/**
+ * probe scheduler.wakeup - Fires when a task is woken up 
+ * @task_pid: pid of the task being woken up
+ * @task_priority: priority of the task being woken up
+ * @task_cpu: cpu of the task being woken up
+ * @task_state: state of the task being woken up
+ * @task_tid: tid of the task being woken up
+ */
+
+probe scheduler.wakeup
+ = kernel.trace("sched_wakeup") !,
+   kernel.function("try_to_wake_up")
+{
+	task = $p
+        task_pid = $p->pid
+	task_tid = task_tid($p)
+        task_priority = $p->prio
+	task_cpu = task_cpu($p)
+	task_state = task_state($p)
+}
+
+/**
+ * probe scheduler.wakeup_new - Fires when a newly created task is woken up for the first time
+ * @task_pid: pid of the new task woken up
+ * @task_priority: priority of the new task
+ * @task_tid: tid of the new task woken up
+ * @task_state: state of the task woken up
+ * @task_cpu: cpu of the task woken up
+ */
+probe scheduler.wakeup_new
+ = kernel.trace("sched_wakeup_new") !,
+   kernel.function("wake_up_new_task")
+{
+        task_pid = $p->pid
+        task_priority = $p->prio
+	task_cpu = task_cpu($p)
+        task_state = task_state($p)
+	task = $p
+	task_tid = task_tid($p)
+}
+
+/**
+ * probe scheduler.migrate - Traces the migration of the tasks across cpus by the scheduler.
+ * @task: the process that is being migrated.
+ * @pid: pid of the task being migrated.
+ * @priority: priority of the task being migrated.
+ * @cpu_from: the original cpu
+ * @cpu_to: the destination cpu
+ */
+probe __scheduler.migrate.kp1 = kernel.function("pull_task")
+{
+	cpu_to = $this_cpu
+}
+probe __scheduler.migrate.kp = kernel.function("set_task_cpu")
+{
+	cpu_to = $new_cpu
+}
+probe __scheduler.migrate.tp = kernel.trace("sched_migrate_task")
+{
+	cpu_to = $dest_cpu
+}
+probe scheduler.migrate
+ = __scheduler.migrate.tp !,
+   __scheduler.migrate.kp !,
+   __scheduler.migrate.kp1
+{
+	task = $p
+	pid = $p->pid
+        priority = $p->prio
+        cpu_from = task_cpu($p)
+}
+/**
+ * probe scheduler.process_free - Traces the process of freeing up of a process
+ * @pid: PID of the process getting freed
+ * @priority: priority of the process getting freed
+ */
+probe __scheduler.process_free.kp = kernel.function("delayed_put_task_struct")
+{
+	pid = $tsk->pid
+	priority = $tsk->prio
+}
+probe __scheduler.process_free.tp = kernel.trace("sched_process_free")
+{
+        pid = $p->pid
+        priority = $p->prio
+}
+probe scheduler.process_free
+ = __scheduler.process_free.tp !,
+   __scheduler.process_free.kp
+{}
+
+/**
+ * probe scheduler.process_exit - Fires when a process exits
+ * @pid: pid of the process exiting
+ * @priority: priority of the process exiting
+ */
+probe __scheduler.process_exit.kp = kernel.function("do_exit")
+{
+	pid = $tsk->pid
+	priority = $tsk->priority
+}
+probe __scheduler.process_exit.tp = kernel.trace("sched_process_exit")
+{
+        pid = $p->pid
+        priority = $p->prio
+}
+
+probe scheduler.process_exit 
+ = __scheduler.process_exit.tp !,
+   __scheduler.process_exit.kp
+{}
+
+/**
+ * probe scheduler.process_wait - Fires when scheduler waits on a process
+ * @pid: PID of the process scheduler is waiting on
+ */
+probe __scheduler.process_wait.kp = kernel.function("do_wait")
+{
+	pid = $wo->wo_pid
+}
+probe __scheduler.process_wait.tp = kernel.trace("sched_process_wait")
+{
+        pid = $pid
+}
+probe scheduler.process_wait
+ = __scheduler.process_wait.tp !,
+   __scheduler.process_wait.kp
+{}
+
+/**
+ * probe scheduler.process_fork - Probes the tracepoint for forking a process
+ * @parent_pid: PID of the parent process
+ * @child_pid: PID of the child process
+ */
+probe __scheduler.process_fork.kp = kernel.function("do_fork")
+{
+	parent_pid = $current->pid
+	child_pid = $p->pid
+}
+probe __scheduler.process_fork.tp = kernel.trace("sched_process_fork")
+{
+        parent_pid = $parent->pid
+        child_pid = $child->pid
+}
+
+probe scheduler.process_fork
+ = __scheduler.process_fork.tp !,
+   __scheduler.process_fork.kp
+{}
+/**
+ * probe scheduler.signal_send - Probes the tracepoint for sending a signal
+ * @pid: pid of the process sending signal
+ * @signal_number: signal number
+ */
+probe __scheduler.signal_send.kp = kernel.function("__send_signal")
+{
+	pid = $t->pid
+}
+probe __scheduler.signal_send.tp = kernel.trace("sched_signal_send")
+{
+        pid = $p->pid
+}
+probe scheduler.signal_send
+ = __scheduler.signal_send.tp !,
+   __scheduler.signal_send.kp
+{
+	signal_number = $sig
+}
diff -Naur systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp systemtap/testsuite/buildok/scheduler-test-tracepoints.stp
--- systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/buildok/scheduler-test-tracepoints.stp	2009-09-29 00:15:10.000000000 -0400
@@ -0,0 +1,51 @@
+#! stap -up4
+
+//Tests if all probes in the scheduler tapset are resolvable.
+
+probe scheduler.kthread_stop {
+	printf("pid = %d, priority = %d\n", thread_pid, thread_priority);
+}
+
+probe scheduler.kthread_stop.return {
+        printf("return value = %d\n", return_value);
+}
+
+probe scheduler.wait_task {
+        printf("pid = %d, priority = %d\n", task_pid, task_priority);
+}
+
+probe scheduler.wakeup {
+        printf("pid = %d, priority = %d, state = %d, cpu = %d, tid = %d\n",task_pid, task_priority, task_state, task_cpu, task_tid);
+}
+
+probe scheduler.wakeup_new {
+        printf("pid = %d, priority = %d, state = %d, cpu = %d, tid = %d\n", task_pid, task_priority, task_state, task_cpu, task_tid);
+}
+
+probe scheduler.ctxswitch {
+        printf("prev_pid = %d, prev_priority = %d, prev_state = %d, prev_task_name = %s, prev_tid = %d, next_pid = %d, next_priority = %d, next_state = %d, next_task_name = %s, next_tid = %d\n", prev_pid, prev_priority, prevtsk_state, prev_task_name, prev_tid, next_pid, next_priority, nexttsk_state, next_task_name, next_tid);
+}
+
+probe scheduler.migrate {
+        printf("pid = %d, priority = %d, original cpu = %d destination cpu = %d\n", pid, priority, cpu_from, cpu_to);
+}
+
+probe scheduler.process_free {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_exit {
+        printf("pid = %d, priority = %d\n", pid, priority);
+}
+
+probe scheduler.process_wait {
+        printf("pid = %d\n", pid);
+}
+
+probe scheduler.process_fork {
+        printf("parent pid = %d, child pid = %d\n", parent_pid, child_pid);
+}
+
+probe scheduler.signal_send {
+        printf("pid = %d, signal = %d\n", pid, signal_number);
+}
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.meta	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.meta	2009-09-22 02:18:03.000000000 -0400
@@ -0,0 +1,14 @@
+title: Display the task switches happeningt the scheduler
+name: sched_switch.stp
+version: 1.0
+author: kiran
+keywords: profiling functions
+subsystem: kernel
+status: production
+exit: user-controlled
+output: sorted-list on-exit
+scope: system-wide
+description: The sched_switch.stp script takes two arguments, first argument can be "pid" or "name" to indicate what is being passed as second argument. The script will trace the process based on pid/name and print the scheduler switches happening with the process. If no arguments are passed, it displays all the scheduler switches. This can be used to understand which tasks scheduler the current process being traced, out and when it gets scheduled in again.
+test_check: stap -p4 sched_switch.stp
+test_installcheck: stap  sched_switch.stp -c "sleep 1"
+
diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
--- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
+++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-29 00:27:58.000000000 -0400
@@ -0,0 +1,68 @@
+/* This script wokrs similar to ftrace's sched_switch. It displays a list of
+ * processes which get switched in and out of the scheduler. The format of display
+ * is PROCESS_NAME PROCESS_PID CPU TIMESTAMP PID: PRIORITY: PROCESS STATE ->/+
+ *    NEXT_PID : NEXT_PRIORITY: NEXT_STATE NEXT_PROCESS_NAME 
+ * -> indicates that prev process is scheduled out and the next process is 
+ *    scheduled in.
+ * + indicates that prev process has woken up the next process.
+ * The usage is sched_switch.stp <"pid"/"name"> pid/name
+ */
+
+function state_calc(state) {
+        if(state == 0)
+        status = "R"
+        if(state == 1)
+        status = "S"
+        if(state == 2)
+        status = "D"
+        if(state == 4)
+        status = "T"
+        if(state == 8)
+        status = "T"
+        if(state == 16)
+        status = "Z"
+        if(state == 32)
+        status = "EXIT_DEAD"
+        return status
+}
+probe scheduler.wakeup
+{
+	 %( $# == 2 %?
+
+        if(@1 == "pid")
+                if (task_tid != $2 && task_tid(task_current()) != $2)
+                        next
+        if(@1 == "name")
+                if (task_execname(task) != @2 && execname() != @2)
+                        next
+
+        
+                if ((@1 == "pid" && (task_tid == $2 || task_pid(task_current()) == $2)) ||
+                   (@1 == "name" && (task_execname(task_current()) == @2 || task_execname(task) == @2)))
+                        printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
+                                execname(), task_cpu(task), gettimeofday_ns(),
+                                task_tid(task_current()), task_prio(task_current()), state_calc(task_state(task_current())),
+                                task_tid(task), task_prio(task), state_calc(task_state(task)),
+                                task_execname(task))
+         %)
+
+
+	
+}
+probe scheduler.ctxswitch
+{
+	%( $# == 2 %?
+	
+	if(@1 == "pid") 
+		if (next_tid != $2 && prev_tid != $2)
+			next
+	if(@1 == "name")
+		if (prev_task_name != @2 && next_task_name != @2)
+               		next 
+	
+	%)
+
+	printf("%s\t\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",prev_task_name,
+		task_cpu(prev_task),gettimeofday_ns(),prev_tid,prev_priority,state_calc(prevtsk_state),next_tid,
+		next_priority,state_calc(nexttsk_state),next_task_name)
+}


Output
------
Output of the sched_switch name kjournald

syslogd         2       1254199017432933355     3395:20:R + 1338:15:S kjournald
syslogd         2       1254199017432952481     3395:120:D ==> 1338:115:R kjournald
kjournald               2       1254199017433168831     1338:115:D ==> 3398:120:R klogd
swapper         2       1254199017445924281     0:40:R + 1338:15:D kjournald
swapper         2       1254199017445949512     0:140:R ==> 1338:115:R kjournald
kjournald               2       1254199017446019858     1338:15:R + 423:15:S kblockd/2
kjournald               2       1254199017446035171     1338:115:D ==> 423:115:R kblockd/2
swapper         2       1254199017451967116     0:40:R + 1338:15:D kjournald
swapper         2       1254199017451989107     0:140:R ==> 1338:115:R kjournald
kjournald               2       1254199017452027795     1338:15:R + 3395:20:D syslogd
kjournald               2       1254199017452042046     1338:115:S ==> 3395:120:R syslogd
syslogd         2       1254199017461421370     3395:20:R + 1338:15:S kjournald
syslogd         2       1254199017461435964     3395:120:D ==> 1338:115:R kjournald
kjournald               2       1254199017461480827     1338:115:D ==> 3398:120:R klogd
swapper         2       1254199017476086266     0:40:R + 1338:15:D kjournald
swapper         2       1254199017476107523     0:140:R ==> 1338:115:R kjournald
kjournald               2       1254199017476137236     1338:115:D ==> 0:140:R swapper
swapper         2       1254199017482149965     0:40:R + 1338:15:D kjournald
-------------------------------------------------------------------------------------------------

Output of the scheduler-test-tracepoints.stp testcase

prev_pid = 29148, prev_priority = 120, prev_state = 1, prev_task_name = stapio, prev_tid = 29148, next_pid = 0, next_priority = 140, next_state = 0, next_task_name = swapper, next_tid = 0
pid = 3398, priority = 120, original cpu = 2 destination cpu = 3
pid = 3398, priority = 120, state = 1, cpu = 3, tid = 3398
prev_pid = 0, prev_priority = 140, prev_state = 0, prev_task_name = swapper, prev_tid = 0, next_pid = 3398, next_priority = 120, next_state = 0, next_task_name = klogd, next_tid = 3398






-- 

Thanks,
Kiran


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]