This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints
- From: fche at redhat dot com (Frank Ch. Eigler)
- To: Kiran <kiran at linux dot vnet dot ibm dot com>
- Cc: systemtap at sources dot redhat dot com
- Date: Fri, 25 Sep 2009 16:20:40 -0400
- Subject: Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints
- References: <1253715217.5193.115.camel@kiran-laptop>
Kiran <kiran@linux.vnet.ibm.com> writes:
(Is that your full name, for purposes of being listed in AUTHORS?)
> [...]
> --- systemtap-orig/tapset/scheduler.stp 2009-09-19 10:27:14.000000000 -0400
> +++ systemtap/tapset/scheduler.stp 2009-09-22 07:32:04.000000000 -0400
> @@ -118,32 +118,280 @@
> * Arguments:
> * prev_pid: The pid of the process to be switched out
> * next_pid: The pid of the process to be switched in
> + * prev_tid: The tid of the process to be switched out
> + * next_tid: The tid of the process to be switched in
> + * prev_task_name: The name of the process to be switched out
> + * next_task_name: The name of the process to be switched in
> * prevtsk_state: the state of the process to be switched out
> + * nexttsk_state: the state of the process to be switched in
> */
Please recast these as kerneldoc-formatted embedded docs that can
be extracted into our man pages.
> -probe scheduler.ctxswitch =
> +
> +probe scheduler.ctxswitch.tp = kernel.trace("sched_switch")
> +{
> + next_pid = $next->pid
> + next_pid = task_tid($next)
> + next_task = $next
> + next_task_name = task_execname($next)
> + nexttsk_state = $next->state
> + prev_pid = $prev->pid
> + prev_tid = task_tid(tid)
> + prev_task = $prev
> + prev_task_name = task_execname($prev)
> + prevtsk_state = $prev->state
> +}
There's a consistent confusion between pid and tid here.
pid is $task->tgid; tid is $task->pid. (There is not much
need for using task_tid() function, certainly not in this
asymmetric way.)
> +/**
> + * probe scheduler.kthread_stop - Fires when a thread created by kthread_create is stopped.
> + * @thread_pid: pid of the thread being stopped.
> + * @thread_priority: priority of the thread.
> + */
> +probe scheduler.kthread_stop.kp = kernel.function("kthread_stop")
> +{
> + thread_pid = $k->pid
> + thread_priority = $k->priority
> +}
> +probe scheduler.kthread_stop.tp = kernel.trace("sched_kthread_stop")
> +{
> + thread_pid = $t->pid
> + thread_priority = $t->prio
> +}
> +probe scheduler.kthread_stop
> + = scheduler.kthread_stop.tp !,
> + scheduler.kthread_stop.kp
> +{}
This one looks just right!
> diff -Naur systemtap-orig/testsuite/buildok/scheduler-test-tracepoints.stp systemtap/testsuite/buildok/scheduler-test-tracepoints.stp
> +#! stap -up4
> +
> +//Tests if all probes in the scheduler tapset are resolvable.
> +
> +probe scheduler.kthread_stop {
> + printf("pid = %d, priority = %d\n", thread_pid, thread_priority);
> +}
> [...]
These look good (if they print out all the variables).
> [...]
> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp 1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp 2009-09-22 02:29:16.000000000 -0400
> @@ -0,0 +1,71 @@
> +/* This script wokrs similar to ftrace's sched_switch. It displays a list of
> + * processes which get switched in and out of the scheduler. The format of display
> + * is PROCESS_NAME PROCESS_PID CPU TIMESTAMP PID: PRIORITY: PROCESS STATE ->/+
> + * NEXT_PID : NEXT_PRIORITY: NEXT_STATE NEXT_PROCESS_NAME
> + * -> indicates that prev process is scheduled out and the next process is
> + * scheduled in.
> + * + indicates that prev process has woken up the next process.
> + * The usage is sched_switch.stp <"pid"/"name"> pid/name
> + */
> +
> +global task_cpu_old[9999]
> +global pids[999]
> [...]
Please write down whatever rationale exists for those two numbers.
Perhaps they could stick with defaults after the pid/tid confusion is
cleared up, and if perhaps you add some code to delete entries
belonging to processes as they die.
It'd be good to see a statement of what OS/kernel versions this
was tested on, and to see a sample of its output.
Thank you for your efforts!
- FChE