This is the mail archive of the
systemtap@sourceware.org
mailing list for the systemtap project.
Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
On 09/29/2009 01:54 AM, Kiran wrote:
> Hi,
>
> I have attached the next version of the patch. This incorporates most of
> the comments given by Josh and Frank. This was tested on mainline linux
> kernel 2.6.31 on ppc64 and x86_64. I have attached sample output for
> ppc64.
>
> Frank,
> My full name is Kiran Prakash
>
> Josh,
> I was not able to merge the signal and kprocess tapsets with my
> scheduler tapset as my scheduler tapset probes the tracepoints mainly
> and in case tracepoints are not present, it probes the kernel functions.
> So I can define namespaces only for the variables exported by the
> tracepoint.
Understood, thanks for looking.
Frank's comment on mixing up pid/tid still needs to be addressed, e.g.:
> +probe scheduler.wakeup
> + = kernel.trace("sched_wakeup") !,
> + kernel.function("try_to_wake_up")
> +{
> + task = $p
> + task_pid = $p->pid
> + task_tid = task_tid($p)
> + task_priority = $p->prio
> + task_cpu = task_cpu($p)
> + task_state = task_state($p)
> +}
Note that $p->pid is identical to task_tid($p). What we really want
here and everywhere else is:
task_pid = $p->tgid
task_tid = $p->pid
(Also a formatting nitpick -- there's a mix of tabs and spaces in here
that should be more consistent.)
> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp 1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp 2009-09-29 00:27:58.000000000 -0400
> @@ -0,0 +1,68 @@
> +/* This script wokrs similar to ftrace's sched_switch. It displays a list of
s/wokrs/works/
> +probe scheduler.wakeup
> +{
> + %( $# == 2 %?
> +
> + if(@1 == "pid")
> + if (task_tid != $2 && task_tid(task_current()) != $2)
> + next
task_tid(task_current()) => tid()
However, if the filter is named "pid", shouldn't the comparison be done
by pid instead of tid?
> + if(@1 == "name")
> + if (task_execname(task) != @2 && execname() != @2)
> + next
> +
> +
> + if ((@1 == "pid" && (task_tid == $2 || task_pid(task_current()) == $2)) ||
> + (@1 == "name" && (task_execname(task_current()) == @2 || task_execname(task) == @2)))
This condition appears to already be guaranteed by the two filter checks
above.
> + printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> + execname(), task_cpu(task), gettimeofday_ns(),
> + task_tid(task_current()), task_prio(task_current()), state_calc(task_state(task_current())),
> + task_tid(task), task_prio(task), state_calc(task_state(task)),
> + task_execname(task))
> + %)
The way you've placed this %) means that nothing will print if no filter
arguments were used. Is that intentional? At least, it's not
consistent with the printing in your scheduler.ctxswitch probe...
> +probe scheduler.ctxswitch
> +{
> + %( $# == 2 %?
> +
> + if(@1 == "pid")
> + if (next_tid != $2 && prev_tid != $2)
> + next
This has the same confusion of reading "pid" but comparing tids...
> + printf("%s\t\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",prev_task_name,
> + task_cpu(prev_task),gettimeofday_ns(),prev_tid,prev_priority,state_calc(prevtsk_state),next_tid,
> + next_priority,state_calc(nexttsk_state),next_task_name)
[...]
> swapper 2 1254199017476107523 0:140:R ==> 1338:115:R kjournald
> kjournald 2 1254199017476137236 1338:115:D ==> 0:140:R swapper
> swapper 2 1254199017482149965 0:40:R + 1338:15:D kjournald
Are the tabs significant to you for parsing this output? Visually, this
would line up better if you used field widths for spacing.
(e.g. %-16s for execnames).
Thanks,
Josh