This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints

From: Josh Stone <jistone at redhat dot com>
To: Kiran <kiran at linux dot vnet dot ibm dot com>
Cc: "Frank Ch. Eigler" <fche at redhat dot com>, systemtap at sources dot redhat dot com
Date: Tue, 29 Sep 2009 13:18:58 -0700
Subject: Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
References: <1253715217.5193.115.camel@kiran-laptop> <4ABD49B7.60105@redhat.com> <1254214488.5689.114.camel@kiran-laptop>

On 09/29/2009 01:54 AM, Kiran wrote:
> Hi,
> 
> I have attached the next version of the patch. This incorporates most of
> the comments given by Josh and Frank. This was tested on mainline linux
> kernel 2.6.31 on ppc64 and x86_64. I have attached sample output for
> ppc64.
> 
> Frank,
> My full name is Kiran Prakash 
> 
> Josh,
> I was not able to merge the signal and kprocess tapsets with my
> scheduler tapset as my scheduler tapset probes the tracepoints mainly
> and in case tracepoints are not present, it probes the kernel functions.
> So I can define namespaces only for the variables exported by the
> tracepoint.

Understood, thanks for looking.

Frank's comment on mixing up pid/tid still needs to be addressed, e.g.:

> +probe scheduler.wakeup
> + = kernel.trace("sched_wakeup") !,
> +   kernel.function("try_to_wake_up")
> +{
> +	task = $p
> +        task_pid = $p->pid
> +	task_tid = task_tid($p)
> +        task_priority = $p->prio
> +	task_cpu = task_cpu($p)
> +	task_state = task_state($p)
> +}

Note that $p->pid is identical to task_tid($p).  What we really want
here and everywhere else is:

  task_pid = $p->tgid
  task_tid = $p->pid

(Also a formatting nitpick -- there's a mix of tabs and spaces in here
that should be more consistent.)

> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-29 00:27:58.000000000 -0400
> @@ -0,0 +1,68 @@
> +/* This script wokrs similar to ftrace's sched_switch. It displays a list of

s/wokrs/works/

> +probe scheduler.wakeup
> +{
> +	 %( $# == 2 %?
> +
> +        if(@1 == "pid")
> +                if (task_tid != $2 && task_tid(task_current()) != $2)
> +                        next

task_tid(task_current()) => tid()

However, if the filter is named "pid", shouldn't the comparison be done
by pid instead of tid?

> +        if(@1 == "name")
> +                if (task_execname(task) != @2 && execname() != @2)
> +                        next
> +
> +        
> +                if ((@1 == "pid" && (task_tid == $2 || task_pid(task_current()) == $2)) ||
> +                   (@1 == "name" && (task_execname(task_current()) == @2 || task_execname(task) == @2)))

This condition appears to already be guaranteed by the two filter checks
above.

> +                        printf("%s\t\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> +                                execname(), task_cpu(task), gettimeofday_ns(),
> +                                task_tid(task_current()), task_prio(task_current()), state_calc(task_state(task_current())),
> +                                task_tid(task), task_prio(task), state_calc(task_state(task)),
> +                                task_execname(task))
> +         %)

The way you've placed this %) means that nothing will print if no filter
arguments were used.  Is that intentional?  At least, it's not
consistent with the printing in your scheduler.ctxswitch probe...

> +probe scheduler.ctxswitch
> +{
> +	%( $# == 2 %?
> +	
> +	if(@1 == "pid") 
> +		if (next_tid != $2 && prev_tid != $2)
> +			next

This has the same confusion of reading "pid" but comparing tids...

> +	printf("%s\t\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",prev_task_name,
> +		task_cpu(prev_task),gettimeofday_ns(),prev_tid,prev_priority,state_calc(prevtsk_state),next_tid,
> +		next_priority,state_calc(nexttsk_state),next_task_name)
[...]
> swapper         2       1254199017476107523     0:140:R ==> 1338:115:R kjournald
> kjournald               2       1254199017476137236     1338:115:D ==> 0:140:R swapper
> swapper         2       1254199017482149965     0:40:R + 1338:15:D kjournald

Are the tabs significant to you for parsing this output?  Visually, this
would line up better if you used field widths for spacing.
(e.g. %-16s for execnames).

Thanks,

Josh

References:
- [PATCH v2] Scheduler Tapset based on kernel tracepoints
  - From: Kiran
- Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints
  - From: Josh Stone
- Re: [PATCH v3] Scheduler Tapset based on kernel tracepoints
  - From: Kiran

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]