This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH v2] Scheduler Tapset based on kernel tracepoints


Hi,

On 09/23/2009 07:13 AM, Kiran wrote:
> +probe scheduler.ctxswitch.tp = kernel.trace("sched_switch") 

These .tp and .kp variants are an internal detail -- please put them
under __scheduler so they are in a more private namespace from what's
exposed to the user.

> +/**
> + * probe scheduler.migrate_task - Traces the migration of the tasks across cpus by the scheduler.
> + * @pid: pid of the task being migrated.
> + * @priority: priority of the task being migrated.
> + * @original_cpu: the original cpu
> + * @destination_cpu: the destination cpu
> + */

There is already a "scheduler.migrate" defined, so please merge with
that instead of creating a new one.

> +/**
> + * probe scheduler.process_exit - Fires when a process exits
> + * @pid: pid of the process exiting
> + * @priority: priority of the process exiting
> + */

This overlaps with the less well-known "kprocess.exit".  I think in this
case we can make the kprocess one point to yours.

> +/**
> + * probe scheduler.process_fork - Probes the tracepoint for forking a process
> + * @parent_pid: PID of the parent process
> + * @child_pid: PID of the child process
> + */

The "kprocess.create" is also similar to this.

> +/**
> + * probe scheduler.signal_send - Probes the tracepoint for sending a signal
> + * @pid: pid of the process sending signal
> + * @signal_number: signal number
> + */

Here we have the whole signal.stp tapset, which includes a "signal.send"
probe point.

It's not easy to decide how to resolve the overlaps in different
tapsets, but please consider it.  At a minimum, they should leverage
each other so they're all using the best available probe points.


> diff -Naur systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp
> --- systemtap-orig/testsuite/systemtap.examples/profiling/sched_switch.stp	1969-12-31 19:00:00.000000000 -0500
> +++ systemtap/testsuite/systemtap.examples/profiling/sched_switch.stp	2009-09-22 02:29:16.000000000 -0400
> [...]
> +probe scheduler.wakeup
> +{
> +	pids[task_pid]++
> +	processes[task_pid] = $p;
> +	prev[task_pid] = task_current()
> +	
> +}

If this is just so you can print the "+" line of who wokeup whom, why
not print that right away?  Then you don't need those arrays at all.
Saving task_struct pointers is messy business.

> +probe scheduler.ctxswitch
> +{
> +	tid = next_tid
> +	tid1 = prev_tid
> +	state = prev_state
> +	state1 = next_state

Why copy these values?  The prev/next variable names already reflect
more meaning.

> +	
> +	%( $# == 2 %?
> +	
> +	if(@1 == "pid") 
> +		if (tid != $2 && tid1 != $2)
> +			next
> +	if(@1 == "name")
> +		if (task_execname(task_current()) != @2 && task_execname($next) != @2)
> +               		next 

Note that task_execname(task_current()) is the same as simply
execname().  But anyway, your tapset provides prev_task_name and
next_task_name, so just use those.

> +	
> +	foreach (name in pids-) {
> +		if ((@1 == "pid" && (name == $2 || task_pid(prev[name]) == $2)) || 
> +		   (@1 == "name" && (task_execname(prev[name]) == @2 || task_execname(processes[name]) == @2)))
> +			printf("%s\t\t%d\t%d\t%d\t%d:%d:%s + %d:%d:%s %s\n",
> +				task_execname(prev[name]), task_pid(prev[name]), task_cpu(processes[name]), gettimeofday_ns(), 
> +				task_pid(prev[name]), task_prio(prev[name]), state_calc(task_state(prev[name])), 
> +				task_pid(processes[name]), task_prio(processes[name]), state_calc(task_state(processes[name])), 
> +				task_execname(processes[name]))
> +	} %)

Hmm, "name" is not a name, which is confusing.  But if you're looking
for the task which woke this one up, wouldn't that be prev[next_pid]?
(That should really be indexed by the tid, actually.)  Or, as I said
above, it may be easier to print this right in the wakeup probe.

> +
> +	old_cpu = task_cpu_old[tid]
> +	printf("%s\t\t%d\t%d\t%d\t%d:%d:%s ==> %d:%d:%s %s\n",task_execname(task_current()),tid1,
> +		old_cpu,gettimeofday_ns(),tid1,task_prio(task_current()),state_calc(state),next_pid,
> +		next_prio,state_calc(next_state),next_task_name )
> +	task_cpu_old[next_tid] = cpu()
> +}

I think task_current() is probably always the same as prev_task, but
it's probably best to use the latter.  It would also be helpful if you
spaced this out so we can better see what's being printed.  I can see
two "tid1"s in there, which is a bit weird.

Thanks,

Josh


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]