This is the mail archive of the mailing list for the systemtap project.


Re: [Ltt-dev] patches to actually use markers?

* David Smith wrote:
> Mathieu Desnoyers wrote:
> > * David Smith wrote:
> >>>> Mathieu
> >> I've been looking at your system call tracing patches.  (I've tried
> >> running lttv itself without much luck, but it doesn't really matter for
> >> the sake of this discussion.)
> >>
> >> I like the way you use the existing system call tracing points.  So
> >> we're on the same page, here are the markers I'm seeing in
> >> arch/x86/kernel/ptrace32.c after applying
> >> patch-2.6.24-rc2-lttng-0.10-pre23.tar.bz2:
> >>
> >>   trace_mark(kernel_arch_syscall_entry, "syscall_id %d ip #p%ld",
> >> 			(int)regs->orig_eax, instruction_pointer(regs));
> >>
> >>   trace_mark(kernel_arch_syscall_exit, MARK_NOARGS);
> >>
> >> For systemtap use, we'd like to have more information than that.  On
> >> syscall entry, we'd like to be able to get the arguments.  On syscall exit,
> >> we'd like to be able to get the return value.  In fact, the easiest
> >> thing would be to supply the same information that audit_syscall_entry()
> >> and audit_syscall_exit() need.
> >>
> >> Since I'll bet you've already considered this, I'd like to know why you
> >> decided to go a different way.
> >>
> > Well, the approach taken was to instrument each important system call in
> > the syscall specific function to be able to actually know what type of
> > information to record. For instance, if ebx points to a string, the
> > pointer is not very useful, but the string is.
> That is (somewhat) true in the case of strings.
> But, similar problems exist with syscalls that take structure pointers:
> sys_[gs]ettimeofday, sys_adjtimex, sys_times, sys_nanosleep,
> sys_[gs]etitimer, sys_timer_create, sys_timer_[gs]ettime,
> sys_clock_gettime, sys_clock_getres, sys_clock_nanosleep,
> sys_sched_setscheduler, sys_sched_[gs]etparam, sys_wait4, sys_waitid,
> sys_rt_sigtimedwait, sys_stat, sys_statfs[64], sys_fstatfs[64],
> sys_lstat, sys_fstat, and so on (I got tired of looking through syscalls.h).
> For those syscalls only a pointer can be passed so the marker handler
> will have to know how to handle that pointer.  That marker handler will
> need to know that that value is a pointer to a particular structure type
> and then know how to access it accordingly.
> The same could be done for strings.  Is it a little more work?  Yes.  Is
> it fairly easy?  Yes.
> Let me ask the question another way.  Is there a (measurable)
> performance hit if the extra arguments to the syscall entry marker are
> added?  If not, even if lttng doesn't plan to use them, why not add
> them?  Certainly systemtap (and perhaps other users) could use them.
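
To make the quoted point about handlers concrete, here is a rough user-space sketch of a handler that decodes the first syscall argument according to the syscall id. The function name describe_arg0, the struct ts_example and the two id constants are made up for illustration; a real kernel-side handler would also have to copy the data from user space safely before touching it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical syscall ids, for illustration only. */
enum { SYS_OPEN_ID = 5, SYS_NANOSLEEP_ID = 162 };

/* Stand-in for struct timespec, so the sketch is self-contained. */
struct ts_example {
    long tv_sec;
    long tv_nsec;
};

/*
 * Describe the first argument of a syscall.  The handler has to know,
 * per syscall id, whether the raw value is a string pointer, a structure
 * pointer, or something it can only print as an opaque pointer.
 * (In the kernel the pointed-to data would first be copied from user
 * space; this sketch dereferences directly.)
 */
int describe_arg0(int syscall_id, const void *arg0, char *out, size_t len)
{
    switch (syscall_id) {
    case SYS_OPEN_ID:
        /* arg0 is a NUL-terminated path name. */
        return snprintf(out, len, "path=\"%s\"", (const char *)arg0);
    case SYS_NANOSLEEP_ID: {
        /* arg0 points to a timespec-like structure. */
        const struct ts_example *ts = arg0;
        return snprintf(out, len, "req={%ld,%ld}", ts->tv_sec, ts->tv_nsec);
    }
    default:
        /* Unknown layout: only the pointer value itself is available. */
        return snprintf(out, len, "arg0=%p", arg0);
    }
}
```

The string case and the structure case end up looking the same from the handler's side: both need per-syscall knowledge of what the pointer refers to.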

Yup, I'd be all in for flexibility, and the performance impact should be
small. I just wonder if the best approach is to pass the pt_regs pointer
as a marker argument or to pass the individual registers.

Since the LTTng serializer uses the format string to generically take
the arguments and write them in a trace, I doubt that writing a pt_regs
pointer is really useful. On the other hand, passing all the individual
registers would imply a stack setup cost at runtime (small cost though),
but would provide somewhat meaningful information in the traces (but
redundant if we instrument the in-kernel functions).

Both approaches would let specific probes deal with the syscall
arguments as they like.
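
As a rough user-space illustration of the two options (not kernel code), a probe could either receive the register block and pick out what it needs, or receive the values already unpacked at the call site. The struct regs_example and both probe names below are invented for this sketch; the register-to-argument mapping assumes 32-bit x86.

```c
#include <stdio.h>

/* Simplified, hypothetical stand-in for the 32-bit x86 struct pt_regs. */
struct regs_example {
    long ebx, ecx, edx;   /* first three syscall arguments on i386 */
    long orig_eax;        /* syscall number */
    long eip;             /* instruction pointer */
};

/* Option 1: the probe gets one pointer and reads whatever it needs. */
int probe_from_regs(const struct regs_example *regs, char *out, size_t len)
{
    return snprintf(out, len, "id=%ld args=%ld,%ld,%ld",
                    regs->orig_eax, regs->ebx, regs->ecx, regs->edx);
}

/* Option 2: the call site unpacks the registers and passes each value. */
int probe_from_args(long id, long a0, long a1, long a2,
                    char *out, size_t len)
{
    return snprintf(out, len, "id=%ld args=%ld,%ld,%ld", id, a0, a1, a2);
}
```

Both probes end up with the same information. The difference is where the unpacking happens: option 2 pays for argument setup at every call site, while option 1 hands a generic serializer nothing but an address.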

If we choose to go for the pt_regs pointer passing solution, we could
add a format string extension to specify that a given argument should
not be written in the trace. If we pass the pt_regs like this:

  trace_mark(syscall_entry, "syscall_id %lu ip %p pt_regs #0%p",
    regs->orig_eax, instruction_pointer(regs), regs);

An LTTng probe would know that the #0 means that the data should be
discarded from the trace. (# is a prefix to the format string element
that tells LTTng what type size and format to use in the trace,
independent of the size used on the gcc side.)
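
To show what honoring such a flag might look like, here is a minimal user-space sketch that walks a marker format string and collects only the field names that should be written. It assumes the format is a sequence of space-separated "name spec" pairs and that a spec starting with "#0" means discard; the function name recorded_fields is made up, and the real LTTng serializer does considerably more than this.

```c
#include <stddef.h>
#include <string.h>

/*
 * Collect the names of the fields that should be written to the trace
 * into out (space separated) and return how many there are.  Fields
 * whose spec starts with "#0" are skipped.
 */
int recorded_fields(const char *fmt, char *out, size_t len)
{
    int recorded = 0;
    size_t used = 0;
    const char *p = fmt;

    if (len == 0)
        return 0;
    out[0] = '\0';

    while (*p) {
        /* Field name: a run of non-space characters. */
        while (*p == ' ')
            p++;
        const char *name = p;
        while (*p && *p != ' ')
            p++;
        size_t name_len = (size_t)(p - name);

        /* Format spec that follows the name. */
        while (*p == ' ')
            p++;
        const char *spec = p;
        while (*p && *p != ' ')
            p++;

        if (name_len == 0 || spec == p)
            break;              /* end of string or a name with no spec */

        if (spec[0] == '#' && spec[1] == '0')
            continue;           /* flagged as "do not write to the trace" */

        if (used + name_len + 2 > len)
            break;              /* no room for separator, name and NUL */
        if (used)
            out[used++] = ' ';
        memcpy(out + used, name, name_len);
        used += name_len;
        out[used] = '\0';
        recorded++;
    }
    return recorded;
}
```

With the format string from the example above, only syscall_id and ip would be kept, while pt_regs would still be passed to any custom probe attached to the marker.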

My goal is still that adding instrumentation should be as easy as
possible in the general case, while permitting flexibility for custom
probes. Therefore, I'd prefer not to _require_ the implementation of
a syscall audit-like set of per-architecture probes, but I'd like to
leave room to implement one.


> > You have a good point for the syscall exit instrumentation: adding the
> > return value is trivial and would be very useful.
> I'm glad we agree that adding the return value is useful and trivial.
> -- 
> David Smith
> Red Hat
> 256.217.0141 (direct)
> 256.837.0057 (fax)

Mathieu Desnoyers
Computer Engineering Ph.D. Student, Ecole Polytechnique de Montreal
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68
