This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.



Re: System call instrumentation


* Arjan van de Ven (arjan@infradead.org) wrote:
> On Mon, 19 May 2008 23:44:53 -0400
> Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> 
> > * Ingo Molnar (mingo@elte.hu) wrote:
> > > 
> > > * Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> wrote:
> > > 
> > > > Ideally, I'd like to have this kind of high-level information :
> > > > 
> > > > event name : kernel syscall
> > > > syscall name : open
> > > > arg1 (%s) : "somefile"    <-----
> > > > arg2 (%d) : flags
> > > > arg3 (%d) : mode
> > > > 
> > > > However, "somefile" has to be read from userspace. With the
> > > > protection involved, it would cause a performance impact to read
> > > > it a second time rather than tracing the string once it's been
> > > > copied to kernel-space.
> 
> the audit subsystem already does all of this... why not use that??
> (And it goes to great lengths to do it securely)
> 
> > > 
> 
> > Hrm, a quick benchmark on my pentium 4 comparing a normal open()
> > system call executed in a loop to a modified open() syscall which
> > executes the lines added in the following patch adds 450 cycles to
> > each open() system call. I added a putname/getname on purpose to see
> > the cost of a second userspace copy and it's not exactly free.
> 
> copying twice does mean that if the user wants, he can cheat you. He
> can, in another thread, change the string under you. So say you're
> doing this for anti-virus purposes, he can make you scan one file and
> open another.
> 
> 
> The audit subsystem was carefully designed to avoid this trap... how
> about using that?
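
To make the double-copy concern above concrete, here is a minimal
user-space sketch in C. It is not kernel code: copy_in() is a hypothetical
stand-in for a copy from user space, and the "other thread" is modelled
deterministically by a callback that rewrites the buffer between the two
copies. All names are assumptions made for illustration.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 64

/* What the tracer recorded versus what the syscall acted on. */
struct observation {
    char traced[NAME_LEN];
    char opened[NAME_LEN];
};

/* Models another thread rewriting the user buffer between two copies. */
typedef void (*racer_fn)(char *user_buf);

/* Hypothetical stand-in for copying a string from user space. */
static void copy_in(char *dst, const char *user_src)
{
    assert(dst != NULL && user_src != NULL);
    strncpy(dst, user_src, NAME_LEN - 1);
    dst[NAME_LEN - 1] = '\0';
}

/* Two fetches: the tracer copies first, the syscall copies again later. */
struct observation trace_with_second_copy(char *user_buf, racer_fn racer)
{
    struct observation o;

    copy_in(o.traced, user_buf);   /* tracer's own copy */
    if (racer)
        racer(user_buf);           /* user changes the string in between */
    copy_in(o.opened, user_buf);   /* copy the syscall really uses */
    return o;
}

/* One fetch: the tracer reads the buffer the syscall already copied. */
struct observation trace_kernel_copy(char *user_buf, racer_fn racer)
{
    struct observation o;

    copy_in(o.opened, user_buf);   /* the only copy from user space */
    if (racer)
        racer(user_buf);           /* too late to affect either side */
    memcpy(o.traced, o.opened, NAME_LEN);
    return o;
}

/* Example racer: swap in a different file name. */
void swap_name(char *user_buf)
{
    strcpy(user_buf, "other-file");
}
```

With the second copy, the traced name and the opened name can differ; when
the tracer reads the buffer that was already copied, they cannot.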

Hrm, given that tracing will have to grab the __user * parameters passed
to various system calls, not only strings, the getname/putname
infrastructure would need to be expanded considerably. I doubt it's worth
adding that complexity (copies to temporary memory buffers plus reference
counting) to those system calls just to support kernel-wide tracing.

On the other hand, adding a marker in the traced function, at a code
location where the data copied into the kernel is accessible, won't add
such complexity and will help to keep good locality of reference (the
stack is meant to be a good cache-hot memory region). Because a dormant
marker does not have a significant performance hit (actually, my
benchmarks show a small acceleration of the overall system, probably
due to cache line code layout modifications), I think it's legitimate to
add this kind of instrumentation in the existing kernel system call
functions.
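
A rough user-space sketch of the idea, in C, might look like the following.
It does not reproduce the real kernel marker code: struct marker,
marker_arm(), traced_open() and record_probe() are hypothetical names, and
a plain flag test stands in for whatever mechanism enables the marker. The
point it shows is only the placement: the probe is handed the name after it
has been copied once, so no second copy from user space is needed.

```c
#include <assert.h>
#include <string.h>

#define NAME_LEN 64

/* Hypothetical probe type: receives values already copied into the kernel. */
typedef void (*probe_fn)(const char *name, int flags, int mode);

struct marker {
    int enabled;     /* 0 means dormant: only this flag is tested */
    probe_fn probe;  /* called when the marker is armed */
};

struct marker open_marker = { 0, NULL };

/* Connect a probe and enable the marker. */
void marker_arm(struct marker *m, probe_fn probe)
{
    assert(m != NULL && probe != NULL);
    m->probe = probe;
    m->enabled = 1;
}

/* Return the marker to its dormant state. */
void marker_disarm(struct marker *m)
{
    m->enabled = 0;
    m->probe = NULL;
}

/* Stand-in for the open() path; returns 0 to mean success. */
int traced_open(const char *user_name, int flags, int mode)
{
    char kname[NAME_LEN];

    /* The single copy the syscall needs anyway. */
    strncpy(kname, user_name, NAME_LEN - 1);
    kname[NAME_LEN - 1] = '\0';

    /* Marker site: the copied name is on the stack and still cache-hot. */
    if (open_marker.enabled)
        open_marker.probe(kname, flags, mode);

    return 0;
}

/* Example probe: remember the last traced name and count the hits. */
static char last_traced[NAME_LEN];
static int probe_hits;

void record_probe(const char *name, int flags, int mode)
{
    (void)flags;
    (void)mode;
    strncpy(last_traced, name, NAME_LEN - 1);
    last_traced[NAME_LEN - 1] = '\0';
    probe_hits++;
}
```

While the marker is dormant, traced_open() only tests the flag; once
marker_arm(&open_marker, record_probe) has been called, each call also
hands the already-copied name, flags and mode to the probe.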

Mathieu

-- 
Mathieu Desnoyers
OpenPGP key fingerprint: 8CD5 52C3 8E3C 4140 715F  BA06 3F25 A8FE 3BAE 9A68

