This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: Evaluating SystemTap for Network Response Times
- From: fche at redhat dot com (Frank Ch. Eigler)
- To: Nathan DeBardeleben <ndebard at lanl dot gov>
- Cc: systemtap at sources dot redhat dot com
- Date: 31 Jan 2006 12:21:29 -0500
- Subject: Re: Evaluating SystemTap for Network Response Times
- References: <43DF95CB.8070201@lanl.gov>
Nathan DeBardeleben <ndebard@lanl.gov> writes:
> [...] Specifically, we want to time the point
> when a socket send operation leaves user space, entering kernel space,
> down to the point where the kernel says "it's done, sent". [...]
>
> Initially this looks just like the kind of thing I could do with
> SystemTap but I worry that the scripting language will be too
> restrictive to allow me to allocate these types of data structures
> to do record keeping.
I hope it is exactly this kind of complex instrumentation with which
systemtap could show its prowess. I would like to help you make it
work.
> When it comes down to it - I want to observe a system and recognize
> outliers ("hey, this operation took 20 times longer than the rest")
> through statistical means.
Expressing that condition should be no problem at all. If, for example,
you elect to use a statistics value to store elapsed times:

    times <<< time /* or an array indexed however necessary */

then a probe can compare the current average to a new value like this:

    if (@avg(times) > EXPR) { /* process further */ }
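
To make the two fragments above concrete, here is a minimal sketch of how they might fit together in one script. Several things in it are assumptions for illustration and not part of the original message: the choice of the syscall.sendto probe points from the syscall tapset, the start[] array keyed by thread id, the microsecond clock, and the factor of 20 (borrowed from the quoted question). The syscall return is also only a rough stand-in for "the kernel says it's done, sent"; a real measurement would probably need a probe deeper in the network stack.

```systemtap
global start   # entry timestamps, indexed by thread id (hypothetical name)
global times   # statistics aggregate of elapsed microseconds

probe syscall.sendto {
  # Record when this thread entered the kernel for the send.
  start[tid()] = gettimeofday_us()
}

probe syscall.sendto.return {
  t0 = start[tid()]
  if (t0 == 0) next            # entry was not observed; skip
  delete start[tid()]
  elapsed = gettimeofday_us() - t0

  # @avg on an empty aggregate is a run-time error, so check @count first.
  if (@count(times) > 0 && elapsed > 20 * @avg(times))
    printf("outlier: %s[%d] sendto took %d us (avg so far %d us)\n",
           execname(), pid(), elapsed, @avg(times))

  times <<< elapsed            # fold the new sample into the statistics
}
```

Saved as, say, sendtime.stp (a made-up file name), this would be run with "stap sendtime.stp". The new sample is compared against the average before it is added, so a single very slow call does not inflate the baseline it is judged against.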
Over time, I foresee the variety of statistical calculations growing
to include goodies like standard deviations, random sampling, and
whatever else can be efficiently computed per-CPU and then aggregated
across CPUs.
> [...] I hope I can add some value to the SystemTap community by
> testing it out in these environments. If this first step goes well,
> I will be looking at using SystemTap for monitoring parallel file
> systems and studying potential performance bottlenecks.
That all sounds great.
- FChE