Red Hat January 27, 2010 Webinar

A recording of the presentation is available on WebEx (a Java-enabled browser is required to view it).

The following are Questions and Answers from the webinar.

Q: For which Linuxes is SystemTap available besides RHEL? E.g. Fedora, or other distros?

SystemTap was first available in Fedora Core 4 and has been available in every Fedora release since. Other Linux distributions also include SystemTap. Instructions for installing SystemTap on various Linux distributions are available in the Installation section of the SystemTap wiki.

Q: Will you compare/contrast SystemTap to Solaris DTrace? Anything alike at all?

SystemTap is of a similar scope and ambition as Solaris DTrace: both are script-based, and system-wide. There are numerous implementation and deployment differences. SystemTap tries to address the larger area of symbolic debugging. See also:

http://sourceware.org/systemtap/wiki/SystemtapDtraceComparison

Q: Does SystemTap have anything similar to Solaris DTrace aggregations?

Certainly. Contention-free aggregations are available in the scripting language.
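As a minimal sketch of what such an aggregation looks like (the array name read_sizes and the 10-second window are illustrative choices, not from the webinar), the "<<<" operator feeds values into a statistics aggregate, and extractors such as @count, @avg and @max read it back:

```systemtap
# Aggregate the requested size of every read() call, keyed by process name.
global read_sizes

probe syscall.read {
  # "count" is the requested byte count exported by the syscall tapset
  read_sizes[execname()] <<< count
}

probe timer.s(10) {
  # Sorting an aggregate array with "-" orders by @count, descending
  foreach (name in read_sizes- limit 10)
    printf("%-16s reads=%d avg=%d max=%d\n", name,
           @count(read_sizes[name]), @avg(read_sizes[name]),
           @max(read_sizes[name]))
  exit()
}
```

The "<<<" operation only touches per-CPU state, which is what keeps it contention-free; the cost of merging is paid when the extractors run.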

Q: Do you consider SystemTap safe to use in a production environment?

Our general advice is to test scripts in a development environment first, then deploy. Many people have used SystemTap on production machines.

Q: What about interrogating blacklisted (risky) functions such as IRQ handlers, spinlocks/semaphores etc?

They are risky, which is why the blacklist is there. In guru mode (stap "-g" option), some blacklists are disabled, but use it strictly at your own risk.

Q: How do you get the list of helper functions like ansi_clear_screen?

A reference manual describing the various functions and tapsets is installed by the SystemTap RPM at:

/usr/share/doc/systemtap-1.1/tapsets/index.html

There are a variety of man pages in the "3stap" section, which are also listed here:

http://sourceware.org/systemtap/man/index.html

Work is continuing to document every tapset and function, but there are still some functions that are not documented. One may examine the tapset library itself, under /usr/share/systemtap/tapset/ for the code that SystemTap actually uses.

Q: Thoughts on Linus' aversion to SystemTap? See http://osdir.com/ml/utrace-devel/2010-01/msg00254.html

We are optimistic that certain kernel developers will come to appreciate the unique capabilities of the system, and forgive its unusual implementation technique.

In the meantime, users can rest easy: SystemTap has always been loosely coupled to the development of the kernel. It will continue to operate fine, even if pieces of it turn out never to be merged. Red Hat and partners are committed to continuing SystemTap for the foreseeable future.

Q: What is SystemTap's availability / support in RHEL4 ?

SystemTap has been available with RHEL4 for several years now. Most features work; user-space probing cannot (due to missing kernel elements). SystemTap 0.6.2 is the most current version in RHEL4 and we are contemplating refreshing it again before RHEL4 is retired.

Q: Do you plan to have also a website with SystemTap examples (user-contributed)?

We welcome SystemTap samples from anyone:

and to put into our distribution:

Q: How do I get IO from block devices?

We have several examples associated with tracing block device layer events.

http://sourceware.org/systemtap/examples/keyword-index.html#IO

Q: Is there a way to get the debuginfo RPMs via yum?

There are three ways to get the debuginfo rpms via yum:

1. Use the debuginfo-install utility from the yum-utils RPM. SystemTap 0.9.8 and newer will suggest the command line if the debuginfo for a package is not found. For example, to install the debuginfo for gcc, run the following as root:

debuginfo-install gcc

2. Use --enablerepo="*debuginfo*" on the yum command line. For example, to install the gcc debuginfo, run the following as root:

yum --enablerepo="*debuginfo*" install gcc-debuginfo

3. Edit /etc/yum.repos.d/rhel-debuginfo.repo and change enabled from 0 to 1. Then, to install the gcc debuginfo, run the following as root:

yum install gcc-debuginfo

Q: How large are those variables, for example, how do you know if/when they have overflowed?

Integer variables are 64-bit signed numbers. Strings have a configurable maximum width. Both "overflow" silently. Associative arrays have a configurable maximum size, and "overflow" with a clean runtime error message.

Q: What is SystemTap's impact on performance? Is it just the runtime of the compiled script or is additional overhead added?

There are several components of overhead. There is a fixed amount of kernel memory (which is reported via printk to a syslogd) used. There may be timekeeping-related kernel threads running. Plus each hit of a probe event takes an amount of time that depends on the type of event, but generally events take on the order of a microsecond of overhead, plus an amount of time proportional to the script code being executed. The latter amount is estimated in a report if stap is run in "-t" timing mode.

The dominant factor appears to be the rate of probe event hits. If that rate stays below roughly 10**5 Hz, there should generally be no noticeable impact.

Q: Are there performance implications running SystemTap for long periods?

Not really. Memory consumption is strictly limited.

Q: What techniques minimize the performance hit when tracing? Buffering raw data for instance, and format later?

Yes. Some techniques include:

  1. deferring backtrace address-to-symbol mappings: backtrace() vs print_backtrace()
  2. using flight recorder mode to avoid constant userspace transmission
  3. using binary formatted data to compact traces (printf %*b)
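The first technique can be sketched as follows. This is an illustration only: the probed function (vfs_read), the array name stacks and the 10-second window are assumptions chosen for the example. The probe handler stores the raw backtrace() string and counts it; symbol lookup happens once per distinct stack at reporting time, via print_stack(), rather than on every probe hit.

```systemtap
# Count distinct kernel call paths into vfs_read, deferring symbolization.
global stacks

probe kernel.function("vfs_read") {
  # backtrace() returns a string of hex addresses; no symbol lookup here
  stacks[backtrace()] ++
}

probe timer.s(10) {
  foreach (bt in stacks- limit 5) {
    printf("%d hits:\n", stacks[bt])
    print_stack(bt)   # address-to-symbol mapping happens only now
  }
  exit()
}
```

Note that the stored backtrace string is subject to the configurable maximum string width, so very deep stacks may be truncated.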

Q: Is SystemTap a suitable tool for monitoring or is it suitable for 'debugging' and problem analysis?

We believe it is a good tool for both these purposes. SystemTap's scope touches symbolic source-level debugging.

Q: Is there a tapset for syscall entry vs. return? I see syscall.X.return, but no syscall.X.entry.

The entry probe would just be called syscall.X in the tapset.
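For instance, a minimal sketch pairing the two probes for one system call (open is an arbitrary choice here; on newer kernels the same activity may show up under openat instead):

```systemtap
# Entry probe: no ".entry" suffix, just syscall.NAME
probe syscall.open {
  printf("%s(%d) open(%s)\n", execname(), pid(), argstr)
}

# Return probe: syscall.NAME.return
probe syscall.open.return {
  printf("%s(%d) open -> %s\n", execname(), pid(), retstr)
}
```

argstr and retstr are the pretty-printed argument list and return value that the syscall tapset provides for these probes.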

Q: Does SystemTap support multi-dimensional arrays?

Certainly, with index tuples containing up to nine integer or string dimensions. You can iterate, delete, whatever you'd do in a real programming language, but subject to constraints associated with our simplified concurrency model.
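A small sketch of a two-dimensional array, keyed by a string pair (the array name calls and the 5-second interval are illustrative):

```systemtap
# Count system calls per (process name, syscall name) pair.
global calls

probe syscall.* {
  # "name" is the syscall name exported by the syscall tapset
  calls[execname(), name] ++
}

probe timer.s(5) {
  # Iterate over index tuples, sorted by value, descending
  foreach ([exe, sc] in calls- limit 10)
    printf("%-16s %-20s %d\n", exe, sc, calls[exe, sc])
  delete calls
  exit()
}
```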

Q: How would you display the values of variables passed to a function, and the values returned by that function?

At a function entry probe, use the $$parms variable to produce a dump of all parameter values. Individual variables are typically available as $foo, $bar for a function declared taking (foo, bar) arguments. These values may be dereferenced if they are typed pointers.

Return values are available at a function(...).return probe, under the name $return, for non-void functions, or as stringified $$return for all .return probes.
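As a sketch, using the kernel's vfs_read as an arbitrary example (the parameter name count is taken from that function's declaration and may differ between kernel versions; kernel debuginfo is required to resolve it):

```systemtap
# Dump all parameters, plus one individual parameter, at function entry.
probe kernel.function("vfs_read").call {
  printf("%s -> %s(%s) count=%d\n", execname(), probefunc(), $$parms, $count)
}

# Show the return value both as a number and in stringified form.
probe kernel.function("vfs_read").return {
  printf("%s <- %s = %d (%s)\n", execname(), probefunc(), $return, $$return)
}
```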

Q: Does SystemTap support user-space apps such as Java?

Yes, some user-space applications support probing; more details are available at:

http://fedoraproject.org/wiki/Features/SystemtapStaticProbes

Some recent versions of IcedTea include <sdt.h> markers that allow SystemTap to probe VM-level events such as method entries and exits. On-the-fly Java backtracing has been prototyped. More details on Java probing are available at:

Q: How do we add user-space markers to our user-space applications?

There is a writeup "Adding User Space Probing to an Application" describing the process at:

http://sourceware.org/systemtap/wiki/AddingUserSpaceProbingToApps

Q: Do you have an example that allows you to:

That could easily be used to trace down into the functions that use the most time.

Thank you for also asking this on the mailing list. There isn't a perfectly suited script for this, but there are some starting steps in this email thread in the archive:

http://sourceware.org/ml/systemtap/2010-q1/msg00277.html

Q: Is there an easy way to use SystemTap for troubleshooting transient performance problems?

We are exploring a more integrated 'health monitoring' approach (http://sources.redhat.com/bugzilla/show_bug.cgi?id=10691) for such issues. In the meantime, one may run any tracing-type SystemTap script in flight-recorder mode ("-F" option), and retrieve the stored recent data only after one notices a problem.

Q: Are there instances where SystemTap would be better to use than OProfile, and vice versa?

OProfile permits sampling-based profiling associated with any given hardware performance counter. SystemTap does not currently interface to these same counters, but work is under way (http://sources.redhat.com/bugzilla/show_bug.cgi?id=909). In the meantime, the SystemTap "timer.profile" probe is the closest method for sampling-based data gathering. See:

http://sourceware.org/systemtap/examples/process/pf2.stp
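A much simpler sketch than the linked example, showing only the basic shape of timer.profile sampling (the array name hits and the 10-second window are illustrative):

```systemtap
# Sample what is running on each CPU at every profiling tick.
global hits

probe timer.profile {
  hits[execname()] ++
}

probe timer.s(10) {
  foreach (name in hits- limit 10)
    printf("%-16s %d samples\n", name, hits[name])
  exit()
}
```

The sample counts approximate where CPU time went per process; unlike OProfile, the sampling here is driven by the timer tick rather than a hardware performance counter.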

Q: Don't both OProfile and SystemTap have troubles gathering information on the dark side of interrupts?

Interrupt handlers within device drivers are fair game for probing with SystemTap, but several low level kernel interrupt-related areas are blacklisted from probing. But you're right, there are several shortcomings.

When OProfile uses the performance counters and the NMI interrupt mechanism it can collect samples within irq masked areas.

Q: If you wanted to tie all NFS operations to processes on a system, what process would you follow to determine the taps to enable and how to tie those back to the running processes?

Putting kprobes into the nfs modules, and monitoring pid() / execname() would get you that relationship. Something like the following short script:

probe module("nfs").function("nfs_*") {
  printf("%s(%d): %s()\n", execname(), pid(), probefunc());
}

There are also a couple of NFS examples checked into the SystemTap examples (http://sourceware.org/systemtap/examples/): nfs_func_log.stp and io/nfs_func_users.stp.

Q: Is it possible to list the device info in iostats.stp?

The iostats.stp script works at the level of system calls, and device information is not available at the system call level. However, at the lower-level vfs.read and vfs.write probes the device number (dev) is available. The iostats.stp script can be adapted to probe those lower-level functions. The iodevstats.stp script checked into the SystemTap examples (http://sourceware.org/systemtap/examples/) shows one possible way to collect this data.
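As a rough sketch of that adaptation (this is not the iodevstats.stp script itself; the array name ops and the 5-second window are illustrative), the dev value from the vfs tapset can be used directly as an array key and split into major/minor numbers for display:

```systemtap
# Count VFS read and write operations per device number.
global ops

probe vfs.read  { ops[dev, "read"] ++ }
probe vfs.write { ops[dev, "write"] ++ }

probe timer.s(5) {
  foreach ([d, op] in ops-)
    printf("dev %d,%d %-5s %d\n", MAJOR(d), MINOR(d), op, ops[d, op])
  exit()
}
```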

Q: We are having some issues with sshd, it looks like some particular pam.d conf file was not read. Is there any SystemTap script available to check what pam modules are being called?

There isn't a script specifically for that, but two possible approaches would be:

  1. trace open syscalls made by your sshd process, and record those that match /lib/security/*.
  2. put a probe into libpam.so.0 itself where it loads PAM configuration and module files
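The first approach might look roughly like the sketch below. It assumes the process is named sshd, that modules live under a path containing /lib/security/ (on 64-bit systems this may be /lib64/security/), and that the files are opened through the open system call rather than openat:

```systemtap
# Log PAM module files opened by sshd.
probe syscall.open {
  # "filename" is the path argument exported by the syscall tapset
  if (execname() == "sshd" && isinstr(filename, "/lib/security/"))
    printf("%s(%d) opened %s\n", execname(), pid(), filename)
}
```

Run it while reproducing the login problem; a module that never appears in the output was never loaded.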

Q: Is there an algebra for SystemTap expressions - i.e. do an action if time in a function is greater than xxxusec??

From first principles:

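global time

# FOO, "bar" and THRESHOLD are placeholders to fill in.
probe FOO.function("bar").call {
  # remember the entry time for this thread
  time[tid()] = gettimeofday_us()
}

probe FOO.function("bar").return {
  if (gettimeofday_us() - time[tid()] > THRESHOLD) { ... }
  # drop the entry so the array does not grow without bound
  delete time[tid()]
}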
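
A concrete instance of this call/return pattern, offered as a sketch only: the probed function (the kernel's vfs_read) and the 1000 microsecond threshold are arbitrary choices for illustration.

```systemtap
# Report vfs_read calls that take longer than 1000 microseconds.
global entry_us

probe kernel.function("vfs_read").call {
  entry_us[tid()] = gettimeofday_us()
}

probe kernel.function("vfs_read").return {
  t = tid()
  # Skip returns whose entry was missed (e.g. the script started mid-call)
  if (t in entry_us) {
    elapsed = gettimeofday_us() - entry_us[t]
    delete entry_us[t]
    if (elapsed > 1000)
      printf("%s(%d): vfs_read took %d us\n", execname(), pid(), elapsed)
  }
}
```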

Q: What is the difference between tracepoints and probepoints?

Kernel tracepoints are a particular hook-insertion mechanism compiled into some modern kernels (first available in linux-2.6.28). SystemTap probe points are a naming scheme for abstract events that may be associated with timers, or callbacks from hooking mechanisms such as tracepoints, kprobes, uprobes, etc.
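
To illustrate the distinction, the sketch below uses two probe points backed by different mechanisms: kernel.trace("sched_switch") attaches to a kernel tracepoint (assuming the running kernel provides one by that name), while timer.s(5) is a purely timer-driven event. The array name switches is illustrative.

```systemtap
# Count context switches per CPU via the sched_switch tracepoint.
global switches

probe kernel.trace("sched_switch") {
  switches[cpu()] ++
}

# A probe point that has nothing to do with tracepoints: a timer.
probe timer.s(5) {
  foreach (c+ in switches)
    printf("cpu %d: %d context switches\n", c, switches[c])
  exit()
}
```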

None: RH2010Webinar (last edited 2010-02-03 23:02:08 by WilliamCohen)