Understanding skipped probes

SystemTap probe handlers are sometimes skipped. When enough of them are skipped, the script exits, reporting a line such as:

  WARNING: Number of errors: 0, skipped probes: 100

Probes can be skipped for many different reasons. As a first debugging step, rerun your script with the -t (timing) flag. The common possibilities are listed below. (If you see a different one, please let us know so we can update this wiki page.)

The number of skipped probes that trigger an overall script error exit is governed by the -DMAXSKIPPED=nnn parameter. You may increase that number dramatically and try running the script again.
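As a rough sketch of the two suggestions above, the following shell fragment assembles a stap command line that enables timing output and raises the skip limit. The script name myscript.stp and the value 10000 are placeholders chosen for illustration, not recommendations. The command is printed rather than executed so that it can be reviewed first.

```shell
# Hypothetical script name and limit; substitute your own.
SCRIPT=myscript.stp
MAXSKIPPED=10000

# -t requests timing information; -DMAXSKIPPED raises the skip limit.
STAP_CMD="stap -t -DMAXSKIPPED=$MAXSKIPPED $SCRIPT"

# Print the command for review; run it yourself once satisfied.
echo "$STAP_CMD"
```

Raising the limit only postpones the exit; the sections below still apply if the skip count keeps growing.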

Skipped due to global 'VAR' lock timeout

The problem is excessive contention for a script-level "global" variable: too many concurrently running probe handlers are trying to modify the same global(s). Each new probe handler waits a limited amount of time for the locks to be released; if they are not released in time, the probe is skipped. This can sometimes be worked around by optimizing script code (see TipSkippedProbesOptimization), or by enlarging the -DTRYLOCKDELAY=mmm and -DMAXTRYLOCK=nnn parameters.
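One commonly suggested script-level optimization, assumed here rather than taken from this page, is to replace a plain counter with a statistical aggregate (the <<< operator), which SystemTap can accumulate without each handler needing exclusive access to the global. The sketch below writes such a script to a temporary file and prints a command line that also enlarges the two lock parameters. The probe point syscall.read, the variable names, and the values 100 and 1000 are illustrative assumptions only.

```shell
# Write a small SystemTap script to a temporary file.
STP_FILE=$(mktemp)
cat > "$STP_FILE" <<'EOF'
# Contended form, for comparison: each handler needs exclusive access.
#   global count
#   probe syscall.read { count++ }

# Aggregate form: "<<<" adds a sample to a statistics object instead.
global reads
probe syscall.read { reads <<< 1 }
probe end { printf("reads: %d\n", @count(reads)) }
EOF

# Arbitrary example values for the two lock parameters.
STAP_CMD="stap -t -DTRYLOCKDELAY=100 -DMAXTRYLOCK=1000 $STP_FILE"

# Print the command for review rather than executing it.
echo "$STAP_CMD"
```

Try the script change first; enlarging the lock parameters makes handlers wait longer, which may hide contention rather than remove it.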

Skipped due to low stack

Some probes were triggered in a context where too little kernel stack was available to safely attempt execution of the handlers. For example, on architectures that don't allocate separate exception stacks, kprobes-based probes may be placed in deeply nested kernel contexts. The minimum amount of free kernel stack space required at probe entry is about -DMINSTACKSPACE=nnn bytes. You could try to reduce this amount below its default (slightly and carefully), or place your probes higher up in the call stack.

Skipped due to reentrancy

Most probe handlers run in -DINTERRUPTIBLE=1 mode by default. This means that hardware interrupts may occur while a probe handler is running. If those interrupt handlers are in turn instrumented somehow, then a second systemtap probe could be invoked while the first one is still active. This sort of reentrancy is detected and prevented by skipping the new reentrant probe. You can reduce this phenomenon by placing probes out of interrupt handler paths, or by running with -DINTERRUPTIBLE=0 (though an NMI can still interrupt a handler). You may request a more detailed trace record about each reentrancy event by specifying -DDEBUG_REENTRANCY=1.
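The two options just described might be tried as follows. This is only a sketch: myscript.stp is a placeholder script name, and the commands are printed for review rather than executed.

```shell
# Hypothetical script name; substitute your own.
SCRIPT=myscript.stp

# Option 1: run handlers in non-interruptible mode (an NMI can still occur).
CMD_NOINTR="stap -DINTERRUPTIBLE=0 $SCRIPT"

# Option 2: keep the default mode, but request a trace record
# for each reentrancy event.
CMD_DEBUG="stap -DDEBUG_REENTRANCY=1 $SCRIPT"

# Print both commands for review.
echo "$CMD_NOINTR"
echo "$CMD_DEBUG"
```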

Skipped due to uprobe register failure

A user-space probe registration (activation) attempt has failed. This could be because too many user-space probes were active concurrently, exhausting a fixed-size table (-DMAXUPROBES=mmm), or because the addresses were somehow invalid. Running with -DDEBUG_UPROBES=1 should generate some extra tracing in the kernel printk logs (see dmesg).

Skipped due to uprobe unregister failure

A user-space probe unregistration (deactivation) attempt has failed. This suggests some kind of internal error. -DDEBUG_UPROBES=1 may give some clues.

Skipped due to missed kretprobe/1 on 'FOO'

A kernel/module.function().return probe was requested, but the number of preallocated pending-kretprobe table slots was exhausted, because too many other instances of the given function have started but not yet returned. Add a .maxactive(nnnn) at the end of the kernel/module.function().return probe point specification to reserve more slots. Or if you're using a tapset alias or wildcard, you could increase the systemwide default with the -DKRETACTIVE=mmm parameter. You could set it to hundreds or thousands if you suspect that kernel threads can block in or under that function for a long time.
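A minimal sketch of both remedies follows. The first part writes a script that reserves slots on one specific return probe; the second prints a command line that raises the systemwide default for a script relying on aliases or wildcards. The function name vfs_read, the script name myscript.stp, and the values 200 and 500 are placeholders for illustration, not recommendations.

```shell
# Remedy 1: reserve more slots on a specific .return probe.
STP_FILE=$(mktemp)
cat > "$STP_FILE" <<'EOF'
global returns
# vfs_read and 200 are illustrative placeholders.
probe kernel.function("vfs_read").return.maxactive(200) { returns <<< 1 }
probe end { printf("returns seen: %d\n", @count(returns)) }
EOF
CMD_MAXACTIVE="stap $STP_FILE"

# Remedy 2: raise the default for a (hypothetical) script that uses
# tapset aliases or wildcards and so cannot easily add .maxactive().
CMD_KRETACTIVE="stap -DKRETACTIVE=500 myscript.stp"

# Print both commands for review rather than executing them.
echo "$CMD_MAXACTIVE"
echo "$CMD_KRETACTIVE"
```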

Skipped due to missed kretprobe/2 on 'FOO'

Similarly, a kernel .return probe was requested, but something went wrong with the corresponding function-entry kprobe. This should not happen, except perhaps in extreme low-kernel-memory conditions.

Skipped due to missed kprobe on 'FOO'

Similarly, a kernel function entry probe was requested, but something went wrong. This should not happen, except perhaps in extreme low-kernel-memory conditions.


None: TipSkippedProbes (last edited 2010-10-22 00:37:07 by FChE)