CPU sharing by threads
Problem
It is hard to tell how much time any given thread is spending in kernel vs. user space. Some statistics are available to user space, and some others only to the scheduler. But all in one place? Nope.
Scripts
function pc:long () %{ /* pure */ if (CONTEXT->regs) THIS->__retvalue = (uint64_t) REG_IP (CONTEXT->regs); %} function kta:long (pc:long) %{ /* pure */ /* is given PC a kernel text address? */ #if defined (__ia64__) THIS->__retvalue = ((unsigned long)REG_IP(CONTEXT->regs) >= (unsigned long)KERNEL_START); #else THIS->__retvalue = ((unsigned long)REG_IP(CONTEXT->regs) >= (unsigned long)PAGE_OFFSET); #endif %} probe timer.profile { tid=tid() kernel_p = kta(pc()) if (kernel_p) kticks[tid] <<< 1 else uticks[tid] <<< 1 ticks <<< 1 } global uticks, kticks, ticks global tids probe timer.s(10), end { # gather all seen tids into a single array foreach (tid in kticks) tids[tid] += @count(kticks[tid]) foreach (tid in uticks) tids[tid] += @count(uticks[tid]) allticks = @count(ticks) printf ("%5s %7s %7s (of %d ticks)\n", "tid", "%user", "%kernel", allticks) foreach (tid in tids- limit 20) { uscaled = @count(uticks[tid])*10000/allticks kscaled = @count(kticks[tid])*10000/allticks printf ("%5d %3d.%02d%% %3d.%02d%%\n", tid, uscaled/100, uscaled%100, kscaled/100, kscaled%100) } printf("\n") delete uticks delete kticks delete ticks delete tids }
Output
This script, run on a 2.66GHz four-core Xeon FC6 machine running folding@home and other stuff:
% stap -gt thread-times.stp tid %user %kernel (of 39932 ticks) 16461 21.11% 3.03% 16470 18.57% 3.00% 16472 17.49% 3.18% 16463 17.15% 2.95% 0 0.00% 13.20% 27882 0.00% 0.18% 20643 0.00% 0.03% 3602 0.00% 0.01% 3684 0.00% 0.00% [...^C....] probe timer.profile (thread-times.stp:17:1), hits: 604696, cycles: 1336min/5648avg/102600max probe timer.s(10)+1 (thread-times.stp:31:1), hits: 16, cycles: 194840min/294773avg/330864max
Lessons
This script uses simple-minded sampling to credit kernel vs. user time to threads. The two embedded-C functions at the top really belong into the standard tapset, so the guru mode flag is a temporary requirement.
A few interesting systemtap programming techniques:
contention-free data acquisition (using only aggregate type globals in timer.profile, even including a scalar ticks)
- report generation, based on sorted totals of aggregates
- scaled pseudo-floating point
use of the -t (timing) flag, to estimate probing overhead caused by the script (5000 cycles ~= 2 us per profile timer hit)