CPU sharing by threads

Problem

It is hard to tell how much time any given thread is spending in kernel vs. user space. Some statistics are available to user space, and some others only to the scheduler. But all in one place? Nope.

Scripts

thread-times.stp

function pc:long () %{ /* pure */
  if (CONTEXT->regs)
    THIS->__retvalue = (uint64_t) REG_IP (CONTEXT->regs);
  /* else: no register state available; __retvalue stays 0 */
%}

function kta:long (pc:long) %{ /* pure */ /* is given PC a kernel text address? */
#if defined (__ia64__)
  THIS->__retvalue = ((unsigned long)THIS->pc >= (unsigned long)KERNEL_START);
#else
  THIS->__retvalue = ((unsigned long)THIS->pc >= (unsigned long)PAGE_OFFSET);
#endif
%}

probe timer.profile {
   tid=tid()
   kernel_p = kta(pc())
   if (kernel_p)
      kticks[tid] <<< 1
   else
      uticks[tid] <<< 1
   ticks <<< 1
}

global uticks, kticks, ticks

global tids

probe timer.s(10), end {
   # gather all seen tids into a single array
   foreach (tid in kticks) tids[tid] += @count(kticks[tid])
   foreach (tid in uticks) tids[tid] += @count(uticks[tid])

   allticks = @count(ticks)
   printf ("%5s %7s %7s (of %d ticks)\n", "tid", "%user", "%kernel", allticks)
   foreach (tid in tids- limit 20) {
     uscaled = @count(uticks[tid])*10000/allticks
     kscaled = @count(kticks[tid])*10000/allticks
     printf ("%5d %3d.%02d%% %3d.%02d%%\n",
       tid, uscaled/100, uscaled%100, kscaled/100, kscaled%100)
   }
   printf("\n")

   delete uticks
   delete kticks
   delete ticks
   delete tids
}

Output

This script was run on a 2.66 GHz four-core Xeon FC6 machine running folding@home and other workloads:

% stap -gt thread-times.stp
  tid   %user %kernel (of 39932 ticks)
16461  21.11%   3.03%
16470  18.57%   3.00%
16472  17.49%   3.18%
16463  17.15%   2.95%
    0   0.00%  13.20%
27882   0.00%   0.18%
20643   0.00%   0.03%
 3602   0.00%   0.01%
 3684   0.00%   0.00%
[...^C....]
probe timer.profile (thread-times.stp:17:1), hits: 604696, cycles: 1336min/5648avg/102600max
probe timer.s(10)+1 (thread-times.stp:31:1), hits: 16, cycles: 194840min/294773avg/330864max

Lessons

This script uses simple-minded sampling to credit kernel vs. user time to threads. The two embedded-C functions at the top really belong in the standard tapset, so the guru-mode flag (-g) is a temporary requirement.

A few interesting systemtap programming techniques:

  • contention-free data acquisition (using only aggregate-typed globals in timer.profile, even for the scalar ticks counter)

  • report generation, based on sorted totals of aggregates
  • scaled pseudo-floating point
  • use of the -t (timing) flag, to estimate probing overhead caused by the script (5000 cycles ~= 2 us per profile timer hit)


WarStories

WSthreadCPUshare (last edited 2008-01-10 19:47:35 by localhost)