Sleeping Beauties

Problem

Some threads, including kernel threads, occasionally go to sleep (in the uninterruptible "D" state) for too long. Wouldn't it be nice to find out what they are doing?

Scripts

This script intercepts entry to and return from the wait_for_completion scheduler function, recording a timestamp at each entry. If a thread sleeps for "too long" (a configurable number of time units, 10 milliseconds by default), a separate timer-based probe declares the thread "impatient" and prints the stack traceback that was captured at the start of the wait. When that happens, the return probe also prints a line reporting the total elapsed "bored" (blocked) time.

sleepingBeauties.stp

# Timekeeping helper: swap in gettimeofday_us() or gettimeofday_ns()
# (and update time_name to match) for finer resolution.
function time () { return gettimeofday_ms() }
global time_name = "ms"
global boredom = 10 # reporting threshold, in time units


probe kernel.function("wait_for_completion").call
{
  t = tid()
  back[t] = backtrace()   # capture the stack at the start of the wait
  name[t] = execname()
  backtime[t] = time()
  delete bored[t]
}

probe kernel.function("wait_for_completion").return
{
  t = tid()

  # Report the total blocked time, but only for threads already
  # flagged "impatient" by the timer probe below.
  if ([t] in bored) {
    patience = time() - backtime[t]
    printf ("thread %d (%s) bored for %d %s\n",
            t, name[t], patience, time_name)
  }

  delete bored[t]
  delete back[t]
  delete name[t]
  delete backtime[t]
}


probe timer.profile {
  # Scan every thread currently waiting; report each laggard only once.
  foreach (tid+ in back) {
    if ([tid] in bored) continue   # already reported

    patience = time() - backtime[tid]
    if (patience >= boredom) {
      printf ("thread %d (%s) impatient after %d %s\n",
              tid, name[tid], patience, time_name)
      print_stack (back[tid])
      printf ("\n")
      bored[tid] = 1 # defer further reports to wakeup
    }
  }
}

global name, back, backtime, bored

Output

The 10-millisecond default boredom interval triggers relatively frequently on a machine with ordinary disk drives. Raising it to hundreds of milliseconds restricts the reports to the particularly long-duration blockages.

# stap sleepingBeauties.stp
thread 2086 (make) impatient after 19 ms
 0xc05edec6 : wait_for_completion+0x1/0x98 []
 0x00000000
 0x00000bac
 0xf0aace87 : probe_1524+0x1e8/0x56c [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xf0aac634 : enter_kprobe_probe+0xfc/0x217 [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xc05f2963 : aggr_pre_handler+0x35/0x75 []
 0xc05f1193 : kprobe_exceptions_notify+0x187/0x3db []
 0xc05f1d70 : notifier_call_chain+0x2b/0x4a []
 0xc05f1dab : __atomic_notifier_call_chain+0x1c/0x22 []
 0xc05f1dcb : atomic_notifier_call_chain+0x1a/0x1c []
 0x0c04380f

thread 2086 (make) bored for 19 ms
thread 2106 (gcc) impatient after 20 ms
 0xc05edec6 : wait_for_completion+0x1/0x98 []
 0x00000000
 0x00000bac
 0xf0aace87 : probe_1524+0x1e8/0x56c [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xf0aac634 : enter_kprobe_probe+0xfc/0x217 [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xc05f2963 : aggr_pre_handler+0x35/0x75 []
 0xc05f1193 : kprobe_exceptions_notify+0x187/0x3db []
 0xc05f1d70 : notifier_call_chain+0x2b/0x4a []
 0xc05f1dab : __atomic_notifier_call_chain+0x1c/0x22 []
 0xc05f1dcb : atomic_notifier_call_chain+0x1a/0x1c []
 0x0c04380f

thread 2106 (gcc) bored for 43 ms
thread 2238 (collect2) impatient after 12 ms
 0xc05edec6 : wait_for_completion+0x1/0x98 []
 0x00000000
 0x00000bac
 0xf0aace87 : probe_1524+0x1e8/0x56c [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xf0aac634 : enter_kprobe_probe+0xfc/0x217 [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xc05f2963 : aggr_pre_handler+0x35/0x75 []
 0xc05f1193 : kprobe_exceptions_notify+0x187/0x3db []
 0xc05f1d70 : notifier_call_chain+0x2b/0x4a []
 0xc05f1dab : __atomic_notifier_call_chain+0x1c/0x22 []
 0xc05f1dcb : atomic_notifier_call_chain+0x1a/0x1c []
 0x0c04380f

thread 2249 (collect2) impatient after 13 ms
 0xc05edec6 : wait_for_completion+0x1/0x98 []
 0x00000000
 0x00000bac
 0xf0aace87 : probe_1524+0x1e8/0x56c [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xf0aac634 : enter_kprobe_probe+0xfc/0x217 [stap_e37f8d9fbd67c1df1215efde25cd0604_2362]
 0xc05f2963 : aggr_pre_handler+0x35/0x75 []
 0xc05f1193 : kprobe_exceptions_notify+0x187/0x3db []
 0xc05f1d70 : notifier_call_chain+0x2b/0x4a []
 0xc05f1dab : __atomic_notifier_call_chain+0x1c/0x22 []
 0xc05f1dcb : atomic_notifier_call_chain+0x1a/0x1c []
 0x0c04380f

thread 2238 (collect2) bored for 121 ms
thread 2249 (collect2) bored for 28 ms

Lessons

Systemtap's stack backtrace function is constantly improving ... and occasionally regressing. The sample output above should have contained far more informative backtraces than the raw addresses shown here; see http://sources.redhat.com/PR4633

The script could omit the timer.profile probe and print overlong-wait reports from the .return probe itself. The problem with that approach is that if an indefinitely long blockage occurs, one would never see any report at all.
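A hedged sketch of that timer-less variant, reusing the script's backtime, boredom, time() and time_name definitions (the entry probe stays as-is):

# Sketch: report overlong waits from the return probe alone.
# Caveat: an indefinitely blocked thread never reaches this probe,
# so it would produce no report at all.
probe kernel.function("wait_for_completion").return
{
  t = tid()
  if ([t] in backtime) {
    patience = time() - backtime[t]
    if (patience >= boredom)
      printf ("thread %d (%s) blocked for %d %s\n",
              t, execname(), patience, time_name)
    delete backtime[t]
  }
}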

The script could intercept thread block events in other places, or hook the other wait_for_completion_... variants. This could be useful for NFS-related problems.
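A wildcard probe point could cover the whole family at once; this is a sketch, and which variants actually resolve depends on the kernel's debuginfo:

# Sketch: record wait starts for wait_for_completion and its
# variants (_timeout, _interruptible, ...) with one wildcard probe.
probe kernel.function("wait_for_completion*").call
{
  t = tid()
  back[t] = backtrace()
  name[t] = execname()
  backtime[t] = time()
  delete bored[t]
}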

It should be possible to have the system automatically identify excessive delays in terms of standard deviations or other computed measures. We'll leave that for another "war story".
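One minimal sketch along those lines uses SystemTap's statistics aggregates; the warm-up count of 100 and the three-times-average threshold are arbitrary illustrative choices, not values from the original script:

# Sketch: flag waits well above the running average.
global waits

probe kernel.function("wait_for_completion").return
{
  t = tid()
  if ([t] in backtime) {
    patience = time() - backtime[t]
    waits <<< patience
    # After a warm-up period, report waits over 3x the mean so far.
    if (@count(waits) > 100 && patience > 3 * @avg(waits))
      printf ("thread %d (%s) outlier wait: %d %s (avg %d)\n",
              t, execname(), patience, time_name, @avg(waits))
    delete backtime[t]
  }
}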

A begin probe really should report the boredom threshold and timekeeping parameters.
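Something as simple as this would do:

# Report the configured parameters at startup.
probe begin {
  printf ("reporting waits longer than %d %s\n", boredom, time_name)
}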


WSsleepingBeauties (last edited 2008-01-10 19:47:35 by localhost)