Futex contention

Problem

There appear to be no simple tools for answering the common question: which of a program's mutexes are contended.

Since systemtap does not support probing directly on userspace yet, we cannot hook into glibc or higher level code and take measurements. However, since modern glibc uses the futex facility only for contended locks, it's a close approximation.

So, the task becomes assessing which locks in a multithreaded program tend to take the longest between the beginning and completion of a futex wait. This we can do.

Scripts

futexes.stp

You'll need systemtap 0.5.10+ for this script, due to use of the constant initialization on line 8. Thinking back to it now, the script will work on an older version if you remove the "= 0" and "= 1" parts. (That's because FUTEX_WAIT is initialized to zero automatically anyway, and FUTEX_WAKE is never used.)

# This script tries to identify contended user-space locks by hooking
# into the futex system call.

global thread_thislock # short
global thread_blocktime #
global FUTEX_WAIT = 0, FUTEX_WAKE = 1

global lock_waits # long-lived stats on (tid,lock) blockage elapsed time
global process_names # long-lived pid-to-execname mapping

probe syscall.futex {
   if (op != FUTEX_WAIT) next  # we don't care about originators of WAKE events
   t = tid ()
   process_names[pid()] = execname()
   thread_thislock[t] = $uaddr
   thread_blocktime[t] = gettimeofday_us()
}

probe syscall.futex.return {
   t = tid()
   ts = thread_blocktime[t]
   if (ts) {
     elapsed = gettimeofday_us() - ts
     lock_waits[pid(), thread_thislock[t]] <<< elapsed
     delete thread_blocktime[t]
     delete thread_thislock[t]
   }
}

probe end {
   foreach ([pid+, lock] in lock_waits)
     printf ("%s[%d] lock %p contended %d times, %d avg us\n",
       process_names[pid], pid, lock, @count(lock_waits[pid,lock]), @avg(lock_waits[pid,lock]))
}

Output

Here is a sample run on a desktop machine busily playing Gary Numan music.

# stap futexes.stp
^C
liferea-bin[3613] lock 0x0a117340 contended 1 times, 71 avg us
java_vm[8155] lock 0x09baa45c contended 9 times, 1004013 avg us
java_vm[8155] lock 0x09d6f9ac contended 186 times, 55964 avg us
xmms[16738] lock 0xbfe29eec contended 777 times, 69 avg us
xmms[16738] lock 0xbfe29ecc contended 119 times, 64 avg us
xmms[16738] lock 0xbfe29ebc contended 111 times, 68 avg us
xmms[16738] lock 0xbfe29ef0 contended 742 times, 91 avg us
xmms[16738] lock 0xbfe29ed0 contended 107 times, 101 avg us
xmms[16738] lock 0xbfe29ec0 contended 107 times, 74 avg us
firefox-bin[21801] lock 0x096d16f4 contended 24 times, 12 avg us
firefox-bin[21801] lock 0x096d1198 contended 2 times, 4 avg us
firefox-bin[21801] lock 0x096d16f8 contended 150 times, 64997 avg us
named[23869] lock 0x41ab0b84 contended 1 times, 131 avg us
named[23869] lock 0x41ab0b50 contended 1 times, 26 avg us

Lessons

Once one changes the problem from userspace to ordinary kernel probing, it's not hard to get systemtap to produce results. The results are informative, clearly identifying problematic locks, at least by process and address.

Mapping the lock addresses back to application-level pthread mutexes is unfortunately something left to the user. This is because these locks may be dynamically allocated, or may lack any ELF-level symbolic information to identify from outside the process. However, this is surmountable if the application includes some sort of lock enumeration code. Such a lock list could be generated at any convenient time, and later matched against results from an online systemtap run. (Remember, it was not necessary to interrupt the system's normal behavior to take these measurements.)


WarStories

None: WSFutexContention (last edited 2008-01-10 19:47:28 by localhost)