5.2.2. Tracking I/O Time For Each File Read or Write

This section describes how to monitor the amount of time it takes for each process to read from or write to any file. This is useful to determine what files are slow to load on a given system.
#! /usr/bin/env stap

 * Copyright (C) 2006-2018 Red Hat Inc.
 * This copyrighted material is made available to anyone wishing to use,
 * modify, copy, or redistribute it subject to the terms and conditions
 * of the GNU General Public License v.2.
 * You should have received a copy of the GNU General Public License
 * along with this program.  If not, see <http://www.gnu.org/licenses/>.
 * Print out the amount of time spent in the read and write systemcall
 * when each file opened by the process is closed. Note that the systemtap 
 * script needs to be running before the open operations occur for
 * the script to record data.
 * This script could be used to to find out which files are slow to load
 * on a machine. e.g.
 * stap iotime.stp -c 'firefox'
 * Output format is:
 * timestamp pid (executabable) info_type path ...
 * 200283135 2573 (cupsd) access /etc/printcap read: 0 write: 7063
 * 200283143 2573 (cupsd) iotime /etc/printcap time: 69

global start
global time_io

function timestamp:long() { return gettimeofday_us() - start }

function proc:string() { return sprintf("%d (%s)", pid(), execname()) }

probe begin { start = gettimeofday_us() }

global possible_filename, filehandles, fileread, filewrite

probe syscall.open, syscall.openat {
  possible_filename[tid()] = filename

probe syscall.open.return, syscall.openat.return {
  // Get filename even if non-dwarf syscall return probe are used.
  filename = possible_filename[tid()]
  delete  possible_filename[tid()]
  if (retval != -1) {
    filehandles[pid(), retval] = filename
  } else {
    printf("%d %s access %s fail\n", timestamp(), proc(), filename)

global read_fds, write_fds

probe syscall.read { read_fds[tid()] = fd }

probe syscall.read.return {
  p = pid()
  // Get fd even if non-dwarf syscall return probe.
  fd = read_fds[tid()]
  delete read_fds[tid()]
  bytes = retval
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    fileread[p, fd] <<< bytes
  time_io[p, fd] <<< time

probe syscall.write { write_fds[tid()] = fd }

probe syscall.write.return {
  p = pid()
  // Get fd even if non-dwarf syscall return probe.
  fd = write_fds[tid()]
  delete write_fds[tid()]
  bytes = retval
  time = gettimeofday_us() - @entry(gettimeofday_us())
  if (bytes > 0)
    filewrite[p, fd] <<< bytes
  time_io[p, fd] <<< time

probe syscall.close {
  if ([pid(), fd] in filehandles) {
    printf("%d %s access %s read: %d write: %d\n",
           timestamp(), proc(), filehandles[pid(), fd],
           @sum(fileread[pid(), fd]), @sum(filewrite[pid(), fd]))
    if (@count(time_io[pid(), fd]))
      printf("%d %s iotime %s time: %d\n",  timestamp(), proc(),
             filehandles[pid(), fd], @sum(time_io[pid(), fd]))
  delete fileread[pid(), fd]
  delete filewrite[pid(), fd]
  delete filehandles[pid(), fd]
  delete time_io[pid(),fd]
iotime.stp tracks each time a system call opens, closes, reads from, and writes to a file. For each file any system call accesses, iotime.stp counts the number of microseconds it takes for any reads or writes to finish and tracks the amount of data (in bytes) read from or written to the file.
iotime.stp also uses the local variable $count to track the amount of data (in bytes) that any system call attempts to read or write. Note that $return (as used in disktop.stp from Section 5.2.1, “Summarizing Disk Read/Write Traffic”) stores the actual amount of data read/written. $count can only be used on probes that track data reads or writes (that is, syscall.read and syscall.write).

Example 5.7. iotime.stp Sample Output

825946 3364 (NetworkManager) access /sys/class/net/eth0/carrier read: 8190 write: 0
825955 3364 (NetworkManager) iotime /sys/class/net/eth0/carrier time: 9
117061 2460 (pcscd) access /dev/bus/usb/003/001 read: 43 write: 0
117065 2460 (pcscd) iotime /dev/bus/usb/003/001 time: 7
3973737 2886 (sendmail) access /proc/loadavg read: 4096 write: 0
3973744 2886 (sendmail) iotime /proc/loadavg time: 11
Example 5.7, “iotime.stp Sample Output” prints out the following data:
If a process was able to read or write any data, a pair of access and iotime lines should appear together. The access line's timestamp refers to the time that a given process started accessing a file; at the end of the line, it will show the amount of data read/written (in bytes). The iotime line will show the amount of time (in microseconds) that the process took in order to perform the read or write.
If an access line is not followed by an iotime line, it means that the process did not read or write any data.