Top network users by PID

Problem

Someone asked whether there is a way to tell how much network traffic each process on a machine is generating. Several suggestions were given, including a SystemTap script written by Jose Santos. I revised that script to use the networking tapset, track both transmits and receives, and print top-like output.

Reference this mailing list thread.

Scripts

# Per-(pid, device) statistics; all arrays are reset after every report.
global ifxmit, ifrecv, ifpid, execname, user

# A packet is queued for transmission on a network device.
probe netdev.transmit
{
        p = pid()
        execname[p] = execname()
        user[p] = uid()
        ifxmit[p, dev_name] <<< length  # packet length in bytes
        ifpid[p, dev_name] ++           # packets in either direction (sort key)
}

# A packet is received from a network device.
probe netdev.receive
{
        p = pid()
        execname[p] = execname()
        user[p] = uid()
        ifrecv[p, dev_name] <<< length  # packet length in bytes
        ifpid[p, dev_name] ++           # packets in either direction (sort key)
}

# Print one top-like report, busiest (pid, device) pairs first.
function print_activity()
{
        printf("%5s %5s %-7s %7s %7s %7s %7s %-15s\n",
                "PID", "UID", "DEV", "XMIT_PK", "RECV_PK",
                "XMIT_KB", "RECV_KB", "COMMAND")

        # The trailing "-" sorts by total packet count, descending.
        foreach ([pid, dev] in ifpid-) {
                n_xmit = @count(ifxmit[pid, dev])
                n_recv = @count(ifrecv[pid, dev])
                # Only take @sum when samples exist for that direction.
                printf("%5d %5d %-7s %7d %7d %7d %7d %-15s\n",
                        pid, user[pid], dev, n_xmit, n_recv,
                        n_xmit ? @sum(ifxmit[pid, dev])/1024 : 0,
                        n_recv ? @sum(ifrecv[pid, dev])/1024 : 0,
                        execname[pid])
        }

        print("\n")

        # Start the next interval from zero.
        delete execname
        delete user
        delete ifxmit
        delete ifrecv
        delete ifpid
}

probe timer.ms(5000)
{
        print_activity()
}

Output

The original script filtered out traffic for pid 0. During testing, much of the traffic was missing from the output. I removed the pid 0 filter and the missing traffic reappeared, attributed to pid 0 (swapper). It appears that the networking probes can fire in interrupt context, so pid() reports whichever task happened to be running at that moment, which may not be the process actually responsible for the traffic.

# stap nettop.stp
  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0         66     344      18      19 swapper
 2469     0 eth0        214      39     167       1 Xvnc
23470     0 eth0         24      35       5       1 firefox-bin
 2281     0 eth0          1       1       0       0 wcstatusd
22446     0 eth0          1       0       1       0 sshd
 2538     0 eth0          0       1       0       0 metacity
23557     0 eth0          0       1       0       0 sh
23559     0 eth0          0       1       0       0 lspci
23566     0 eth0          0       1       0       0 sh

  PID   UID DEV     XMIT_PK RECV_PK XMIT_KB RECV_KB COMMAND
    0     0 eth0         14      80       0       3 swapper
 2469     0 eth0         32       2      20       0 Xvnc
22446     0 eth0          1       0       0       0 sshd
 2052    38 eth0          1       0       0       0 ntpd
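
To gauge how much traffic ends up attributed to pid 0, a small companion script can split received packets into those seen while pid() was 0 and everything else. This is only a sketch: the array name rx_ctx and the 5-second interval are illustrative choices, and it assumes the same netdev.receive probe and length variable that nettop.stp uses.

```systemtap
# Sketch: how much received traffic is charged to pid 0 (swapper)?
# rx_ctx is a hypothetical name chosen for this example.
global rx_ctx

probe netdev.receive
{
        # pid() names whichever task was current when the probe fired,
        # which is not necessarily the task that owns the socket.
        if (pid() == 0)
                rx_ctx["pid 0"] <<< length
        else
                rx_ctx["other"] <<< length
}

probe timer.s(5)
{
        # Only keys that received samples exist, so @sum is safe here.
        foreach (ctx in rx_ctx)
                printf("%-6s %7d packets %7d KB\n", ctx,
                        @count(rx_ctx[ctx]), @sum(rx_ctx[ctx])/1024)
        print("\n")
        delete rx_ctx
}
```

A large "pid 0" share suggests that the per-process receive figures in the report above undercount the processes that actually own the connections.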

Lessons

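Top-like scripts are very easy to write in SystemTap. The same general structure (probes that accumulate into global arrays, plus a timer probe that prints and resets them) can be applied to many data collection tasks. The hard part is finding the right kernel function to probe. It is also important to understand the context in which a probe can fire: here, probes that run during interrupts made pid() an unreliable guide to which process caused the traffic.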
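
As an illustration of reusing that structure, the sketch below counts reads per process instead of network packets. It is not part of the original script: the array names reads and names, the choice of the vfs.read probe, and the limit of 10 rows are assumptions made for this example.

```systemtap
# Sketch: the nettop skeleton applied to "which processes read the most?"
# reads and names are hypothetical array names for this example.
global reads, names

# Collect: one counter per pid, plus the command name for display.
probe vfs.read
{
        p = pid()
        names[p] = execname()
        reads[p] ++
}

# Report: print the busiest pids every 5 seconds, then reset.
probe timer.s(5)
{
        printf("%5s %7s %-15s\n", "PID", "READS", "COMMAND")
        foreach (p in reads- limit 10)
                printf("%5d %7d %-15s\n", p, reads[p], names[p])
        print("\n")
        delete reads
        delete names
}
```

Only the collecting probe and the column layout change from one task to the next; the sorted foreach and the delete at the end of each interval stay the same.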


None: WSNetTop (last edited 2008-01-10 19:47:25 by localhost)