This is the mail archive of the mailing list for the systemtap project.

Latencytap thoughts

Hi everyone,

I am working to close out PR6960, latencytap. I thought that I would
discuss the issues with the current latencytap on the mailing list and
see what suggestions other people have for improving it.

The current output of latencytap is rather nebulous. The output should
point out specific things that can be improved or corrected on the
system. Below is an example of the output that latencytap.stp currently
prints for each 30-second interval:

Reason                                  Count  Average(us)  Maximum(us) Percent%
Application requested delay               340       442067      2000977      20%
Waiting for event (poll)                  348       375896     29999669      17%
Waiting for event (select)                 47      1970250      4999867      12%
                                         3034        25937     18434087      10%
Waiting for event (select)                 31      1935439     30000899       8%
Waking ksoftirqd                            2     15268765     29925887       4%
Userspace lock contention                  30      1000943      1000952       4%
Waiting for event (select)                  6      4999926      4999932       4%
                                           22      1363599      3423967       4%
pdflush() kernel thread                     6      4999761      4999962       4%
Waiting for event (poll)                  150       199950       201709       4%
Waiting for event (epoll)                   2     14544543     26414890       3%
kjournald() kernel thread                   2     11711964     18423438       3%
Waiting for event (epoll)                   2       999716       999886       0%
EXT3: Waiting for journal access            3        41252       107177       0%
opening cdrom device                       15         2529         2685       0%
opening cdrom device                       15         2116         2173       0%
block device IOCTL                         15         2109         2161       0%
opening cdrom device                       15         2081         2197       0%
opening cdrom device                       15         1964         2112       0%

The "Reason" column is based on the functions found in the stack
backtrace. If no reason is found for any of the functions in the
backtrace, then the reason is left blank. One can generate a kernel
module and use debug=1 with staprun to get the original backtraces for
the entries without reasons. Another side effect of this method is that
there can be multiple entries with the same reason because they have
different backtraces.

The rows are sorted by the total amount of time spent deactivated
for each backtrace. This is reflected in the "Percent%" column on the
right. Note that multiple backtraces that have the same reason are not
condensed into a single entry right now.
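
To illustrate what condensing entries by reason could look like, here is
a rough post-processing sketch in Python. It is not part of
latencytap.stp; the function name merge_by_reason and the tuple layout
are assumptions made for illustration. The rows are copied from the
sample output above. Because the report only prints integer averages,
the totals reconstructed as count * average are approximate.

```python
from collections import defaultdict

# (reason, count, average_us, maximum_us), copied from a few rows of the
# sample output above. The tuple layout is an assumption for this sketch.
rows = [
    ("Waiting for event (select)", 47, 1970250, 4999867),
    ("Waiting for event (select)", 31, 1935439, 30000899),
    ("Waiting for event (select)", 6, 4999926, 4999932),
    ("Waiting for event (poll)", 348, 375896, 29999669),
    ("Waiting for event (poll)", 150, 199950, 201709),
]


def merge_by_reason(rows):
    """Condense rows that share a reason into one entry per reason."""
    merged = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for reason, count, avg_us, max_us in rows:
        entry = merged[reason]
        entry["count"] += count
        # Approximate total: the report only gives a rounded average.
        entry["total_us"] += count * avg_us
        entry["max_us"] = max(entry["max_us"], max_us)
    # Keep the existing ordering rule: largest total time first.
    return sorted(merged.items(), key=lambda kv: kv[1]["total_us"], reverse=True)


merged = merge_by_reason(rows)
for reason, e in merged:
    average_us = e["total_us"] // e["count"]
    print(f"{reason:<40}{e['count']:>6}{average_us:>13}{e['max_us']:>13}")
```

Merging this way loses the distinction between backtraces, so the
per-backtrace view would probably still be wanted as a debugging option.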

The question is what kind of data analysis would help people figure
out "What is the holdup on the machine?"

Maybe divide things into interruptible and noninterruptible reasons.

Have a sub-table showing which user processes have the greatest amount of latency.
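
A per-process sub-table could be sketched roughly as follows. This is
only an illustration in Python, not something latencytap does today. The
sample records, their (pid, execname, sleep_us) layout, and the function
name top_processes are all hypothetical, and the numbers are made up.

```python
from collections import defaultdict

# Hypothetical samples: (pid, execname, sleep_us). These values are made
# up for illustration and do not come from a real latencytap run.
samples = [
    (1201, "firefox", 2000),
    (1201, "firefox", 500),
    (877, "Xorg", 1500),
    (42, "pdflush", 300),
]


def top_processes(samples, limit=10):
    """Return up to `limit` processes ranked by total time spent sleeping."""
    totals = defaultdict(int)
    for pid, execname, sleep_us in samples:
        totals[(pid, execname)] += sleep_us
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:limit]


for (pid, execname), total_us in top_processes(samples):
    print(f"{pid:>6} {execname:<16}{total_us:>12}")
```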

Any other suggestions would be appreciated.

