Hi, We have observed that sometimes the stap (and associated stapio) process could not be terminated using SIGTERM on one of our production systems: strace shows the following: + timeout 5s strace -p 8596 -s 1024 Process 8596 attached write(2, "Too many interrupts received, exiting.\n", 39Process 8596 detached <detached ...> + true + timeout 5s strace -p 9110 -s 1024 Process 9110 attached write(2, "WARNING:", 8Process 9110 detached <detached ...> + true The full backtrace of the stap and stapio process: + gstack 8596 #0 0x00007f8cc36bc420 in __write_nocancel () at ../sysdeps/unix/syscall-template.S:81 #1 0x0000000000412308 in handle_interrupt () at main.cxx:280 #2 <signal handler called> #3 0x00007f8cc36bceca in __libc_waitpid (pid=9110, stat_loc=0x7ffe7df20abc, options=0) at ../sysdeps/unix/sysv/linux/waitpid.c:31 #4 0x0000000000597d8a in stap_waitpid (verbose=0, pid=9110) at util.cxx:749 #5 0x0000000000606cd9 in direct::finish (this=0x23fc830) at remote.cxx:108 #6 0x0000000000603222 in remote::run (remotes=std::vector of length 1, capacity 1 = {...}) at remote.cxx:1292 #7 0x00000000004126cc in pass_5 (s=..., targets=std::vector of length 1, capacity 1 = {...}) at main.cxx:1209 #8 0x000000000040fa54 in main (argc=<optimized out>, argv=<optimized out>) at main.cxx:1429 + gstack 9110 Thread 3 (Thread 0x7f2ad693d700 (LWP 9112)): #0 0x00007f2ad6d0e101 in do_sigwait (sig=0x7f2ad693cef4, set=<optimized out>) at ../sysdeps/unix/sysv/linux/sigwait.c:61 #1 __sigwait (set=set@entry=0x628170, sig=sig@entry=0x7f2ad693cef4) at ../sysdeps/unix/sysv/linux/sigwait.c:99 #2 0x0000000000402db5 in signal_thread (arg=0x628170) at mainloop.c:40 #3 0x00007f2ad6d06dc5 in start_thread (arg=0x7f2ad693d700) at pthread_create.c:308 #4 0x00007f2ad6a3573d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 Thread 2 (Thread 0x7f2ad613c700 (LWP 9113)): #0 0x00007f2ad6d0d43d in write () at ../sysdeps/unix/syscall-template.S:81 #1 0x0000000000407618 in reader_thread (data=0x0) at relay.c:235 #2 0x00007f2ad6d06dc5 in start_thread (arg=0x7f2ad613c700) at pthread_create.c:308 #3 0x00007f2ad6a3573d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113 Thread 1 (Thread 0x7f2ad712b740 (LWP 9110)): #0 0x00007f2ad6a26c7d in write () at ../sysdeps/unix/syscall-template.S:81 #1 0x00007f2ad69b3f73 in _IO_new_file_write (f=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, data=0x7ffe0aaa9210, n=8) at fileops.c:1301 #2 0x00007f2ad69b470f in new_do_write (to_do=8, data=0x7ffe0aaa9210 "WARNING:acing...\n", fp=0x7f2ad6cf91c0 <_IO_2_1_stderr_>) at fileops.c:537 #3 _IO_new_file_xsputn (f=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, data=<optimized out>, n=8) at fileops.c:1383 #4 0x00007f2ad698a47d in buffered_vfprintf (s=s@entry=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, format=format@entry=0x40a648 "WARNING:", args=args@entry=0x7ffe0aaab850) at vfprintf.c:2340 #5 0x00007f2ad698531e in _IO_vfprintf_internal (s=s@entry=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, format=0x40a648 "WARNING:", ap=0x7ffe0aaab850) at vfprintf.c:1289 #6 0x00007f2ad6a4aefd in ___vfprintf_chk (fp=0x7f2ad6cf91c0 <_IO_2_1_stderr_>, flag=flag@entry=1, format=<optimized out>, ap=ap@entry=0x7ffe0aaab850) at vfprintf_chk.c:34 #7 0x0000000000405172 in vfprintf (__ap=0x7ffe0aaab850, __fmt=<optimized out>, __stream=<optimized out>) at /usr/include/bits/stdio2.h:127 #8 eprintf (fmt=<optimized out>) at common.c:693 #9 0x0000000000404c55 in stp_main_loop () at mainloop.c:810 #10 0x000000000040279d in main (argc=<optimized out>, argv=0x7ffe0aaadc68) at stapio.c:73 At this point, sending SIGTERM to neither of the process works, and the Kernel module remains loaded. Best, Datong
Any idea how to reproduce this? The two last thread backtraces seem to be sitting inside a write(2), which suggests a pipe-full or such condition on the stapio stdout files.
Already committed a fix to master. Feel free to close this PR.
commit ab368ac2a