Bug 15144 - occasional (40%) stapdyn spin/hang during sdt.exp
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: dyninst
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Unassigned
Reported: 2013-02-13 15:23 UTC by Frank Ch. Eigler
Modified: 2017-05-16 16:00 UTC


Description Frank Ch. Eigler 2013-02-13 15:23:55 UTC
With git systemtap, rawhide 3.8.0-rc7 kernel, dyninst 8.0,

# make installcheck RUNTESTFLAGS=sdt.exp

About half the time, we get a stapdyn hang/loop in the test:

executing: stap --runtime=dyninst -w /home/fche/Private/DEVEL/DEVEL-systemtap/git/systemtap3/testsuite/systemtap.base/sdt.stp sdt.c.exe.2 -c ./sdt.c.exe.2

... top shows one of the stapdyn threads spinning with 100% cpu

(gdb) bt
#0  0x00000031e18da4c7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
#1  0x00002aaaabc10ff5 in GeneratorLinux::evictFromWaitpid() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#2  0x00002aaaabc11079 in GeneratorLinux::~GeneratorLinux() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#3  0x00002aaaabc110a9 in GeneratorLinux::~GeneratorLinux() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#4  0x00000031e183944a in __cxa_finalize (d=0x2aaaabf04da0) at cxa_finalize.c:55
#5  0x00002aaaabc0f513 in __do_global_dtors_aux () from /usr/lib64/dyninst/libpcontrol.so.8.0
#6  0x00007fffeb5c2ab0 in ?? ()
#7  0x00000031e140f9da in _dl_fini () at dl-fini.c:253
Backtrace stopped: frame did not save the PC

(gdb) info thread
  Id   Target Id         Frame 
  2    Thread 0x2aaaad232700 (LWP 2167) "stapdyn" (running)
* 1    Thread 0x2aaaacc20a00 (LWP 2165) "stapdyn" (running)

(gdb) quit
... detaching sometimes kills the errant stapdyn and lets the test resume.
Comment 1 Frank Ch. Eigler 2013-02-13 17:53:42 UTC
global_var-m64-O2-dyninst (global_var.exp) has also triggered it.
Comment 2 Frank Ch. Eigler 2013-02-14 18:37:51 UTC
Additional backtrace info from the affected process:

(gdb) bt
#0  0x00000031e18da4c7 in sched_yield () at ../sysdeps/unix/syscall-template.S:81
#1  0x00002aaaabc10ff5 in GeneratorLinux::evictFromWaitpid() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#2  0x00002aaaabc11079 in GeneratorLinux::~GeneratorLinux() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#3  0x00002aaaabc110a9 in GeneratorLinux::~GeneratorLinux() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#4  0x00000031e183944a in __cxa_finalize (d=0x2aaaabf04da0) at cxa_finalize.c:55
#5  0x00002aaaabc0f513 in __do_global_dtors_aux () from /usr/lib64/dyninst/libpcontrol.so.8.0
#6  0x00007fff2a0c2cc0 in ?? ()
#7  0x00000031e140f9da in _dl_fini () at dl-fini.c:253
Backtrace stopped: frame did not save the PC
(gdb) info thread
  Id   Target Id         Frame 
  2    Thread 0x2aaaad24c700 (LWP 4634) "stapdyn" pthread_cond_wait@@GLIBC_2.3.2 ()
    at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
* 1    Thread 0x2aaaacc20a00 (LWP 4632) "stapdyn" 0x00000031e18da4c7 in sched_yield ()
    at ../sysdeps/unix/syscall-template.S:81
(gdb) thread 2
[Switching to thread 2 (Thread 0x2aaaad24c700 (LWP 4634))]
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
185	62:	movl	(%rsp), %edi
(gdb) bt
#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185
#1  0x00002aaaabc248cd in CondVar::wait() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#2  0x00002aaaabc0eb48 in LinuxPtrace::main() () from /usr/lib64/dyninst/libpcontrol.so.8.0
#3  0x00002aaaabc2453c in thread_init(void*) () from /usr/lib64/dyninst/libpcontrol.so.8.0
#4  0x00000031e2407c63 in start_thread (arg=0x2aaaad24c700) at pthread_create.c:308
#5  0x00000031e18f524d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Comment 3 Josh Stone 2015-05-22 22:55:32 UTC
I don't think we've seen this exact hang in a while -- perhaps fixed by this?

http://git.dyninst.org/?p=dyninst.git;a=commit;h=8e556b1d0ebd9ab1c5c47cdcfabafeb683ba90d8
"Make sure SIGUSR2 is cleared from masked signals before using it in PC."

(SIGUSR2 is how evictFromWaitpid() works.)
Comment 4 Stan Cox 2017-05-16 16:00:37 UTC
Seems consistently okay now with:
dyninst-9.3.1-1.fc25.x86_64