Created attachment 11212 [details] stap-report output on the same box for the same installation I tried running the full official test suite of the current master (HEAD is commit 1d530d0409) and the test suite never completes due to the test file `systemtap.base/target_set_thread.exp` hanging forever. The `ps` command shows the following processes always hanging (already hanging there for more than 30min as of this writing): ``` root 72477 0.0 0.0 156252 9556 pts/2 Sl+ 19:27 0:00 expect -- /usr/share/dejagnu/runtest.exp --tool systemtap --tool_opts --srcdir /mnt/home/agentzh/git/systemtap.old/testsuite --outdir=artifacts/systemtap.base/target_set_thread systemtap.base/target_set_thread.exp --tool_opts install root 72426 0.0 0.0 120112 3168 pts/2 S+ 19:27 0:00 /bin/sh -c srcdir='.'; export srcdir; \ EXPECT=expect; export EXPECT; \ if /bin/sh -c "env XDG_DATA_DIRS= SYSTEMTAP_SYNC=1 LANG=C SYSTEMTAP_TESTREMOTES= SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/opt/stap/share/systemtap/runtime SYSTEMTAP_TAPSET=/opt/stap/share/systemtap/tapset LD_LIBRARY_PATH=/opt/stap/lib/systemtap:/opt/stap/lib/dyninst CRASH_LIBDIR=/opt/stap/lib/systemtap PATH=/opt/stap/bin:/opt/stap/sbin:$PATH SYSTEMTAP_PATH=/opt/stap/bin SYSTEMTAP_SPATH=/opt/stap/sbin SYSTEMTAP_INCLUDES=/opt/stap/include PKGLIBDIR=/opt/stap/libexec/systemtap PYTHON=/bin/python2 PYEXECDIR=/opt/stap/lib64/python2.7/site-packages PYTHON3=/bin/python3 PY3EXECDIR=/opt/stap/lib64/python3.6/site-packages SYSCONFDIR=/opt/stap/etc ./execrc runtest --version" > /dev/null 2>&1; then \ exit_status=0; l='systemtap'; for tool in $l; do \ if env XDG_DATA_DIRS= SYSTEMTAP_SYNC=1 LANG=C SYSTEMTAP_TESTREMOTES= SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/opt/stap/share/systemtap/runtime SYSTEMTAP_TAPSET=/opt/stap/share/systemtap/tapset LD_LIBRARY_PATH=/opt/stap/lib/systemtap:/opt/stap/lib/dyninst CRASH_LIBDIR=/opt/stap/lib/systemtap PATH=/opt/stap/bin:/opt/stap/sbin:$PATH SYSTEMTAP_PATH=/opt/stap/bin SYSTEMTAP_SPATH=/opt stap/sbin SYSTEMTAP_INCLUDES=/opt/stap/include PKGLIBDIR=/opt/stap/libexec/systemtap PYTHON=/bin/python2 PYEXECDIR=/opt/stap/lib64/python2.7/site-packages PYTHON3=/bin/python3 PY3EXECDIR=/opt/stap/lib64/python3.6/site-packages SYSCONFDIR=/opt/stap/etc ./execrc runtest --tool $tool --tool_opts \'\' --srcdir /mnt/home/agentzh/git/systemtap.old/testsuite --outdir=artifacts/systemtap.base/target_set_thread systemtap.base/target_set_thread.exp --tool_opts \'install \'; \ then :; else exit_status=1; fi; \ done; \ else echo "WARNING: could not find 'env XDG_DATA_DIRS= SYSTEMTAP_SYNC=1 LANG=C SYSTEMTAP_TESTREMOTES= SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/opt/stap/share/systemtap/runtime SYSTEMTAP_TAPSET=/opt/stap/share/systemtap/tapset LD_LIBRARY_PATH=/opt/stap/lib/systemtap:/opt/stap/lib/dyninst CRASH_LIBDIR=/opt/stap/lib/systemtap PATH=/opt/stap/bin:/opt/stap/sbin:$PATH SYSTEMTAP_PATH=/opt/stap/bin SYSTEMTAP_SPATH=/opt/stap/sbin SYSTEMTAP_INCLUDES=/opt/stap/include PKGLIBDIR=/opt/stap/libexec/systemtap PYTHON=/bin/python2 PYEXECDIR=/opt/stap/lib64/python2.7/site-packages PYTHON3=/bin/python3 PY3EXECDIR=/opt/stap/lib64/python3.6/site-packages SYSCONFDIR=/opt/stap/etc ./execrc runtest'" 1>&2; :;\ fi; \ exit $exit_status root 75570 0.0 0.0 163708 11124 pts/2 S+ 19:27 0:00 stap /mnt/home/agentzh/git/systemtap.old/testsuite/systemtap.base/target_set_thread.stp --runtime=dyninst -c stap --benchmark-sdt-loops=1 --benchmark-sdt-threads=1 root 75619 0.9 2.9 887204 355364 pts/2 Sl+ 19:27 0:14 /opt/stap/bin/stapdyn -c stap --benchmark-sdt-loops=1 --benchmark-sdt-threads=1 /tmp/stapUFtNuG/stap_73897a360cc4a5d2afa80ca6713de250_3356.so ``` All of these processes are stuck on the `S+` or `Sl+` states, i.e., all sleeping. The backtrace of the /opt/stap/bin/stapdyn process (pid 75619) obtained by gdb is as follows: ``` (gdb) bt (gdb) bt #0 0x00007ffb3732182d in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffb2c000c68) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffb2c000c18, cond=0x7ffb2c000c40) at pthread_cond_wait.c:502 #2 __pthread_cond_wait (cond=cond@entry=0x7ffb2c000c40, mutex=mutex@entry=0x7ffb2c000c18) at pthread_cond_wait.c:655 #3 0x00007ffb360f18d0 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x7ffb2c000c18) at /usr/include/boost/thread/pthread/condition_variable.hpp:178 #4 CondVar<Mutex<false> >::wait (this=0x7ffb2c000c18) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/common/src/dthread.h:121 #5 MailboxMT::dequeue (this=0x7ffb2c000b20, block=true) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/mailbox.C:156 #6 0x00007ffb36118bbe in int_process::waitAndHandleEvents (block=block@entry=false) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:1060 #7 0x00007ffb3614599d in Dyninst::ProcControlAPI::ProcessSet::stopProcs (this=0x7ffb2c001630) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/procset.C:1499 #8 0x00007ffb360fa2ae in Dyninst::ProcControlAPI::Process::stopProc (this=<optimized out>) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:6488 #9 0x00007ffb378d5f68 in BPatch_process::finalizeInsertionSet (this=0x20517c0) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/dyninstAPI/src/BPatch_process.C:814 #10 0x000000000040e76a in mutatee::remove_instrumentation (this=this@entry=0x289cbd0) at mutatee.cxx:680 #11 0x000000000040e7cd in mutatee::~mutatee (this=0x289cbd0, __in_chrg=<optimized out>) at mutatee.cxx:141 #12 0x000000000040d919 in __gnu_cxx::new_allocator<mutatee>::destroy<mutatee> (__p=<optimized out>, this=<optimized out>) at /usr/include/c++/7/ext/new_allocator.h:140 #13 std::allocator_traits<std::allocator<mutatee> >::destroy<mutatee> (__p=<optimized out>, __a=...) at /usr/include/c++/7/bits/alloc_traits.h:487 #14 std::_Sp_counted_ptr_inplace<mutatee, std::allocator<mutatee>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:535 #15 0x000000000040d676 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x289cbc0) at /usr/include/c++/7/bits/shared_ptr_base.h:154 #16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684 #17 std::__shared_ptr<mutatee, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123 #18 std::shared_ptr<mutatee>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:93 #19 std::_Destroy<std::shared_ptr<mutatee> > (__pointer=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:98 #20 std::_Destroy_aux<false>::__destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:108 #21 std::_Destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:137 #22 std::_Destroy<std::shared_ptr<mutatee>*, std::shared_ptr<mutatee> > (__last=0x7a60140, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:206 #23 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::_M_erase_at_end (__pos=0x7a60130, this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1513 #24 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::clear (this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1248 #25 mutator::run (this=0x2020a40) at mutator.cxx:642 #26 0x0000000000408328 in main (argc=<optimized out>, argv=<optimized out>) at stapdyn.cxx:161 ``` The full backtrace from the gdb command `bt full` is as follows: ``` (gdb) bt full #0 0x00007ffb3732182d in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffb2c000c68) at ../sysdeps/unix/sysv/linux/futex-internal.h:88 __ret = -512 oldtype = 0 err = <optimized out> #1 __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffb2c000c18, cond=0x7ffb2c000c40) at pthread_cond_wait.c:502 spin = 0 buffer = {__routine = 0x7ffb373215a0 <__condvar_cleanup_waiting>, __arg = 0x7fff2f207870, __canceltype = 910300128, __prev = 0x0} cbuffer = {wseq = 28, cond = 0x7ffb2c000c40, mutex = 0x7ffb2c000c18, private = 0} err = <optimized out> g = 0 flags = <optimized out> signals = <optimized out> result = 0 seq = 14 #2 __pthread_cond_wait (cond=cond@entry=0x7ffb2c000c40, mutex=mutex@entry=0x7ffb2c000c18) at pthread_cond_wait.c:655 No locals. #3 0x00007ffb360f18d0 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x7ffb2c000c18) at /usr/include/boost/thread/pthread/condition_variable.hpp:178 guard = {m = 0x7ffb2c0013b0} check_for_interruption = {thread_info = 0x0, m = 0x7ffb2c000c18, set = false} res = 0 #4 CondVar<Mutex<false> >::wait (this=0x7ffb2c000c18) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/common/src/dthread.h:121 No locals. #5 MailboxMT::dequeue (this=0x7ffb2c000b20, block=true) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/mailbox.C:156 q = <optimized out> ret = <optimized out> #6 0x00007ffb36118bbe in int_process::waitAndHandleEvents (block=block@entry=false) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:1060 hasAsyncPending = <optimized out> hasRunningThread = -8 hasNeonatalThreads = <optimized out> terminating = <optimized out> exitEvent = <optimized out> hasAsyncEvents = <optimized out> hasHandlerThread = <optimized out> should_block = true ev = {px = 0x7ffb1c002090, pn = {pi_ = 0xe09df70}} hpool = <optimized out> hasClearingBP = <optimized out> hasSyncRPCRunningThrd = <optimized out> hasBlock = <optimized out> hasStartupTeardownProc = <optimized out> hasStopPending = <optimized out> hasProcStopRPC = <optimized out> hasGotEvent = <optimized out> proc = {px = 0x29f71f0, pn = {pi_ = 0x27abdb0}} should_handle_ev = <optimized out> gotEvent = <optimized out> __PRETTY_FUNCTION__ = "static bool int_process::waitAndHandleEvents(bool)" error = false recurse = true #7 0x00007ffb3614599d in Dyninst::ProcControlAPI::ProcessSet::stopProcs (this=0x7ffb2c001630) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/procset.C:1499 lock_this_func = {should_unlock = true} had_error = false had_success = true error_set = std::set with 0 elements iter = {msg = 0x7ffb361b56ff "stopProc", had_error = @0x7fff2f207abf, flags = 19, container = 0x10d45070, iter = {px = 0x1, pn = {pi_ = 0x40}}, finished_clear = false, did_begin = true} result = <optimized out> #8 0x00007ffb360fa2ae in Dyninst::ProcControlAPI::Process::stopProc (this=<optimized out>) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:6488 ps = {px = 0x7ffb2c001630, pn = {pi_ = 0x7ffb1c001cc0}} #9 0x00007ffb378d5f68 in BPatch_process::finalizeInsertionSet (this=0x20517c0) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/dyninstAPI/src/BPatch_process.C:814 shouldContinue = false ret = <optimized out> this = 0x20517c0 shouldContinue = false #10 0x000000000040e76a in mutatee::remove_instrumentation (this=this@entry=0x289cbd0) at mutatee.cxx:680 No locals. #11 0x000000000040e7cd in mutatee::~mutatee (this=0x289cbd0, __in_chrg=<optimized out>) at mutatee.cxx:141 No locals. #12 0x000000000040d919 in __gnu_cxx::new_allocator<mutatee>::destroy<mutatee> (__p=<optimized out>, this=<optimized out>) at /usr/include/c++/7/ext/new_allocator.h:140 No locals. #13 std::allocator_traits<std::allocator<mutatee> >::destroy<mutatee> (__p=<optimized out>, __a=...) at /usr/include/c++/7/bits/alloc_traits.h:487 No locals. #14 std::_Sp_counted_ptr_inplace<mutatee, std::allocator<mutatee>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:535 No locals. #15 0x000000000040d676 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x289cbc0) at /usr/include/c++/7/bits/shared_ptr_base.h:154 No locals. #16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684 No locals. ---Type <return> to continue, or q <return> to quit--- #17 std::__shared_ptr<mutatee, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123 No locals. #18 std::shared_ptr<mutatee>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:93 No locals. #19 std::_Destroy<std::shared_ptr<mutatee> > (__pointer=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:98 No locals. #20 std::_Destroy_aux<false>::__destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:108 No locals. #21 std::_Destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:137 No locals. #22 std::_Destroy<std::shared_ptr<mutatee>*, std::shared_ptr<mutatee> > (__last=0x7a60140, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:206 No locals. #23 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::_M_erase_at_end (__pos=0x7a60130, this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1513 No locals. #24 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::clear (this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1248 No locals. #25 mutator::run (this=0x2020a40) at mutator.cxx:642 No locals. #26 0x0000000000408328 in main (argc=<optimized out>, argv=<optimized out>) at stapdyn.cxx:161 session = std::unique_ptr<mutator> containing 0x2020a40 pid = 0 command = 0x7fff2f209296 "stap --benchmark-sdt-loops=1 --benchmark-sdt-threads=1" module = 0x7fff2f2092cd "/tmp/stapUFtNuG/stap_73897a360cc4a5d2afa80ca6713de250_3356.so" opt = <optimized out> modoptions = std::vector of length 0, capacity 0 ``` More info: ``` $ gcc --version gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6) Copyright (C) 2017 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. $ uname -a Linux work2 4.16.16-200.fc27.x86_64 #1 SMP Sun Jun 17 03:06:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux $ free total used free shared buff/cache available Mem: 11981784 689308 6512768 64964 4779708 10166620 Swap: 2097148 0 2097148 $ cat /etc/redhat-release Fedora release 27 (Twenty Seven) ```
Oh, BTW, I used the following command to run the test suite: ``` sudo make -j5 installcheck-parallel ```
Running the offending test file `systemtap.base/target_set_thread.exp` individually does not seem to reproduce the hang. Weird.
commit 0aee37568280 may work around this probable dyninst bug