Bug 23572

Summary: Test file systemtap.base/target_set_thread.exp hangs forever on kernel 4.16.16 (Fedora 27)
Product: systemtap Reporter: agentzh <agentzh>
Component: testsuiteAssignee: Unassigned <systemtap>
Status: UNCONFIRMED ---    
Severity: normal CC: fche
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed:
Attachments: stap-report output on the same box for the same installation

Description agentzh 2018-08-26 03:10:27 UTC
Created attachment 11212 [details]
stap-report output on the same box for the same installation

I tried running the full official test suite of the current master (HEAD is commit 1d530d0409) and the test suite never completes due to the test file `systemtap.base/target_set_thread.exp` hanging forever.

The `ps` command shows the following processes always hanging (already hanging there for more than 30min as of this writing):

```
root      72477  0.0  0.0 156252  9556 pts/2    Sl+  19:27   0:00 expect -- /usr/share/dejagnu/runtest.exp --tool systemtap --tool_opts  --srcdir /mnt/home/agentzh/git/systemtap.old/testsuite --outdir=artifacts/systemtap.base/target_set_thread systemtap.base/target_set_thread.exp --tool_opts install
root      72426  0.0  0.0 120112  3168 pts/2    S+   19:27   0:00 /bin/sh -c srcdir='.'; export srcdir; \ EXPECT=expect; export EXPECT; \ if /bin/sh -c "env XDG_DATA_DIRS= SYSTEMTAP_SYNC=1 LANG=C SYSTEMTAP_TESTREMOTES= SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/opt/stap/share/systemtap/runtime SYSTEMTAP_TAPSET=/opt/stap/share/systemtap/tapset LD_LIBRARY_PATH=/opt/stap/lib/systemtap:/opt/stap/lib/dyninst CRASH_LIBDIR=/opt/stap/lib/systemtap PATH=/opt/stap/bin:/opt/stap/sbin:$PATH SYSTEMTAP_PATH=/opt/stap/bin SYSTEMTAP_SPATH=/opt/stap/sbin SYSTEMTAP_INCLUDES=/opt/stap/include PKGLIBDIR=/opt/stap/libexec/systemtap PYTHON=/bin/python2 PYEXECDIR=/opt/stap/lib64/python2.7/site-packages PYTHON3=/bin/python3 PY3EXECDIR=/opt/stap/lib64/python3.6/site-packages SYSCONFDIR=/opt/stap/etc ./execrc runtest --version" > /dev/null 2>&1; then \   exit_status=0; l='systemtap'; for tool in $l; do \     if env XDG_DATA_DIRS= SYSTEMTAP_SYNC=1 LANG=C SYSTEMTAP_TESTREMOTES= SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/opt/stap/share/systemtap/runtime SYSTEMTAP_TAPSET=/opt/stap/share/systemtap/tapset LD_LIBRARY_PATH=/opt/stap/lib/systemtap:/opt/stap/lib/dyninst CRASH_LIBDIR=/opt/stap/lib/systemtap PATH=/opt/stap/bin:/opt/stap/sbin:$PATH SYSTEMTAP_PATH=/opt/stap/bin SYSTEMTAP_SPATH=/opt stap/sbin SYSTEMTAP_INCLUDES=/opt/stap/include PKGLIBDIR=/opt/stap/libexec/systemtap PYTHON=/bin/python2 PYEXECDIR=/opt/stap/lib64/python2.7/site-packages PYTHON3=/bin/python3 PY3EXECDIR=/opt/stap/lib64/python3.6/site-packages SYSCONFDIR=/opt/stap/etc ./execrc runtest  --tool $tool --tool_opts \'\' --srcdir /mnt/home/agentzh/git/systemtap.old/testsuite --outdir=artifacts/systemtap.base/target_set_thread systemtap.base/target_set_thread.exp  --tool_opts \'install \'; \     then :; else exit_status=1; fi; \   done; \ else echo "WARNING: could not find 'env XDG_DATA_DIRS= SYSTEMTAP_SYNC=1 LANG=C SYSTEMTAP_TESTREMOTES= SYSTEMTAP_TESTAPPS= SYSTEMTAP_RUNTIME=/opt/stap/share/systemtap/runtime SYSTEMTAP_TAPSET=/opt/stap/share/systemtap/tapset LD_LIBRARY_PATH=/opt/stap/lib/systemtap:/opt/stap/lib/dyninst CRASH_LIBDIR=/opt/stap/lib/systemtap PATH=/opt/stap/bin:/opt/stap/sbin:$PATH SYSTEMTAP_PATH=/opt/stap/bin SYSTEMTAP_SPATH=/opt/stap/sbin SYSTEMTAP_INCLUDES=/opt/stap/include PKGLIBDIR=/opt/stap/libexec/systemtap PYTHON=/bin/python2 PYEXECDIR=/opt/stap/lib64/python2.7/site-packages PYTHON3=/bin/python3 PY3EXECDIR=/opt/stap/lib64/python3.6/site-packages SYSCONFDIR=/opt/stap/etc ./execrc runtest'" 1>&2; :;\ fi; \ exit $exit_status
root      75570  0.0  0.0 163708 11124 pts/2    S+   19:27   0:00 stap /mnt/home/agentzh/git/systemtap.old/testsuite/systemtap.base/target_set_thread.stp --runtime=dyninst -c stap --benchmark-sdt-loops=1 --benchmark-sdt-threads=1
root      75619  0.9  2.9 887204 355364 pts/2   Sl+  19:27   0:14 /opt/stap/bin/stapdyn -c stap --benchmark-sdt-loops=1 --benchmark-sdt-threads=1 /tmp/stapUFtNuG/stap_73897a360cc4a5d2afa80ca6713de250_3356.so
```

All of these processes are stuck on the `S+` or `Sl+` states, i.e., all sleeping.

The backtrace of the /opt/stap/bin/stapdyn process (pid 75619) obtained by gdb is as follows:

```
(gdb) bt
(gdb) bt
#0  0x00007ffb3732182d in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffb2c000c68) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffb2c000c18, cond=0x7ffb2c000c40) at pthread_cond_wait.c:502
#2  __pthread_cond_wait (cond=cond@entry=0x7ffb2c000c40, mutex=mutex@entry=0x7ffb2c000c18) at pthread_cond_wait.c:655
#3  0x00007ffb360f18d0 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x7ffb2c000c18) at /usr/include/boost/thread/pthread/condition_variable.hpp:178
#4  CondVar<Mutex<false> >::wait (this=0x7ffb2c000c18) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/common/src/dthread.h:121
#5  MailboxMT::dequeue (this=0x7ffb2c000b20, block=true) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/mailbox.C:156
#6  0x00007ffb36118bbe in int_process::waitAndHandleEvents (block=block@entry=false) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:1060
#7  0x00007ffb3614599d in Dyninst::ProcControlAPI::ProcessSet::stopProcs (this=0x7ffb2c001630) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/procset.C:1499
#8  0x00007ffb360fa2ae in Dyninst::ProcControlAPI::Process::stopProc (this=<optimized out>) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:6488
#9  0x00007ffb378d5f68 in BPatch_process::finalizeInsertionSet (this=0x20517c0) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/dyninstAPI/src/BPatch_process.C:814
#10 0x000000000040e76a in mutatee::remove_instrumentation (this=this@entry=0x289cbd0) at mutatee.cxx:680
#11 0x000000000040e7cd in mutatee::~mutatee (this=0x289cbd0, __in_chrg=<optimized out>) at mutatee.cxx:141
#12 0x000000000040d919 in __gnu_cxx::new_allocator<mutatee>::destroy<mutatee> (__p=<optimized out>, this=<optimized out>) at /usr/include/c++/7/ext/new_allocator.h:140
#13 std::allocator_traits<std::allocator<mutatee> >::destroy<mutatee> (__p=<optimized out>, __a=...) at /usr/include/c++/7/bits/alloc_traits.h:487
#14 std::_Sp_counted_ptr_inplace<mutatee, std::allocator<mutatee>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:535
#15 0x000000000040d676 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x289cbc0) at /usr/include/c++/7/bits/shared_ptr_base.h:154
#16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684
#17 std::__shared_ptr<mutatee, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123
#18 std::shared_ptr<mutatee>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:93
#19 std::_Destroy<std::shared_ptr<mutatee> > (__pointer=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:98
#20 std::_Destroy_aux<false>::__destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:108
#21 std::_Destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:137
#22 std::_Destroy<std::shared_ptr<mutatee>*, std::shared_ptr<mutatee> > (__last=0x7a60140, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:206
#23 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::_M_erase_at_end (__pos=0x7a60130, this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1513
#24 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::clear (this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1248
#25 mutator::run (this=0x2020a40) at mutator.cxx:642
#26 0x0000000000408328 in main (argc=<optimized out>, argv=<optimized out>) at stapdyn.cxx:161
```

The full backtrace from the gdb command `bt full` is as follows:

```
(gdb) bt full
#0  0x00007ffb3732182d in futex_wait_cancelable (private=<optimized out>, expected=0, futex_word=0x7ffb2c000c68) at ../sysdeps/unix/sysv/linux/futex-internal.h:88
        __ret = -512
        oldtype = 0
        err = <optimized out>
#1  __pthread_cond_wait_common (abstime=0x0, mutex=0x7ffb2c000c18, cond=0x7ffb2c000c40) at pthread_cond_wait.c:502
        spin = 0
        buffer = {__routine = 0x7ffb373215a0 <__condvar_cleanup_waiting>, __arg = 0x7fff2f207870, __canceltype = 910300128, __prev = 0x0}
        cbuffer = {wseq = 28, cond = 0x7ffb2c000c40, mutex = 0x7ffb2c000c18, private = 0}
        err = <optimized out>
        g = 0
        flags = <optimized out>
        signals = <optimized out>
        result = 0
        seq = 14
#2  __pthread_cond_wait (cond=cond@entry=0x7ffb2c000c40, mutex=mutex@entry=0x7ffb2c000c18) at pthread_cond_wait.c:655
No locals.
#3  0x00007ffb360f18d0 in boost::condition_variable_any::wait<Mutex<false> > (m=..., this=0x7ffb2c000c18) at /usr/include/boost/thread/pthread/condition_variable.hpp:178
        guard = {m = 0x7ffb2c0013b0}
        check_for_interruption = {thread_info = 0x0, m = 0x7ffb2c000c18, set = false}
        res = 0
#4  CondVar<Mutex<false> >::wait (this=0x7ffb2c000c18) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/common/src/dthread.h:121
No locals.
#5  MailboxMT::dequeue (this=0x7ffb2c000b20, block=true) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/mailbox.C:156
        q = <optimized out>
        ret = <optimized out>
#6  0x00007ffb36118bbe in int_process::waitAndHandleEvents (block=block@entry=false) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:1060
        hasAsyncPending = <optimized out>
        hasRunningThread = -8
        hasNeonatalThreads = <optimized out>
        terminating = <optimized out>
        exitEvent = <optimized out>
        hasAsyncEvents = <optimized out>
        hasHandlerThread = <optimized out>
        should_block = true
        ev = {px = 0x7ffb1c002090, pn = {pi_ = 0xe09df70}}
        hpool = <optimized out>
        hasClearingBP = <optimized out>
        hasSyncRPCRunningThrd = <optimized out>
        hasBlock = <optimized out>
        hasStartupTeardownProc = <optimized out>
        hasStopPending = <optimized out>
        hasProcStopRPC = <optimized out>
        hasGotEvent = <optimized out>
        proc = {px = 0x29f71f0, pn = {pi_ = 0x27abdb0}}
        should_handle_ev = <optimized out>
        gotEvent = <optimized out>
        __PRETTY_FUNCTION__ = "static bool int_process::waitAndHandleEvents(bool)"
        error = false
        recurse = true
#7  0x00007ffb3614599d in Dyninst::ProcControlAPI::ProcessSet::stopProcs (this=0x7ffb2c001630) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/procset.C:1499
        lock_this_func = {should_unlock = true}
        had_error = false
        had_success = true
        error_set = std::set with 0 elements
        iter = {msg = 0x7ffb361b56ff "stopProc", had_error = @0x7fff2f207abf, flags = 19, container = 0x10d45070, iter = {px = 0x1, pn = {pi_ = 0x40}}, finished_clear = false, did_begin = true}
        result = <optimized out>
#8  0x00007ffb360fa2ae in Dyninst::ProcControlAPI::Process::stopProc (this=<optimized out>) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/proccontrol/src/process.C:6488
        ps = {px = 0x7ffb2c001630, pn = {pi_ = 0x7ffb1c001cc0}}
#9  0x00007ffb378d5f68 in BPatch_process::finalizeInsertionSet (this=0x20517c0) at /usr/src/debug/dyninst-9.3.2-7.fc27.x86_64/dyninst-9.3.2/dyninstAPI/src/BPatch_process.C:814
        shouldContinue = false
        ret = <optimized out>
        this = 0x20517c0
        shouldContinue = false
#10 0x000000000040e76a in mutatee::remove_instrumentation (this=this@entry=0x289cbd0) at mutatee.cxx:680
No locals.
#11 0x000000000040e7cd in mutatee::~mutatee (this=0x289cbd0, __in_chrg=<optimized out>) at mutatee.cxx:141
No locals.
#12 0x000000000040d919 in __gnu_cxx::new_allocator<mutatee>::destroy<mutatee> (__p=<optimized out>, this=<optimized out>) at /usr/include/c++/7/ext/new_allocator.h:140
No locals.
#13 std::allocator_traits<std::allocator<mutatee> >::destroy<mutatee> (__p=<optimized out>, __a=...) at /usr/include/c++/7/bits/alloc_traits.h:487
No locals.
#14 std::_Sp_counted_ptr_inplace<mutatee, std::allocator<mutatee>, (__gnu_cxx::_Lock_policy)2>::_M_dispose (this=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:535
No locals.
#15 0x000000000040d676 in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x289cbc0) at /usr/include/c++/7/bits/shared_ptr_base.h:154
No locals.
#16 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:684
No locals.
---Type <return> to continue, or q <return> to quit---
#17 std::__shared_ptr<mutatee, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr_base.h:1123
No locals.
#18 std::shared_ptr<mutatee>::~shared_ptr (this=<optimized out>, __in_chrg=<optimized out>) at /usr/include/c++/7/bits/shared_ptr.h:93
No locals.
#19 std::_Destroy<std::shared_ptr<mutatee> > (__pointer=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:98
No locals.
#20 std::_Destroy_aux<false>::__destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:108
No locals.
#21 std::_Destroy<std::shared_ptr<mutatee>*> (__last=<optimized out>, __first=<optimized out>) at /usr/include/c++/7/bits/stl_construct.h:137
No locals.
#22 std::_Destroy<std::shared_ptr<mutatee>*, std::shared_ptr<mutatee> > (__last=0x7a60140, __first=0x7a60130) at /usr/include/c++/7/bits/stl_construct.h:206
No locals.
#23 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::_M_erase_at_end (__pos=0x7a60130, this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1513
No locals.
#24 std::vector<std::shared_ptr<mutatee>, std::allocator<std::shared_ptr<mutatee> > >::clear (this=0x2020c00) at /usr/include/c++/7/bits/stl_vector.h:1248
No locals.
#25 mutator::run (this=0x2020a40) at mutator.cxx:642
No locals.
#26 0x0000000000408328 in main (argc=<optimized out>, argv=<optimized out>) at stapdyn.cxx:161
        session = std::unique_ptr<mutator> containing 0x2020a40
        pid = 0
        command = 0x7fff2f209296 "stap --benchmark-sdt-loops=1 --benchmark-sdt-threads=1"
        module = 0x7fff2f2092cd "/tmp/stapUFtNuG/stap_73897a360cc4a5d2afa80ca6713de250_3356.so"
        opt = <optimized out>
        modoptions = std::vector of length 0, capacity 0
```

More info:

```
$ gcc --version
gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-6)
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ uname -a
Linux work2 4.16.16-200.fc27.x86_64 #1 SMP Sun Jun 17 03:06:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

$ free
              total        used        free      shared  buff/cache   available
Mem:       11981784      689308     6512768       64964     4779708    10166620
Swap:       2097148           0     2097148

$ cat /etc/redhat-release
Fedora release 27 (Twenty Seven)
```
Comment 1 agentzh 2018-08-26 03:18:47 UTC
Oh, BTW, I used the following command to run the test suite:

```
sudo make -j5 installcheck-parallel
```
Comment 2 agentzh 2018-08-26 03:41:27 UTC
Running the offending test file `systemtap.base/target_set_thread.exp` individually does not seem to reproduce the hang. Weird.
Comment 3 Frank Ch. Eigler 2018-08-27 17:02:48 UTC
commit 0aee37568280 may work around this probable dyninst bug