Bug 23701 - dyninst makes the target process enter an infinite loop
Summary: dyninst makes the target process enter an infinite loop
Status: UNCONFIRMED
Alias: None
Product: systemtap
Classification: Unclassified
Component: dyninst (show other bugs)
Version: unspecified
: P2 normal
Target Milestone: ---
Assignee: Unassigned
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2018-09-23 03:18 UTC by agentzh
Modified: 2019-04-05 17:04 UTC (History)
1 user (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:


Attachments

Note You need to log in before you can comment on or make changes to this bug.
Description agentzh 2018-09-23 03:18:09 UTC
I'm observing a complicated stap tool when running in the dyninst mode, enters an infinite loop when generating about 8.1MB of output (the expected total output is about 100MB).

The stapdyn process is in sleep mode:

```
agentzh    9848  0.7  2.7 651372 323632 pts/3   Sl+  19:49   0:09 /opt/stap-plus/bin/stapdyn -w -c perl /home/agentzh/git/fanlang/bin/fan bin/ylang.fan /tmp/stapuOnzkM/stap_24795b0deab05636e2d4368e574b6dce_106788.so
```

The perl process being probed always tops at 100% CPU:

```
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  9853 agentzh   20   0  240372  45260   6580 R 100.0  0.4  21:40.88 perl
```

The same tool works alright on both gdb python and the systemtap kernel mode (it's just painfully slow for gdb python).

I used another small stap script to sample the user backtraces of that perl process. It spends about 57% of the total CPU time on this backtrace:

```
 0x7f5adb9ebb99 : syscall+0x19/0x40 [/usr/lib64/libc-2.26.so]
 0x7f5adca736a0 : t_kill+0x40/0x70 [/usr/lib64/dyninst/libdyninstAPI_RT.so.9.3.2]
 0x7f5adca73993 : DYNINSTbreakPoint+0x33/0x50 [/usr/lib64/dyninst/libdyninstAPI_RT.so.9.3.2]
 0x7f5adca72275 : DYNINST_instForkEntry+0x35/0x60 [/usr/lib64/dyninst/libdyninstAPI_RT.so.9.3.2]
 0x7f5adcd00db2  :  <unknown>
    1129
```

(BTW, the number 1129 on the last line is the backtrace sample count.)

And it spends about 42% of the total CPU time on this backtrace:

```
 0x7f5adb9ebb99 : syscall+0x19/0x40 [/usr/lib64/libc-2.26.so]
 0x7f5adca73927 : dyn_lwp_self+0x17/0x50 [/usr/lib64/dyninst/libdyninstAPI_RT.so.9.3.2]
 0x7f5adca73987 : DYNINSTbreakPoint+0x27/0x50 [/usr/lib64/dyninst/libdyninstAPI_RT.so.9.3.2]
 0x7f5adca72275 : DYNINST_instForkEntry+0x35/0x60 [/usr/lib64/dyninst/libdyninstAPI_RT.so.9.3.2]
 0x7f5adcd00db2  :  <unknown>
    828
```

It is more intuitive to see the corresponding CPU flame graph below (though it has less info than the raw backtraces):

https://openresty.org/misc/flamegraph/orinc/work2-20180922-oncpu-9853.svg

I'm using the latest version of the pre-built dyninst package on Fedora 27:

```
Installed Packages
Name         : dyninst
Version      : 9.3.2
Release      : 7.fc27
Arch         : x86_64
Size         : 12 M
Source       : dyninst-9.3.2-7.fc27.src.rpm
Repo         : @System
From repo    : updates
Summary      : An API for Run-time Code Generation
URL          : http://www.dyninst.org
License      : LGPLv2+
```

The system is Fedora 27 x86_64:

```
$ uname -a
Linux work2 4.16.16-200.fc27.x86_64 #1 SMP Sun Jun 17 03:06:00 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
```

I'm using the latest master branch of systemtap.

Any suggestions on tracing this bug further and/or workarounds will be highly appreciated. Thanks!
Comment 1 Stan Cox 2019-04-05 17:04:39 UTC
stapdyn uses dyninst to dynamically insert calls to the stap shared object and then invokes it.  You could try keeping the stap so around so stapdyn can be invoked separately
stap --runtime=dyninst -c ./a.out a.stp -vv --dyninst -k

and then invoke with dyninst and stapdyn debugging:

DYNINST_DEBUG_INST=1 stap -vvvv -e 'probe process("/usr/bin/eu-nm").statement("main@*:*") {println(pp());}'  -c 'eu-nm' -vv --dyninst -k

also DYNINST_DEBUG_PROCCONTROL