Bug 23513

Summary: Dyninst runtime kills the target process when attaching to the target process for a second time
Product: systemtap Reporter: agentzh <agentzh>
Component: dyninstAssignee: Unassigned <systemtap>
Status: ASSIGNED ---    
Severity: normal CC: avi, fche, scox
Priority: P2    
Version: unspecified   
Target Milestone: ---   
Host: Target:
Build: Last reconfirmed: 2019-04-10 00:00:00
Attachments: stap-report output on the same box

Description agentzh 2018-08-11 00:34:43 UTC
Created attachment 11178 [details]
stap-report output on the same box

Consider the following simple forever-running C program:

```C
#include <unistd.h>

int main(void) {
    while (1) {
        usleep (1000);
    }
    return 0;
}
```

Compile it and generate `./a.out`:

```
gcc -g -Wall a.c
```

Then start it in one terminal (let's call it terminal A):

```
$ ./a.out
```

It hangs as expected.

Then try running the stap dyninst command like below in another terminal (let's say, terminal B):

```
$ /opt/stap/bin/stap -e 'probe process("/lib64/libc.so.6").function("usleep") { println("hi") exit() }' -x `pgrep -f './a.out'` --dyninst
hi
```

It works as expected. And in terminal A, the `./a.out` program is still running. So far so good.

But now as soon as we run the `stap` command above again, we got the following error on terminal B:

```
$ /opt/stap/bin/stap -e 'probe process("/lib64/libc.so.6").function("usleep") { println("hi") exit() }' -x `pgrep -f './a.out'` --dyninst
--FATAL-- #68: Dyninst was unable to attach to the specified process
--FATAL-- #68: BPatch.C[1047]: no process 40758 defined in procsByPid

stapdyn: ERROR: Couldn't attach to the target process
WARNING: /opt/stap/bin/stapdyn exited with status: 1
Pass 5: run failed.  [man error::pass5]
```

And in terminal A, the `./a.out` process is already killed too:

```
agentzh@work2 ~ $ ./a.out
Killed
```

This can also be reproduced by a `probe begin` one-liner:

```
agentzh@work2 ~ $ /opt/stap/bin/stap -e 'probe begin { println("hi") exit() }' -x `pgrep -f './a.out'` --dyninst
hi

agentzh@work2 ~ $ /opt/stap/bin/stap -e 'probe begin { println("hi") exit() }' -x `pgrep -f './a.out'` --dyninst
--FATAL-- #68: Dyninst was unable to attach to the specified process
--FATAL-- #68: BPatch.C[1047]: no process 40816 defined in procsByPid

stapdyn: ERROR: Couldn't attach to the target process
WARNING: /opt/stap/bin/stapdyn exited with status: 1
Pass 5: run failed.  [man error::pass5]
```

I'm using the 3.3 release tag of systemtap on Fedora 27 x86_64:

```
$ /opt/stap/bin/stap --version
Systemtap translator/driver (version 3.3/0.173, commit release-3.3-0-g48867d1cface)
Copyright (C) 2005-2018 Red Hat, Inc. and others
This is free software; see the source for copying conditions.
tested kernel versions: 2.6.18 ... 4.18-rc0
enabled features: BOOST_STRING_REF DYNINST BPF NLS NSS
```

Also attached the stap-report output for the same box.

Am I doing anything wrong here?
Comment 1 Frank Ch. Eigler 2018-08-11 00:38:57 UTC
The second dyninst-based probe's failure is expected: there can be only one concurrent "mutator" for a given target process.  It is not right that the process should be killed though, that sounds like a bug.
Comment 2 agentzh 2018-08-11 00:45:21 UTC
But the first `stap` command run already exits before the second one starts (please note the `exit()` function call in the probe handler). So there is still only *one* concurrent mutator here.
Comment 3 Stan Cox 2019-04-05 19:07:19 UTC
I notice if the first dyninst run is via stap:
% stap -e 'probe process("/lib64/libc.so.6").function("usleep") { println("hi") exit() }' -x `pgrep -f tst.x` --dyninst -k -vv

and if subsequent runs were directly via stapdyn then it worked okay:
% stapdyn -v -x 8955 /tmp/staprclqlD/stap_8970.so
stapdyn: found target "/usr/lib64/libc-2.28.so" in pid 8955, inserting 1 probes
hi

% stapdyn -v -x 8955 /tmp/staprclqlD/stap_8970.so
stapdyn: found target "/usr/lib64/libc-2.28.so" in pid 8955, inserting 1 probes
hi
Comment 4 Stan Cox 2019-04-06 01:01:44 UTC
Running with DYNINST_DEBUG_STARTUP=1 
it seems the problem is dyninst fails to parse the stap so (not sure why) and intentionally terminates the process.

[140306531820032]image.C[1461]:  opening file /tmp/stapSmTcdi/stap_2ecef162cf9d3c769f9c52398beab57b_1423.so
[140306531820032]image.C[1226]:  created image
[140306531820032]image.C[1229]: processing shared object
[140306531820032]image.C[1247]: error in processing, deleting image and returning
[140306531820032]mapped_object.C[157]:  failed to parseImage
[140306531820032]dynProcess.C[618]: failed to create mapped object for library /tmp/stapSmTcdi/stap_2ecef162cf9d3c769f9c52398beab57b_1423.so
[140306531820032]dynProcess.C[361]: bootstrap failed while creating mapped objects
Failed to bootstrap process 9255: terminating...
Comment 5 Stan Cox 2019-04-10 12:42:17 UTC
The problem seems intermittant but I do see it and the culprit is that the stap shared object is not always detached when stapdyn completes.  (gdb session after failing stapdyn)

(gdb) info sharedlibrary
From                To                  Syms Read   Shared Object Library
0x00007f3bd7f64670  0x00007f3bd80afc3f  Yes (*)     /lib64/libc.so.6
0x00007f3bd8139110  0x00007f3bd81582b4  Yes (*)     /lib64/ld-linux-x86-64.so.2
0x00007f3bd6eb2320  0x00007f3bd6eb4845  Yes (*)     /usr/lib64/dyninst/libdyninstAPI_RT.so
0x00007f3bd6eab270  0x00007f3bd6eac039  Yes (*)     /lib64/libdl.so.2
                                        No          /tmp/stapUaA2KQ/stap_4e52f2023ce6edaabf143f6d5214b684_1361.so
0x00007f3bd8124710  0x00007f3bd8127a80  Yes (*)     /lib64/librt.so.1
0x00007f3bd6e60b50  0x00007f3bd6e6f025  Yes (*)     /lib64/libpthread.so.0
(*): Shared library is missing debugging information.

So whenever stapdyn subsequently runs, dyninst tries to load that nonexistant library: /tmp/stapUaA2KQ/stap_4e52f2023ce6edaabf143f6d5214b684_1361.so