Bug 24884 - stapdyn crashes with a segmentation fault
Status: WAITING
Alias: None
Product: systemtap
Classification: Unclassified
Component: dyninst
Version: unspecified
Importance: P2 normal
Target Milestone: ---
Assignee: Stan Cox
 
Reported: 2019-08-05 16:42 UTC by Avi Kivity
Modified: 2020-06-19 15:02 UTC
CC List: 1 user

Last reconfirmed: 2019-12-10 00:00:00


Attachments
dyninst static marker test (919 bytes, application/x-xz-compressed-tar)
2019-12-10 21:29 UTC, Stan Cox

Description Avi Kivity 2019-08-05 16:42:59 UTC
Trying the following script:


#!/usr/bin/stap

# usage: task-histogram.stap process_name

global hist

probe process.mark("reactor_run_tasks_single_start") {
    ++hist[tid(), $arg1]
}

probe end {
    foreach ([tid, addr] in hist) {
        printf("%10d %8d 0x%x\n", hist[tid, addr], tid, addr)
    }
}


(trying to collect a histogram of tasks)

With this command line:

    stap --dyninst -x $(pgrep -x httpd) ./debug/task-histogram.stap

Crashes with

WARNING: /usr/bin/stapdyn exited with signal: 11 (Segmentation fault)


systemtap-4.1-1.fc30.x86_64
Comment 1 Frank Ch. Eigler 2019-08-23 22:20:59 UTC
Hi, Avi, sorry for not noticing this earlier.  Some questions to assist in the local reproduction of this problem:

- arch: x86-64?

- same script works in lkm (non-dyninst) mode?

- tried   stap -p4 --dyninst FOO.stp  ;   gdb --args stapdyn FOO.so
  so as to get a gdb backtrace at the crash site?

- what level of traffic is the httpd process absorbing during this time?
  (thus: how much thread / child-process changes?)

- tried targeting a program other than this httpd?
Comment 2 Avi Kivity 2019-11-12 13:08:23 UTC
Sorry for noticing _your_ comment so late. I retested with systemtap-4.1-2.fc30.x86_64, and it appears to work.

(x86_64, don't remember if I tried lkm, httpd has no forks/pthread_creates at all)
Comment 3 Avi Kivity 2019-11-12 13:11:24 UTC
The performance impact is horrendous, however: 5x slower (251k req/sec without the script, 47k with it). Does dyninst rewrite the entire program, or just the entry points to the probes?
Comment 4 Avi Kivity 2019-11-12 13:21:11 UTC
And now I get segmentation faults again.

#0  int_process::removeAllBreakpoints (this=0x55bf9c39eab8) at /usr/include/c++/9/bits/stl_tree.h:208
#1  0x00007f0c1737589f in linux_process::preTerminate (this=0x55bf9c39e820) at /usr/src/debug/dyninst-10.0.0-7.fc30.x86_64/dyninst-10.0.0/proccontrol/src/linux.C:1740
#2  0x00007f0c1733a4d1 in Dyninst::ProcControlAPI::ProcessSet::terminate (this=0x55bfb12f09e0) at /usr/src/debug/dyninst-10.0.0-7.fc30.x86_64/dyninst-10.0.0/proccontrol/src/procset.C:1644
#3  0x00007f0c172d95db in Dyninst::ProcControlAPI::Process::terminate (this=<optimized out>) at /usr/include/boost/smart_ptr/shared_ptr.hpp:732
#4  0x00007f0c17da7ab5 in PCProcess::terminateProcess (this=0x55bfa0f37cb0) at /usr/include/boost/smart_ptr/shared_ptr.hpp:732
#5  PCProcess::terminateProcess (this=0x55bfa0f37cb0) at /usr/src/debug/dyninst-10.0.0-7.fc30.x86_64/dyninst-10.0.0/dyninstAPI/src/dynProcess.C:1027
#6  0x00007f0c17db377c in PCProcess::attachProcess (progpath=..., pid=16581, analysisMode=BPatch_normalMode) at /usr/src/debug/dyninst-10.0.0-7.fc30.x86_64/dyninst-10.0.0/dyninstAPI/src/dynProcess.C:162
#7  0x00007f0c17cf269a in BPatch_process::BPatch_process(char const*, int, BPatch_hybridMode) () at /usr/src/debug/dyninst-10.0.0-7.fc30.x86_64/dyninst-10.0.0/dyninstAPI/src/BPatch_process.C:328
#8  0x00007f0c17ccf027 in BPatch::processAttach (this=<optimized out>, path=0x0, pid=16581, mode=BPatch_normalMode) at /usr/src/debug/dyninst-10.0.0-7.fc30.x86_64/dyninst-10.0.0/dyninstAPI/src/BPatch.C:1260
#9  0x000055bf9abb4519 in ?? ()
#10 0x000055bf9c362440 in ?? ()
#11 0x00007ffd51a63ec0 in ?? ()
#12 0x00000000000040c5 in probe_13397 ()
#13 0x0000000000000000 in ?? ()
Comment 5 Avi Kivity 2019-11-12 13:25:05 UTC
I think the trigger for the crash is re-attaching to a process after detaching from it.
Comment 6 Avi Kivity 2019-11-12 13:38:10 UTC
And the cause for the slowness is lock contention. With only two threads. Please please please add thread-local storage to the language.
Comment 7 Frank Ch. Eigler 2019-11-23 01:26:41 UTC
(In reply to Avi Kivity from comment #6)
> And the cause for the slowness is lock contention. With only two threads.
> Please please please add thread-local storage to the language.

It's more of a runtime issue than a language issue, but yeah.  Surely there are some optimization opportunities in what we emit for:

    probe process.mark("reactor_run_tasks_single_start") {
        ++hist[tid(), $arg1]
    }
Comment 8 Avi Kivity 2019-11-24 08:57:10 UTC
I imagine that if you notice that a key component is always tid() (except in an end probe), then you can rewrite the global map as a thread-local map, with extra magic for the end probe.

But it seems fragile: as soon as you violate one of the constraints even a tiny bit, it stops working, with no feedback to the user about what went wrong. And when it stops working, it's likely to have a huge impact on the running workload.
Comment 9 Avi Kivity 2019-12-04 10:31:56 UTC
Is there more information I can supply to help fix the segmentation fault?
Comment 10 Frank Ch. Eigler 2019-12-09 22:04:30 UTC
Stan might be able to help with the dyninst segv up in comment #4.
OTOH, there is a dyninst 10.1 build in stable updates, which would be worth retesting against.
Comment 11 Stan Cox 2019-12-10 21:29:26 UTC
Created attachment 12120 [details]
dyninst static marker test
Comment 12 Stan Cox 2019-12-10 21:31:36 UTC
I'll try with httpd; meanwhile a synthetic looping test using static markers seems to work fine:
 stap --dyninst -x $(pgrep -x tstgetline.x) ./tstgetline.stp
         3    29151 0x20a1260
with:
 dyninst-10.1.0-4.fc30.x86_64
 systemtap-4.2-1.fc30.x86_64
Comment 13 Stan Cox 2019-12-11 15:57:42 UTC
| I think the trigger for the crash is re-attaching to a process after detaching from it.

That sounds similar to bug 23513.
Comment 14 Avi Kivity 2019-12-11 16:27:30 UTC
The httpd in question is not Apache httpd. I can provide a binary (and source of course) if needed. Meanwhile I'm following the detach bug.
Comment 15 Avi Kivity 2019-12-11 16:29:03 UTC
And please^19, do make it possible to attach probes to tracepoints that are hit with very high frequency.
Comment 16 Stan Cox 2020-06-19 15:02:24 UTC
> I can provide a binary (and source of course)

Yes, please; that would be helpful.