Summary: | setjmp/longjmp clashing with u[ret]probes, causing SIGSEGV or SIGILL | ||
---|---|---|---|
Product: | systemtap | Reporter: | Frank Ch. Eigler <fche> |
Component: | uprobes | Assignee: | Unassigned <systemtap> |
Status: | RESOLVED WORKSFORME | ||
Severity: | normal | CC: | mark, mcermak |
Priority: | P2 | ||
Version: | unspecified | ||
Target Milestone: | --- | ||
Host: | Target: | ||
Build: | Last reconfirmed: |
Description
Frank Ch. Eigler
2015-03-28 00:27:39 UTC
Reproduced using perf. Problem is the order of commandline switches produced by stap2perf: Flipping -a and -x seems to help: ======= # perf probe -x /usr/share/systemtap/testsuite/bz5274 -a bz5274_funca_return=funca%return # perf probe -x /usr/share/systemtap/testsuite/bz5274 -a bz5274_funcb_return=funcb%return ... ======= After this I have: ======= # cat //sys/kernel/debug/tracing/uprobe_events r:probe_bz5274/bz5274_funca_return /usr/share/systemtap/testsuite/bz5274:0x00000000000006e6 r:probe_bz5274/bz5274_funcb_return /usr/share/systemtap/testsuite/bz5274:0x00000000000006a0 p:probe_bz5274/bz5274_funcc /usr/share/systemtap/testsuite/bz5274:0x000000000000065a p:probe_bz5274/bz5274_funcd /usr/share/systemtap/testsuite/bz5274:0x0000000000000616 r:probe_bz5274/bz5274_main_return /usr/share/systemtap/testsuite/bz5274:0x0000000000000725 ======= Now using -e 'probe_bz5274:*' instead of -e 'probe:*' I'm getting: ======= # /root/bin/perf record -e 'probe_bz5274:*' -aR ./bz5274 In bz5274.c: funcb :20 : i=1. Calling funcc In bz5274.c: funcc :13 : i=1. Calling funcd In bz5274.c: funcd :7 : i=1. Calling longjmp In bz5274.c: funcb :20 : i=2. Calling funcc In bz5274.c: funcc :13 : i=2. Calling funcd In bz5274.c: funcd :7 : i=2. Calling longjmp In bz5274.c: funcb :20 : i=3. Calling funcc In bz5274.c: funcc :13 : i=3. Calling funcd In bz5274.c: funcd :7 : i=3. Calling longjmp In bz5274.c: funcb :20 : i=4. Calling funcc In bz5274.c: funcc :13 : i=4. Calling funcd In bz5274.c: funcd :7 : i=4. Calling longjmp [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.166 MB perf.data (10 samples) ] Segmentation fault (core dumped) [root@dhcp-1-102 testsuite]# ======= But now I'm getting two corefiles: ======= [root@dhcp-1-102 testsuite]# ls core* core.2097 core.2098 [root@dhcp-1-102 testsuite]# file core.2200 core.2200: ELF 64-bit LSB core file x86-64, version 1 (SYSV), too many program headers (163) [root@dhcp-1-102 testsuite]# file core.2201 core.2201: ELF 64-bit LSB core file x86-64, version 1 (SYSV), SVR4-style, from './bz5274' [root@dhcp-1-102 testsuite]# [root@dhcp-1-102 testsuite]# gdb -q ./bz5274 core.2097 Reading symbols from ./bz5274...done. warning: core file may not match specified executable file. ---------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^--------------------------------------------------- [New LWP 2097] Core was generated by `/root/bin/perf record -e probe_bz5274:* -aR ./bz5274'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000003e38e0ff99 in ?? () (gdb) info registers rax 0x0 0 rbx 0x0 0 rcx 0xffffffffffffffff -1 rdx 0xb 11 rsi 0x831 2097 rdi 0x831 2097 rbp 0x3e389b76a0 0x3e389b76a0 rsp 0x7fff705200f8 0x7fff705200f8 r8 0x7fff70520030 140735077810224 r9 0x7fff7051fe40 140735077809728 r10 0xa5 165 r11 0x202 514 r12 0x1 1 r13 0x3e389b8e80 267237691008 r14 0x8365d0 8611280 r15 0x0 0 rip 0x3e38e0ff99 0x3e38e0ff99 eflags 0x202 [ IF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 (gdb) q [root@dhcp-1-102 testsuite]# [root@dhcp-1-102 testsuite]# gdb -q ./bz5274 core.2098 Reading symbols from ./bz5274...done. [New LWP 2098] Core was generated by `./bz5274'. Program terminated with signal SIGSEGV, Segmentation fault. #0 0x0000000000400724 in funca (s=0xffff95e800400876 <error: Cannot access memory at address 0xffff95e800400876>, env=0xbfc68948ffffff30) at systemtap.base/bz5274.c:34 34 } (gdb) info registers rax 0x4 4 rbx 0x0 0 rcx 0x7fffffe4 2147483620 rdx 0x400702 4196098 rsi 0x4 4 rdi 0x7fff359cc380 140734092854144 rbp 0x41ff894156415741 0x41ff894156415741 rsp 0x400768 0x400768 <__libc_csu_init+8> r8 0x7fff359cc340 140734092854080 r9 0x7fff359cc360 140734092854112 r10 0x0 0 r11 0x246 582 r12 0x400520 4195616 r13 0x7fff359cc530 140734092854576 r14 0x0 0 r15 0x0 0 rip 0x400724 0x400724 <funca+62> eflags 0x10246 [ PF ZF IF RF ] cs 0x33 51 ss 0x2b 43 ds 0x0 0 es 0x0 0 fs 0x0 0 gs 0x0 0 (gdb) ======= So I assume only the second corefile is relevant. The $rsp corruption seems to happen here too. Just a note that it seems the same is true for the golang runtime. Although I haven't pinpointed it precisely it seems at least some code in the garbage collector seems to get upset when we place return probe. e.g. $ cat hello.go package main import "fmt" func main() { fmt.Printf("Hello, world.\n") } $ go build hello.go $ ./run-stap -e 'probe process.function("*").call {printf("%s->%s %s\n", thread_indent (1), ppfunc (), $$parms)} probe process.function("*").return {printf("%s<-%s %s\n", thread_indent (-1), ppfunc (), $$return)}' -c ./hello [...] fatal error: panic on m stack runtime stack: WARNING: Child process exited with status 2 [...] Without the process.function("*").return probe things look fine (assuming you have the fixes for PR17957, PR17958 and PR17959). Oleg's patches at http://www.spinics.net/lists/kernel/msg1981148.html look good on x86-64. Kernels much newer than these are working fine. |