Sources Bugzilla – Bug 12141
Segmentation fault in apps probed by libstdc++ i686
Last modified: 2010-11-23 22:10:22 UTC
I can reproduce this on Fedora 12 and 13, i686 only:

    $ stap -e 'probe process("/usr/lib/libstdc++.so.6").function("operator new") {next}' -c 'stap -V'

... produces no output, when the -c stap should print its version. So instead I left that probe running on its own, and ran the "stap -V" in gdb:

    (gdb) run -V
    Starting program: /usr/local/bin/stap -V
    [Thread debugging using libthread_db enabled]

    Program received signal SIGSEGV, Segmentation fault.
    0x0020914c in _Settings () at /usr/src/debug/gcc-4.4.4-20100630/obj-i686-redhat-linux/i686-redhat-linux/libstdc++-v3/include/parallel/settings.h:276
    276 _Settings() : algorithm_strategy(heuristic), [...]
    (gdb) bt
    #0 0x0020914c in _Settings () at /usr/src/debug/gcc-4.4.4-20100630/obj-i686-redhat-linux/i686-redhat-linux/libstdc++-v3/include/parallel/settings.h:276
    #1 __static_initialization_and_destruction_0 () at ../../../../libstdc++-v3/src/parallel_settings.cc:29
    #2 global constructors keyed to parallel_settings.cc(void) () at ../../../../libstdc++-v3/src/parallel_settings.cc:42
    #3 0x0021208d in __do_global_ctors_aux () from /usr/lib/libstdc++.so.6
    #4 0x0019cd34 in _init () from /usr/lib/libstdc++.so.6
    #5 0x00771da0 in _dl_init_internal () from /lib/ld-linux.so.2
    #6 0x0076388f in _dl_start_user () from /lib/ld-linux.so.2

NB: We haven't even reached main!

    (gdb) disassemble /r
    Dump of assembler code for function _GLOBAL__I_parallel_settings.cc(void):
    [...]
       0x00209142 <+674>: c7 81 dc 29 00 00 00 00 00 00 movl $0x0,0x29dc(%ecx)
    => 0x0020914c <+684>: c7 81 e0 29 cc 00 10 27 00 00 movl $0x2710,0xcc29e0(%ecx)
       0x00209156 <+694>: c7 81 e4 29 00 00 00 00 00 00 movl $0x0,0x29e4(%ecx)

The 0xCC sticks out like a sore thumb... And indeed, if I kill the running probe, this 0xCC turns back into 0x00, so it's definitely our uprobes INT3, in a very bad place.
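For anyone who wants to double-check where exactly the INT3 landed, here is a small sketch that decodes the raw bytes gdb printed for the faulting instruction. The variable names are made up for illustration, and the "restored" value assumes the original byte was 0x00, as observed after killing the probe.

```python
# Raw bytes of the faulting instruction, as printed by "disassemble /r".
insn_addr = 0x0020914c
insn_bytes = bytes.fromhex("c7 81 e0 29 cc 00 10 27 00 00")

# x86 "movl $imm32, disp32(%ecx)": opcode c7, ModRM 81, then a 4-byte
# little-endian displacement followed by a 4-byte little-endian immediate.
disp = int.from_bytes(insn_bytes[2:6], "little")
imm = int.from_bytes(insn_bytes[6:10], "little")

# Address of the stray 0xCC (INT3) byte inside the instruction.
int3_addr = insn_addr + insn_bytes.index(0xCC)

# Assumption: the byte was 0x00 before the probe was inserted.
restored = insn_bytes.replace(b"\xcc", b"\x00", 1)
restored_disp = int.from_bytes(restored[2:6], "little")

print(f"displacement as executed: {disp:#x}")
print(f"immediate:                {imm:#x}")
print(f"INT3 byte address:        {int3_addr:#x}")
print(f"displacement restored:    {restored_disp:#x}")
```

The breakpoint byte sits in the middle of the displacement field, four bytes into the instruction, and the restored displacement 0x29e0 fits neatly between the neighbouring 0x29dc and 0x29e4 stores.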
    $ stap -l 'process("/usr/lib/libstdc++.so.6").function("operator new")' -vv |& grep pc=
    probe operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37 process=/usr/lib/libstdc++.so.6.0.13 reloc=.dynamic pc=0xaf200
    probe operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45 process=/usr/lib/libstdc++.so.6.0.13 reloc=.dynamic pc=0xaf150
    $ nm /usr/lib/debug/usr/lib/libstdc++.so.6.0.13.debug | c++filt | grep 'operator new('
    000af150 T operator new(unsigned int)
    000af200 T operator new(unsigned int, std::nothrow_t const&)

So far, so good. With -DDEBUG_UPROBES, I see:

    stap_uprobe_change_plus:67: +uprobe spec 0 idx 0 process stap[2907] addr 00209200 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37")
    stap_uprobe_change_plus:67: +uprobe spec 1 idx 1 process stap[2907] addr 00209150 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45")
    stap_uprobe_change_minus:225: -uprobe spec 0 idx 0 process stap[2907] reloc 00209200 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37")
    stap_uprobe_change_minus:225: -uprobe spec 1 idx 1 process stap[2907] reloc 00209150 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45")

The addrs here confirm what I saw in gdb's disassembly, but they're clearly not the right place. So where would gdb probe?

    (gdb) b 'operator new(unsigned int)'
    Breakpoint 2 at 0x20b8ca: file ../../../../libstdc++-v3/libsupc++/new_op.cc, line 46.
    (gdb) b 'operator new(unsigned int, std::nothrow_t const&)'
    Breakpoint 3 at 0x20b97a: file ../../../../libstdc++-v3/libsupc++/new_opnt.cc, line 38.

So it looks like our probe addresses are missing some offset, as they're both 0x277a off from where they should be.
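The 0x277a figure is just the gdb breakpoint address minus the stap uprobe address for each function. A quick sketch of that subtraction, with the addresses copied from the output above (the dictionary names are only illustrative):

```python
# Addresses where stap placed its uprobes (from the -DDEBUG_UPROBES output).
stap_addr = {
    "operator new(unsigned int)": 0x00209150,
    "operator new(unsigned int, std::nothrow_t const&)": 0x00209200,
}

# Addresses where gdb placed its breakpoints for the same functions.
gdb_addr = {
    "operator new(unsigned int)": 0x0020b8ca,
    "operator new(unsigned int, std::nothrow_t const&)": 0x0020b97a,
}

# Per-function discrepancy between the two tools.
deltas = {name: gdb_addr[name] - stap_addr[name] for name in stap_addr}

for name, delta in deltas.items():
    print(f"{name}: gdb - stap = {delta:#x}")
```

Both functions give the same delta, which points at a constant shift rather than a per-symbol lookup error.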
I can't reproduce this on x86-64. What does -DDEBUG_SYMBOLS=2 -DDEBUG_UPROBES -DDEBUG_TASK_FINDER_VMA offer?
(In reply to comment #1)
> I can't reproduce this on x86-64.

Well, I did say i686 only...

> What does -DDEBUG_SYMBOLS=2 -DDEBUG_UPROBES -DDEBUG_TASK_FINDER_VMA offer?

Let me know if you want the entire dump, but AFAICS here's the relevant excerpt:

> __stp_call_mmap_callbacks:611: pid 1505, a/l/o/p/path 0x15a000 0xed000 0x0 r-xp /usr/lib/libstdc++.so.6.0.13
> stap_uprobe_mmap_found:274: +mmap R-X pid 1505 path /usr/lib/libstdc++.so.6.0.13 addr 0015a000 length 970752 offset (null) stf e1d7bd88 e1d7bd88 path /usr/lib/libstdc++.so.6.0.13
> stap_uprobe_change_plus:67: +uprobe spec 0 idx 0 process stap[1505] addr 00209200 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37")
> stap_uprobe_change_plus:67: +uprobe spec 1 idx 1 process stap[1505] addr 00209150 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45")
> __stp_utrace_task_finder_target_syscall_exit:1435: tsk 1505 found mmap2(0x0), returned 0x23b000
> __stp_call_mmap_callbacks:611: pid 1505, a/l/o/p/path 0x23b000 0x6000 0xe0000 rw-p /usr/lib/libstdc++.so.6.0.13
> stap_uprobe_mmap_found:279: +mmap RW- pid 1505 path /usr/lib/libstdc++.so.6.0.13 addr 0023b000 length 24576 offset 000e0000 stf e1d7bd88 e1d7bd88 path /usr/lib/libstdc++.so.6.0.13
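Reading that excerpt, the uprobe addresses look like nothing more than the r-xp mapping address plus the pc= value that stap -l reported earlier. A rough check of that reading (names are illustrative, and this is only my interpretation of the debug output, not a statement about the runtime's internals):

```python
# Executable mapping of libstdc++.so.6.0.13 in pid 1505 (a/l/o fields above).
map_base = 0x15a000
map_len = 0xed000

# pc= values reported by "stap -l ... -vv" for the two operator new variants.
pc = {"new_opnt.cc:37": 0xaf200, "new_op.cc:45": 0xaf150}

# addr values reported by stap_uprobe_change_plus for the same probes.
uprobe_addr = {"new_opnt.cc:37": 0x00209200, "new_op.cc:45": 0x00209150}

computed = {where: map_base + off for where, off in pc.items()}

for where in pc:
    match = "matches" if computed[where] == uprobe_addr[where] else "differs"
    print(f"{where}: base + pc = {computed[where]:#x} ({match})")

# The hex length in the first line agrees with the decimal one in the second.
print(f"mapping length: {map_len} bytes")
```

So the runtime arithmetic is self-consistent; whatever is wrong has to be in the pc values themselves or in how they relate to the mapped file.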
Some additional insight from chatting with fche on IRC...

The 0x277a address difference between stap and gdb includes gdb's prologue searching. In both cases, this is an offset 0x1a into the function, so the real difference between stap and gdb's notion of function start is 0x2760.

Now for a little numerology:

> $ eu-readelf -S /usr/lib/libstdc++.so.6.0.13 | grep text
> [12] .text PROGBITS 00b58610 045610 072a88 0 AX 0 0 16
> $ eu-readelf -S /usr/lib/debug/usr/lib/libstdc++.so.6.0.13.debug | grep text
> [12] .text NOBITS 00042eb0 000160 072a88 0 AX 0 0 16
> $ python -c 'print(hex(0x45610 - 0x42eb0))'
> 0x2760

For comparison, the working x86_64 has no difference in those numbers:

> $ eu-readelf -S /usr/lib64/libstdc++.so.6.0.13 | grep text
> [12] .text PROGBITS 0000003b706563f0 000563f0 0006d006 0 AX 0 0 16
> $ eu-readelf -S /usr/lib/debug/usr/lib64/libstdc++.so.6.0.13.debug | grep text
> [12] .text NOBITS 00000000000563f0 00000230 0006d006 0 AX 0 0 16

It appears that prelink is responsible for this shift, although only on i686. Examining the virgin file from the rpm, or even doing prelink -u, gets the address back to matching what's in the debuginfo:

> $ eu-readelf -S ~/libstdc++-4.4.4-10.fc13.i686/usr/lib/libstdc++.so.6.0.13 | grep text
> [12] .text PROGBITS 00042eb0 042eb0 072a88 0 AX 0 0 16
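Putting all the numbers together, the sketch below checks that the .text shift plus gdb's 0x1a prologue skip accounts exactly for the gap between the stap and gdb addresses. The helper name text_shift is hypothetical, and this is only a sanity check of the arithmetic, not a description of how the address should be computed in a fix.

```python
def text_shift(prelinked_file_offset, debuginfo_addr):
    """Distance .text moved in the prelinked file relative to the debuginfo."""
    return prelinked_file_offset - debuginfo_addr

# .text file offset in the prelinked DSO vs. .text address in the debuginfo.
shift_i686 = text_shift(0x45610, 0x42eb0)
shift_x86_64 = text_shift(0x563f0, 0x563f0)

PROLOGUE_SKIP = 0x1a  # how far gdb's prologue search moves into each function

# (stap uprobe address, gdb breakpoint address) for the two operator new variants.
pairs = [(0x00209150, 0x0020b8ca), (0x00209200, 0x0020b97a)]

# Shifting stap's address gives the real function start; adding the
# prologue skip should then land on gdb's breakpoint.
function_starts = [stap + shift_i686 for stap, _ in pairs]
reconciled = [start + PROLOGUE_SKIP == gdb
              for start, (_, gdb) in zip(function_starts, pairs)]

print(f"i686 shift:   {shift_i686:#x}")
print(f"x86_64 shift: {shift_x86_64:#x}")
for start, ok in zip(function_starts, reconciled):
    print(f"function start {start:#x}, reaches gdb breakpoint: {ok}")
```

With the shift applied, both stap addresses line up with gdb's view of the functions, and the zero shift on x86_64 is consistent with the problem not reproducing there.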
Roland explained this effect:

<roland> it is prelink's REL->RELA conversion in DSOs, which moves the real code (.text et al) relative to the start of the mapping

He also said this is probably best fixed in libdwfl.
elfutils commit 1743d7f should fix this problem. That elfutils source should get some thorough regression testing too.
Thanks, the new elfutils appears to fix this problem, at least on rawhide-32, using a --with-elfutils bundled style build. Looking forward to a full elfutils release.
elfutils 0.150 is in rawhide and percolating its way through Fedora updates for 13 and 14.