Bug 12141 - Segmentation fault in apps probed by libstdc++ i686
Status: RESOLVED FIXED
Alias: None
Product: systemtap
Classification: Unclassified
Component: runtime
Version: unspecified
Importance: P2 critical
Target Milestone: ---
Assignee: Unassigned
Blocks: 11179
Reported: 2010-10-20 06:24 UTC by Josh Stone
Modified: 2010-11-23 22:10 UTC


Description Josh Stone 2010-10-20 06:24:48 UTC
I can reproduce this on Fedora 12 and 13, i686 only:

$ stap -e 'probe process("/usr/lib/libstdc++.so.6").function("operator new") {next}' -c 'stap -V'

... produces no output, when the -c stap should print its version.  So instead I left that probe running on its own, and ran the "stap -V" in gdb:

(gdb) run -V
Starting program: /usr/local/bin/stap -V
[Thread debugging using libthread_db enabled]

Program received signal SIGSEGV, Segmentation fault.
0x0020914c in _Settings ()
    at /usr/src/debug/gcc-4.4.4-20100630/obj-i686-redhat-linux/i686-redhat-linux/libstdc++-v3/include/parallel/settings.h:276
276	    _Settings() : algorithm_strategy(heuristic), [...]
(gdb) bt
#0  0x0020914c in _Settings ()
    at /usr/src/debug/gcc-4.4.4-20100630/obj-i686-redhat-linux/i686-redhat-linux/libstdc++-v3/include/parallel/settings.h:276
#1  __static_initialization_and_destruction_0 ()
    at ../../../../libstdc++-v3/src/parallel_settings.cc:29
#2  global constructors keyed to parallel_settings.cc(void) ()
    at ../../../../libstdc++-v3/src/parallel_settings.cc:42
#3  0x0021208d in __do_global_ctors_aux () from /usr/lib/libstdc++.so.6
#4  0x0019cd34 in _init () from /usr/lib/libstdc++.so.6
#5  0x00771da0 in _dl_init_internal () from /lib/ld-linux.so.2
#6  0x0076388f in _dl_start_user () from /lib/ld-linux.so.2

NB: We haven't even reached main!

(gdb) disassemble /r 
Dump of assembler code for function _GLOBAL__I_parallel_settings.cc(void):
[...]
   0x00209142 <+674>:	 c7 81 dc 29 00 00 00 00 00 00	movl   $0x0,0x29dc(%ecx)
=> 0x0020914c <+684>:	 c7 81 e0 29 cc 00 10 27 00 00	movl   $0x2710,0xcc29e0(%ecx)
   0x00209156 <+694>:	 c7 81 e4 29 00 00 00 00 00 00	movl   $0x0,0x29e4(%ecx)

The 0xCC sticks out like a sore thumb...  And indeed, if I kill the running probe, this 0xCC turns back into 0x00, so it's definitely our uprobes INT3, in a very bad place.

$ stap -l 'process("/usr/lib/libstdc++.so.6").function("operator new")' -vv |& grep pc=
probe operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37 process=/usr/lib/libstdc++.so.6.0.13 reloc=.dynamic pc=0xaf200
probe operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45 process=/usr/lib/libstdc++.so.6.0.13 reloc=.dynamic pc=0xaf150

$ nm /usr/lib/debug/usr/lib/libstdc++.so.6.0.13.debug | c++filt | grep 'operator new('
000af150 T operator new(unsigned int)
000af200 T operator new(unsigned int, std::nothrow_t const&)

So far, so good.  With -DDEBUG_UPROBES, I see:

stap_uprobe_change_plus:67: +uprobe spec 0 idx 0 process stap[2907] addr 00209200 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37")
stap_uprobe_change_plus:67: +uprobe spec 1 idx 1 process stap[2907] addr 00209150 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45")
stap_uprobe_change_minus:225: -uprobe spec 0 idx 0 process stap[2907] reloc 00209200 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37")
stap_uprobe_change_minus:225: -uprobe spec 1 idx 1 process stap[2907] reloc 00209150 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45")

The addrs here confirm what I saw in gdb's disassembly, but they're clearly not the right place.  So where would gdb probe?

(gdb) b 'operator new(unsigned int)'
Breakpoint 2 at 0x20b8ca: file ../../../../libstdc++-v3/libsupc++/new_op.cc, line 46.
(gdb) b 'operator new(unsigned int, std::nothrow_t const&)' 
Breakpoint 3 at 0x20b97a: file ../../../../libstdc++-v3/libsupc++/new_opnt.cc, line 38.

So it looks like our probe addresses are missing some offset, as they're both 0x277a off from where they should be.
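
As a quick check of that arithmetic, here is a minimal Python sketch using only the addresses quoted above (the dictionary names are purely illustrative):

```python
# Addresses placed by stap (from the DEBUG_UPROBES log) and chosen by gdb
# (from the breakpoint output), keyed by the demangled function name.
stap_addr = {
    "operator new(unsigned int)": 0x00209150,
    "operator new(unsigned int, std::nothrow_t const&)": 0x00209200,
}
gdb_addr = {
    "operator new(unsigned int)": 0x0020b8ca,
    "operator new(unsigned int, std::nothrow_t const&)": 0x0020b97a,
}

# How far each stap probe sits below gdb's breakpoint for the same function.
delta = {name: gdb_addr[name] - stap_addr[name] for name in stap_addr}

for name, d in delta.items():
    print(f"{name}: {d:#x}")
```

Both entries come out as 0x277a, so the error is a constant displacement rather than something specific to one function.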
Comment 1 Frank Ch. Eigler 2010-10-24 03:29:55 UTC
I can't reproduce this on x86-64.
What does -DDEBUG_SYMBOLS=2 -DDEBUG_UPROBES -DDEBUG_TASK_FINDER_VMA offer?
Comment 2 Josh Stone 2010-10-25 16:14:59 UTC
(In reply to comment #1)
> I can't reproduce this on x86-64.

Well, I did say i686 only...

> What does -DDEBUG_SYMBOLS=2 -DDEBUG_UPROBES -DDEBUG_TASK_FINDER_VMA offer?

Let me know if you want the entire dump, but AFAICS here's the relevant excerpt:

> __stp_call_mmap_callbacks:611: pid 1505, a/l/o/p/path 0x15a000  0xed000  0x0  r-xp  /usr/lib/libstdc++.so.6.0.13
> stap_uprobe_mmap_found:274: +mmap R-X pid 1505 path /usr/lib/libstdc++.so.6.0.13 addr 0015a000 length 970752 offset (null) stf e1d7bd88 e1d7bd88 path /usr/lib/libstdc++.so.6.0.13
> stap_uprobe_change_plus:67: +uprobe spec 0 idx 0 process stap[1505] addr 00209200 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_opnt.cc:37")
> stap_uprobe_change_plus:67: +uprobe spec 1 idx 1 process stap[1505] addr 00209150 pp process("/usr/lib/libstdc++.so.6.0.13").function("operator new@../../../../libstdc++-v3/libsupc++/new_op.cc:45")
> __stp_utrace_task_finder_target_syscall_exit:1435: tsk 1505 found mmap2(0x0), returned 0x23b000
> __stp_call_mmap_callbacks:611: pid 1505, a/l/o/p/path 0x23b000  0x6000  0xe0000  rw-p  /usr/lib/libstdc++.so.6.0.13
> stap_uprobe_mmap_found:279: +mmap RW- pid 1505 path /usr/lib/libstdc++.so.6.0.13 addr 0023b000 length 24576 offset 000e0000 stf e1d7bd88 e1d7bd88 path /usr/lib/libstdc++.so.6.0.13
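
The excerpt suggests that the probe addresses are simply the r-xp mapping base plus the link-time pc values that stap -l reported earlier. A minimal Python sketch of that check, with every number copied from the logs in this bug (the variable names are illustrative only):

```python
text_map_base = 0x15a000   # start of the r-xp mapping of libstdc++.so.6.0.13
text_map_len = 0xed000     # its length, logged in decimal as 970752

# pc values reported by `stap -l ... -vv` for the two probe points.
link_time_pc = {
    "new_opnt.cc:37": 0xaf200,
    "new_op.cc:45": 0xaf150,
}

# Mapping base plus link-time pc, with no further adjustment.
placed = {where: text_map_base + pc for where, pc in link_time_pc.items()}

for where, addr in placed.items():
    print(f"{where}: {addr:#010x}")
```

The results match the addr values in the +uprobe lines, so whatever is wrong is not in the base address itself but in how the pc relates to that base.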
Comment 3 Josh Stone 2010-10-25 21:38:42 UTC
Some additional insight from chatting with fche on IRC...

The 0x277a address difference between stap and gdb includes gdb's prologue skipping.  In both cases gdb's breakpoint lands 0x1a bytes into the function, so the real difference between stap's and gdb's notion of the function start is 0x2760.

Now for a little numerology:

> $ eu-readelf -S /usr/lib/libstdc++.so.6.0.13 | grep text
> [12] .text                PROGBITS     00b58610 045610 072a88  0 AX     0   0 16
> $ eu-readelf -S /usr/lib/debug/usr/lib/libstdc++.so.6.0.13.debug | grep text
> [12] .text                NOBITS       00042eb0 000160 072a88  0 AX     0   0 16
> $ python -c 'print(hex(0x45610 - 0x42eb0))'
> 0x2760

For comparison, the working x86_64 has no difference in those numbers:

> $ eu-readelf -S /usr/lib64/libstdc++.so.6.0.13 | grep text
> [12] .text                PROGBITS     0000003b706563f0 000563f0 0006d006  0 AX     0   0 16
> $ eu-readelf -S /usr/lib/debug/usr/lib64/libstdc++.so.6.0.13.debug | grep text
> [12] .text                NOBITS       00000000000563f0 00000230 0006d006  0 AX     0   0 16

It appears that prelink is responsible for this shift, although only on i686.  Examining the pristine file from the rpm, or even running prelink -u, brings the .text address back in line with what's in the debuginfo:

> $ eu-readelf -S ~/libstdc++-4.4.4-10.fc13.i686/usr/lib/libstdc++.so.6.0.13 | grep text
> [12] .text                PROGBITS     00042eb0 042eb0 072a88  0 AX     0   0 16
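
To compare all of these section headers in one place, here is a rough Python sketch that parses the .text rows quoted in this comment and repeats the same subtraction for each file. The helper names (parse_section, text_shift) and the row labels are made up for illustration, and the parser assumes a two-digit section index such as "[12]".

```python
def parse_section(line):
    """Parse one `eu-readelf -S` row into name, type, addr, offset and size.

    Assumes the index prints as a single token like "[12]"; a row with a
    padded index such as "[ 9]" would need extra handling.
    """
    fields = line.split()
    return {
        "name": fields[1],
        "type": fields[2],
        "addr": int(fields[3], 16),
        "off": int(fields[4], 16),
        "size": int(fields[5], 16),
    }

# Rows copied from the eu-readelf output quoted in this comment.
rows = {
    "i686 installed": "[12] .text PROGBITS 00b58610 045610 072a88 0 AX 0 0 16",
    "i686 debuginfo": "[12] .text NOBITS 00042eb0 000160 072a88 0 AX 0 0 16",
    "i686 pristine": "[12] .text PROGBITS 00042eb0 042eb0 072a88 0 AX 0 0 16",
    "x86_64 installed": "[12] .text PROGBITS 0000003b706563f0 000563f0 0006d006 0 AX 0 0 16",
    "x86_64 debuginfo": "[12] .text NOBITS 00000000000563f0 00000230 0006d006 0 AX 0 0 16",
}
sections = {label: parse_section(row) for label, row in rows.items()}

def text_shift(library, debuginfo):
    # Same quantity as the python one-liner above: file offset of .text in
    # the library minus the .text address recorded in the debuginfo.
    return sections[library]["off"] - sections[debuginfo]["addr"]

shifts = {
    "i686 installed": text_shift("i686 installed", "i686 debuginfo"),
    "i686 pristine": text_shift("i686 pristine", "i686 debuginfo"),
    "x86_64 installed": text_shift("x86_64 installed", "x86_64 debuginfo"),
}

for label, shift in shifts.items():
    print(f"{label}: {shift:#x}")
```

Only the installed i686 library shows a nonzero shift (0x2760); the pristine i686 file and the x86_64 library both come out as zero.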
Comment 4 Josh Stone 2010-10-27 18:49:28 UTC
Roland explained this effect:

<roland> it is prelink's REL->RELA conversion in DSOs, which moves the real code (.text et al) relative to the start of the mapping

He also said this is probably best fixed in libdwfl.
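
Purely as arithmetic, and not as a description of how libdwfl would implement a fix, the following Python sketch checks that adding the 0x2760 displacement from comment 3 to the addresses stap used lands on the function starts that gdb sees. All numbers come from earlier comments; the variable names are illustrative.

```python
text_map_base = 0x15a000         # r-xp mapping base from comment 2
text_shift = 0x45610 - 0x42eb0   # .text displacement from comment 3
gdb_prologue_skip = 0x1a         # how far into each function gdb breaks

# Link-time pc values from `stap -l` and breakpoint addresses from gdb.
link_time_pc = {
    "operator new(unsigned int)": 0xaf150,
    "operator new(unsigned int, std::nothrow_t const&)": 0xaf200,
}
gdb_breakpoint = {
    "operator new(unsigned int)": 0x20b8ca,
    "operator new(unsigned int, std::nothrow_t const&)": 0x20b97a,
}

# Address stap would need if the displacement were taken into account.
corrected = {
    name: text_map_base + pc + text_shift
    for name, pc in link_time_pc.items()
}

for name, addr in corrected.items():
    # gdb's breakpoint should sit exactly one prologue skip past this address.
    matches = addr + gdb_prologue_skip == gdb_breakpoint[name]
    print(f"{name}: {addr:#x} (agrees with gdb: {matches})")
```

Under these assumptions both corrected addresses agree with gdb once the 0x1a prologue skip is accounted for.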
Comment 5 Roland McGrath 2010-11-13 00:48:01 UTC
elfutils commit 1743d7f should fix this problem.
That elfutils source should get some thorough regression testing too.
Comment 6 Frank Ch. Eigler 2010-11-13 15:38:41 UTC
Thanks, the new elfutils appears to fix this problem, at least on rawhide-32,
using a --with-elfutils bundled style build.  Looking forward to a
full elfutils release.
Comment 7 Roland McGrath 2010-11-23 22:10:22 UTC
elfutils 0.150 is in rawhide and percolating its way through Fedora updates for 13 and 14.