Bug 5603

Summary: glibc stack-smashing error in staprun on f8 0.6-1 build
Product: systemtap Reporter: Frank Ch. Eigler <fche>
Component: runtime Assignee: Martin Hunt <hunt>
Status: RESOLVED FIXED    
Severity: normal    
Priority: P1    
Version: unspecified   
Target Milestone: ---   
Host: x86-64 Target:
Build: Last reconfirmed:

Description Frank Ch. Eigler 2008-01-12 15:32:12 UTC
Several errors occur during a "sudo make installcheck" on a Fedora 8
machine with systemtap-0.6* installed from updates-testing.  The test
suite is the one packaged in the systemtap-testsuite RPM, so make
installcheck is being run from /usr/share/systemtap/testsuite.  Here's the
error as shown in systemtap.log.  It is triggered by several test cases,
including systemtap.base/kmodule.exp and kfunct.exp.

*** stack smashing detected ***: /usr/libexec/systemtap/stapio terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x32)[0x3b0c2ea362]
/lib64/libc.so.6(__fortify_fail+0x0)[0x3b0c2ea330]
/usr/libexec/systemtap/stapio[0x406165]
/usr/libexec/systemtap/stapio[0x403403]
/lib64/libpthread.so.0[0x3b0ce06407]
/lib64/libc.so.6(clone+0x6d)[0x3b0c2d4b0d]
======= Memory map: ========
00400000-00409000 r-xp 00000000 fd:00 10649706         /usr/bin/staprun
00608000-00609000 rw-p 00008000 fd:00 10649706         /usr/bin/staprun
.....

From the core dump (oddly associated with the staprun process rather than
stapio), gdb says:
(gdb) bt
#0  0x0000003b0c230ec5 in raise () from /lib64/libc.so.6
#1  0x0000003b0c232970 in abort () from /lib64/libc.so.6
#2  0x0000003b0c26b0db in __libc_message () from /lib64/libc.so.6
#3  0x0000003b0c2ea362 in __fortify_fail () from /lib64/libc.so.6
#4  0x0000003b0c2ea330 in __stack_chk_fail () from /lib64/libc.so.6
#5  0x0000000000406165 in do_kernel_symbols () at runtime/staprun/symbols.c:305
#6  0x0000000000403403 in handle_symbols (arg=<value optimized out>)
    at runtime/staprun/staprun_funcs.c:431
#7  0x0000003b0ce06407 in start_thread () from /lib64/libpthread.so.0
#8  0x0000003b0c2d4b0d in clone () from /lib64/libc.so.6
(gdb) frame 5
#5  0x0000000000406165 in do_kernel_symbols () at runtime/staprun/symbols.c:305
305     }
(gdb) 

do_kernel_symbols() is indeed a hairy-looking function with plenty of raw
pointer manipulation.  It is plausible that the stack-protector/fortify gcc
options have identified a genuine bug, but so far I haven't been able to
trigger it with my own hand-built binaries.

It may help to know that as a consequence of this crash, the stap_XXXX
modules stay loaded in the kernel.  (They can be rmmod'd.)  It is possible
that the bug is triggered when another old systemtap module is already there,
perhaps inflating symbol length limits.

The build log of this binary is here, and gives additional rpm-originated CFLAGS:
http://koji.fedoraproject.org/packages/systemtap/0.6/1.fc8/data/logs/x86_64/build.log
Comment 1 Frank Ch. Eigler 2008-01-12 16:57:42 UTC
Update: am able to reproduce this with CVS systemtap, built with the
same compiler options.  valgrind and gdb don't produce any additional
information with this binary, because the stack-smash-detector runs
at the end of the function.

Yet another data point: this message in the kernel logs may relate.

[451468.657609] Systemtap Error at _stp_sym_read_cmd:301 Supplied buffer too small. count:8192 len:132
Comment 2 Frank Ch. Eigler 2008-01-12 18:30:22 UTC
Added an "--enable-ssp" configure option that builds staprun/stapio
much as Fedora does, which in turn triggers the error in this bug.
Once this bug is fixed, this flag will be turned on by default.
Comment 3 Frank Ch. Eigler 2008-01-12 23:05:38 UTC
Partial patch committed.  staprun now tolerates somewhat
longer /proc/kallsyms lines.
Comment 4 Martin Hunt 2008-01-14 14:59:32 UTC
Took a quick look at this.  Yes, the kallsyms parsing code does have a
fixed-size buffer of 128 bytes, so if someone exports a huge function name
in a huge module name, things will get messed up.  Ironically, I rewrote
this ugly stuff a while ago when adding unwind data but did not check it in
yet because I saw no need.

Your hack works for now because /proc/kallsyms truncates function names to 127
bytes, so the maximum line length is around 200.  
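
To illustrate why a bounded parse is safe under that assumption, here is a
minimal sketch in which every string conversion carries an explicit width.
It is not the patch that was committed; KSYM_NAME_MAX and
parse_bounded_symbol() are hypothetical names chosen for this example.

```c
#include <stdio.h>

/* Hypothetical limit: longest symbol name we are willing to store. */
#define KSYM_NAME_MAX 127

/*
 * Parse one "<addr> <type> <name>" line into caller-supplied storage.
 * The "%127s" width must stay in sync with KSYM_NAME_MAX so that sscanf
 * never writes more than KSYM_NAME_MAX + 1 bytes (name plus NUL).
 * Returns the number of fields converted (3 on success).
 */
int parse_bounded_symbol(const char *line, unsigned long long *addr,
                         char *type, char name[KSYM_NAME_MAX + 1])
{
    return sscanf(line, "%llx %c %127s", addr, type, name);
}
```

A name longer than the limit is truncated rather than written past the end
of the buffer, which is the property the original 128-byte code lacked.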

My current code looks like this and has no length limits:

while ((ret = fscanf(kallsyms, "%llx %c %as [%as", &addr, &type, &name, &mod)) > 0
       && dataptr < datamax) {
        if (ret < 3)
                continue;
        if (ret > 3) {
                /* ignore modules */
                free(name);
                free(mod);
                continue;
        }

I could check it in now if you wish.  The rest of the function is totally
rewritten too and requires that I merge in some changes to other files,
after stripping out my unwind code.
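
For readers who want to try the allocation-based approach in isolation, the
following is a rough, self-contained sketch and not the code that was
checked in.  It assumes a glibc new enough to support the POSIX "m"
allocation modifier (used here in place of the older GNU "%as"), and
count_kernel_symbols() is a hypothetical helper invented for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the non-module symbols in a kallsyms-style stream and report the
 * longest symbol name seen.  scanf allocates each string itself ("%ms"),
 * so there is no fixed-size buffer to overflow.
 */
long count_kernel_symbols(FILE *f, size_t *longest)
{
    unsigned long long addr;
    char type;
    long count = 0;

    *longest = 0;
    for (;;) {
        char *name = NULL, *mod = NULL;
        /* "[%m[^]]]" is: literal '[', module name up to ']', literal ']' */
        int ret = fscanf(f, "%llx %c %ms [%m[^]]]",
                         &addr, &type, &name, &mod);
        if (ret <= 0)
            break;                  /* EOF or unparsable input */
        if (ret == 3) {             /* no "[module]" suffix: kernel symbol */
            size_t len = strlen(name);
            if (len > *longest)
                *longest = len;
            count++;
        }
        free(name);                 /* free(NULL) is a no-op */
        free(mod);
    }
    return count;
}
```

Calling it on an open /proc/kallsyms stream, for example
count_kernel_symbols(fp, &longest), should work regardless of how long any
individual name is, since each string is sized by the C library at parse
time.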


Comment 5 Frank Ch. Eigler 2008-01-21 17:46:28 UTC
Please advise whether this bug is intended to be fixed by the new code.
Comment 6 Martin Hunt 2008-01-21 18:01:25 UTC
Fixed a while ago.