Several errors occur during a "sudo make installcheck" on a Fedora 8 machine with systemtap-0.6* installed from updates-testing. The test suite is the one packaged in the systemtap-testsuite RPM, so make installcheck is being run from /usr/share/systemtap/testsuite.

Here's the error as shown in systemtap.log. It is triggered by several test cases, including systemtap.base/kmodule.exp and kfunct.exp.

*** stack smashing detected ***: /usr/libexec/systemtap/stapio terminated
======= Backtrace: =========
/lib64/libc.so.6(__fortify_fail+0x32)[0x3b0c2ea362]
/lib64/libc.so.6(__fortify_fail+0x0)[0x3b0c2ea330]
/usr/libexec/systemtap/stapio[0x406165]
/usr/libexec/systemtap/stapio[0x403403]
/lib64/libpthread.so.0[0x3b0ce06407]
/lib64/libc.so.6(clone+0x6d)[0x3b0c2d4b0d]
======= Memory map: ========
00400000-00409000 r-xp 00000000 fd:00 10649706 /usr/bin/staprun
00608000-00609000 rw-p 00008000 fd:00 10649706 /usr/bin/staprun
.....

From the core dump (oddly associated with the staprun process, not stapio), gdb says:

(gdb) bt
#0 0x0000003b0c230ec5 in raise () from /lib64/libc.so.6
#1 0x0000003b0c232970 in abort () from /lib64/libc.so.6
#2 0x0000003b0c26b0db in __libc_message () from /lib64/libc.so.6
#3 0x0000003b0c2ea362 in __fortify_fail () from /lib64/libc.so.6
#4 0x0000003b0c2ea330 in __stack_chk_fail () from /lib64/libc.so.6
#5 0x0000000000406165 in do_kernel_symbols () at runtime/staprun/symbols.c:305
#6 0x0000000000403403 in handle_symbols (arg=<value optimized out>) at runtime/staprun/staprun_funcs.c:431
#7 0x0000003b0ce06407 in start_thread () from /lib64/libpthread.so.0
#8 0x0000003b0c2d4b0d in clone () from /lib64/libc.so.6
(gdb) frame 5
#5 0x0000000000406165 in do_kernel_symbols () at runtime/staprun/symbols.c:305
305 }
(gdb)

do_kernel_symbols() is indeed a hairy-looking function with plenty of raw pointer manipulation.
It is plausible that the stack-protector/fortify gcc options have identified a genuine bug, but so far I haven't been able to trigger it on my own hand-built binaries. It may help to know that as a consequence of this crash, the stap_XXXX modules stay loaded in the kernel. (They can be rmmod'd.) It is possible that the bug is triggered when another old systemtap module is already there, perhaps inflating symbol length limits. The build log of this binary is here, and gives additional rpm-originated CFLAGS: http://koji.fedoraproject.org/packages/systemtap/0.6/1.fc8/data/logs/x86_64/build.log
Update: I am able to reproduce this with CVS systemtap built with the same compiler options. valgrind and gdb don't produce any additional information with this binary, because the stack-smash detector runs only at the end of the function.

Yet another data point: this message in the kernel logs may be related.

[451468.657609] Systemtap Error at _stp_sym_read_cmd:301 Supplied buffer too small. count:8192 len:132
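To illustrate why the detector gives no hint about where the overwrite happened, here is a minimal simulation of the canary idea. It is not the code gcc emits: the struct, the marker value, and the copy_and_check name are all made up for this sketch. The point is only that the damage is done at the copy and goes unnoticed until the check at the end of the function.

```c
#include <stddef.h>
#include <string.h>

#define CANARY_VALUE 0xDEADC0DEUL   /* arbitrary marker, chosen for illustration */

/* Hypothetical stand-in for a stack frame: a small buffer followed by a guard word. */
struct fake_frame {
    char buf[8];
    unsigned long canary;
};

/* Copies len bytes starting at buf; returns 1 if the guard word was clobbered. */
int copy_and_check(const char *src, size_t len)
{
    struct fake_frame f;

    f.canary = CANARY_VALUE;

    if (len > sizeof f)
        len = sizeof f;             /* keep the simulated overflow inside the struct */

    /* An oversized copy runs past buf and into canary. Nothing fails here. */
    memcpy(&f, src, len);

    /* ... the rest of the function would run normally at this point ... */

    /* The check happens only now, at "function exit". */
    return f.canary != CANARY_VALUE;
}
```

A short copy such as copy_and_check("abc", 4) leaves the guard intact, while a 16-byte copy overwrites it. In both cases the only place the problem can be reported is the final check, which matches gdb pointing at the closing brace of do_kernel_symbols().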
Added a "--enable-ssp" configure option to build staprun/stapio much as Fedora does, which in turn triggers the error in this bug. Once this bug is fixed, this flag will be turned on by default.
Partial patch committed. staprun now tolerates somewhat longer /proc/kallsyms lines.
Took a quick look at this. Yes, the kallsyms parsing code does have a fixed-size buffer of 128 bytes, so if someone exports a huge function name in a huge module name, things will get messed up. Ironically, I rewrote this ugly stuff a while ago when adding unwind data, but did not check it in because I saw no need.

Your hack works for now because /proc/kallsyms truncates function names to 127 bytes, so the maximum line length is around 200.

My current code looks like this and has no limits:

    while ((ret = fscanf(kallsyms, "%llx %c %as [%as", &addr, &type, &name, &mod)) > 0
           && dataptr < datamax) {
        if (ret < 3)
            continue;
        if (ret > 3) {
            /* ignore modules */
            free(name);
            free(mod);
            continue;
        }

I could check it in now if you wish. The rest of the function is totally rewritten too and requires that I merge in some changes to other files, after stripping out my unwind code.
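For reference, the same "no fixed buffer" idea can be sketched for a single line held in a string. This is only an illustration, not the code that was checked in: the parse_kallsyms_line name and its return convention are made up here. It uses the POSIX.1-2008 "m" allocation modifier instead of glibc's older "a" modifier, because "%a" is read as a floating-point conversion when compiling in C99 mode. The module name is read with a scanset so that the closing bracket is not swallowed into it.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper, for illustration only.
 * Parses one /proc/kallsyms-style line: "<hex addr> <type> <name>[ \t[<module>]]".
 * Returns  1 for a kernel symbol (*name is malloc'd; caller frees it),
 *          0 for a module symbol (skipped; nothing for the caller to free),
 *         -1 for a malformed line.
 */
int parse_kallsyms_line(const char *line, unsigned long long *addr,
                        char *type, char **name)
{
    char *mod = NULL;
    int ret;

    *name = NULL;

    /* %ms and %m[...] make sscanf allocate buffers of whatever size is needed. */
    ret = sscanf(line, "%llx %c %ms [%m[^]]", addr, type, name, &mod);

    if (ret < 3) {                  /* address, type, or name missing */
        free(*name);
        *name = NULL;
        return -1;
    }
    if (ret > 3) {                  /* a "[module]" suffix was present: ignore */
        free(*name);
        free(mod);
        *name = NULL;
        return 0;
    }
    return 1;
}
```

A caller would read each line with getline() (which also grows its buffer as needed), pass it to this helper, and free the returned name once the symbol has been recorded.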
Please advise whether this bug is intended to be fixed by the new code.
Fixed a while ago.