This is the mail archive of the systemtap@sourceware.org mailing list for the systemtap project.
Re: systemtap 2.2.1 installcheck => kernel BUG at .. kprobes.c:707
- From: Mark Wielaard <mjw at redhat dot com>
- To: Timo Juhani Lindfors <timo dot lindfors at iki dot fi>
- Cc: systemtap at sourceware dot org
- Date: Fri, 17 May 2013 20:18:35 +0200
- Subject: Re: systemtap 2.2.1 installcheck => kernel BUG at .. kprobes.c:707
- References: <84obc93lq6 dot fsf at sauna dot l dot org> <y0m61yhn6ff dot fsf at fche dot csb> <84ip2h3cz0 dot fsf at sauna dot l dot org> <84bo8932rv dot fsf at sauna dot l dot org>
On Fri, 2013-05-17 at 21:00 +0300, Timo Juhani Lindfors wrote:
> Timo Juhani Lindfors <timo.lindfors@iki.fi> writes:
> > Thanks! After "echo 0 > /proc/sys/debug/kprobes-optimization" the kernel
> > does not crash anymore and the testsuite completes. I see however a few
> > stap segfaults and OOM killer hits.
>
> First segfault:
>
> lindi3:~/tmp/systemtap-2.2.1/testsuite$ gdb --args stap --rlimit-stack=1 --rlimit-stack=999999999999 -p4 ./systemtap.base/rlimit.stp
> GNU gdb (GDB) 7.4.1-debian
> Copyright (C) 2012 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law. Type "show copying"
> and "show warranty" for details.
> This GDB was configured as "x86_64-linux-gnu".
> For bug reporting instructions, please see:
> <http://www.gnu.org/software/gdb/bugs/>...
> Reading symbols from /usr/bin/stap...(no debugging symbols found)...done.
> (gdb) r
> Starting program: /usr/bin/stap --rlimit-stack=1 --rlimit-stack=999999999999 -p4 ./systemtap.base/rlimit.stp
> warning: no loadable sections found in added symbol-file system-supplied DSO at 0x7ffff7ffa000
> warning: Could not load shared library symbols for linux-vdso.so.1.
> Do you need "set solib-search-path" or "set sysroot"?
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> Unable to set resource limits for rlimit_stack : Operation not permitted
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff7bb17f9 in dwarf_getsrclines () from /usr/lib/x86_64-linux-gnu/libdw.so.1
> (gdb) bt
> #0 0x00007ffff7bb17f9 in dwarf_getsrclines () from /usr/lib/x86_64-linux-gnu/libdw.so.1
> #1 0x00007ffff7bb5ce1 in dwarf_decl_file () from /usr/lib/x86_64-linux-gnu/libdw.so.1
I am not sure what the --rlimit-stack options do here, but if they
limit the stack a lot, that might be the cause: dwarf_getsrclines ()
uses alloca () for its temporary memory.
> The version of libdw1 is 0.153-2. I rebuilt it with -O0 -g and now I see bit more:
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x00007ffff7ba24ce in add_new_line (new_line=0x7ffffffdd020, end_sequence=false) at dwarf_getsrclines.c:361
> 361 {
> (gdb) bt
> #0 0x00007ffff7ba24ce in add_new_line (new_line=0x7ffffffdd020, end_sequence=false) at dwarf_getsrclines.c:361
> #1 0x00007ffff7ba1144 in dwarf_getsrclines (cudie=0x7fffffffb7b0, lines=0x7fffffffb778, nlines=0x7fffffffb780) at dwarf_getsrclines.c:421
> #2 0x00007ffff7ba7b14 in dwarf_decl_file (die=0x2e5cd78) at dwarf_decl_file.c:87
[...]
> (gdb) l
> 356 end_seq))) \
> 357 goto invalid_data; \
> 358 } while (0)
> 359
> 360 inline bool add_new_line (struct linelist *new_line, bool end_sequence)
> 361 {
> 362 /* Set the line information. For some fields we use bitfields,
> 363 so we would lose information if the encoded values are too large.
> 364 Check just for paranoia, and call the data "invalid" if it
> 365 violates our assumptions on reasonable limits for the values. */
> (gdb) p *new_line
> $2 = {line = {files = 0x0, addr = 0, file = 0, line = 0, column = 0, is_stmt = 0, basic_block = 0, end_sequence = 0, prologue_end = 0, epilogue_begin = 0, op_index = 0, isa = 0, discriminator = 0},
> next = 0x7ffffffdd060}
> (gdb) p *new_line->next
> $3 = {line = {files = 0x0, addr = 18446744071585905164, file = 4, line = 2702, column = 0, is_stmt = 1, basic_block = 0, end_sequence = 0, prologue_end = 0, epilogue_begin = 0, op_index = 0, isa = 0,
> discriminator = 0}, next = 0x7ffffffdd0a0}
> (gdb) up
> #1 0x00007ffff7ba1144 in dwarf_getsrclines (cudie=0x7fffffffb7b0, lines=0x7fffffffb778, nlines=0x7fffffffb780) at dwarf_getsrclines.c:421
> 421 NEW_LINE (0);
> (gdb) l
> 416 /* Perform the increments. */
> 417 line += line_increment;
> 418 advance_pc ((opcode - opcode_base) / line_range);
> 419
> 420 /* Add a new line with the current state machine values. */
> 421 NEW_LINE (0);
> 422
> 423 /* Reset the flags. */
> 424 basic_block = false;
> 425 prologue_end = false;
The NEW_LINE macro uses alloca (), so each use takes more stack.
Try the same stap command without the --rlimit-stack=... arguments to see
if that is the cause.
Cheers,
Mark