This bug happened when I was debugging a loadable kernel module (LKM). There are many variants of this bug; the earliest commit that causes the kernel module issue is 2cdbbe44126601596aad7891de05cb7fc6bb21c8.

Setup:
- arch: x86_64
- qemu + kvm VM
- gdb built from master

Summary of bug:
- Load an LKM
- Get the LKM's .text address and use add-symbol-file
- A combination of the following happens:
  - breakpoint is set at 2 locations (gdb/master)
    http://pastebin.com/4PhDAHwW
  - breakpoint is set at 1 location, but all subsequent breakpoints lose line number information. Breakpoints from #2 onwards on the same function are set at slightly different offsets.
    http://pastie.org/2758080
  - Setting a breakpoint at the actual function address also loses locals.
    http://pastie.org/2758181
  - Line number/locals info is lost (in all the above)

tromey suggested a small .so example and here's one. I am not sure if this is the right behaviour though.

############# app.c:
#include <stdio.h>

extern int lib_function(int);

void f()
{
    printf("f()\n");
}

int main()
{
    lib_function(0); // force load
    f(); // for breakpoint
    printf("Return code is %i\n", lib_function(32));
    return 0;
}

############# lib.c:
#include <stdio.h>

int lib_function(int arg)
{
    int temp = arg + 10;
    printf("lib_function(%d) called\n", arg);
    return temp;
}

############# compile.sh:
gcc -fPIC -c -g lib.c
gcc -shared -g -Wl,-soname,libblah.so -o libblah.so lib.o
gcc app.c -L . -l blah -g -o app

############# in gdb:
http://pastie.org/2758798

Thanks,
> (gdb) add-symbol-file ./libblah.so 0x7ffff7bdb000
> add symbol table from file "./libblah.so" at
> .text_addr = 0x7ffff7bdb000
> (y or n) y
> # Obtained info from /proc/pid/maps ...

I bet the .text section does not start at a page boundary (0x...000). See:
    readelf -WS ./libblah.so | grep '\.text'

You need to add the "Address" field to the base address you see in /proc/pid/maps and use the sum as the ".text_addr". (This applies when the library starts at 0, i.e. it is unprelinked. If you run prelink you moreover need to subtract the prelink address.)

Moreover, GDB has already loaded symbols for ./libblah.so, so after another "add-symbol-file" (at a different and incorrect address) you have the symbols there twice; it just cannot work.

> - breakpoint is set at 2 locations (gdb/master)
> http://pastebin.com/4PhDAHwW

I do not think you need to use the "-s" option for "add-symbol-file"; a single offset should be sufficient. I am not completely sure, but I am almost sure the kernel loads all .ko file segments with the same displacement. And the .text section looks to have the wrong address here, as in the previous case.

Also, GDB had initially already loaded symbols for "vmlinux", so you should remove them first (for example with "file" by itself); otherwise you have the same symbol file loaded twice, at two locations. That may work in the future with Tom Tromey's ambiguous-linespec patches, but they are not yet finished / checked in.

Please correct the GDB usage first; I do not see any GDB bugs there now.
>
> I bet the .text section does not start at a page boundary (0x...000).
> See:
> readelf -WS ./libblah.so | grep '\.text'
>
> You need to add the "Address" field to the base address you see in
> /proc/pid/maps as the ".text_addr" (when the library starts at 0 - it is
> unprelinked. If you run prelink you moreover need to subtract the prelink
> address).

I did not know that; I am sorry...
How do you check if the library is prelinked?

>
> > - breakpoint is set at 2 locations (gdb/master)
> > http://pastebin.com/4PhDAHwW
>
> I do not think you need to use "-s" option for "add-symbol-file", a single
> offset should be sufficient, I am not completely sure but I am almost sure the
> kernel loads all .ko file segments with the same displacement.
> And the .text section looks to have wrong address here as in the previous case.

I did try without specifying the extra segments, but the same problem persists. About the .text section: once I insmod, I check the address of the function via /proc/kallsyms in the guest.

>
>
> Also initially GDB already loaded symbols for "vmlinux" so you should remove
> them first (for example by "file" itself), otherwise you have the same symbol
> file loaded twice, at two locations, which may work in the future with Tom
> Tromey's ambiguous-linespec patches but they are not yet finished / checked in.

But how is that possible? The kernel module is loaded dynamically, so vmlinux itself does not have any symbols for the function I am setting the breakpoint on.

BTW, the above commands for the kernel module work correctly with gdb-7.1.

Thanks,
(In reply to comment #2)
> >
> > I bet the .text section does not start at a page boundary (0x...000).
> > See:
> > readelf -WS ./libblah.so | grep '\.text'
> >
> > You need to add the "Address" field to the base address you see in
> > /proc/pid/maps as the ".text_addr" (when the library starts at 0 - it is
> > unprelinked. If you run prelink you moreover need to subtract the prelink
> > address).
>
> I did not know that; I am sorry...
> How do you check if the library is prelinked?

In the `readelf -S' output of a prelinked library there is a `.gnu.prelink_undo' section. Besides that, you can see an address shift there:

  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000003f15600270  00000270
       0000000000000024  0000000000000000   A       0     0     4

The Address column is shifted by 0x3f15600000 upwards. After `prelink -u' there is just:

  [ 1] .note.gnu.build-i NOTE             0000000000000270  00000270
       0000000000000024  0000000000000000   A       0     0     4

(Be careful with prelink on your vital system libraries.)

> I did try without specifying the extra segments, but the same problem
> persists.

And have you corrected the .text address? Which one did you use?

> But, how is that possible? Since the kernel module is loaded dynamically;
> vmlinux itself does not have any symbols for the function I am setting
> breakpoint?

I was talking about vmlinux itself, not about the module. You were also adding vmlinux manually.

> BTW, the above commands for kernel module work correctly with gdb-7.1.

If you have symbol duplication, which symbol gets chosen at which time is essentially random; there could be more luck with this or that version of GDB.
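The prelink check described above can be sketched in a few lines of Python. This is only an illustration: the helper names `section_names` and `looks_prelinked` are hypothetical, and the script assumes you pass it the text printed by `readelf -S` (or `readelf -WS`) for the library in question.

```python
import re


def section_names(readelf_output):
    """Collect the first token after each "[Nr]" index in readelf -S output.

    Note: for the "[ 0]" row the name is empty, so the token collected there
    is the type ("NULL"); that is harmless for the membership test below.
    """
    names = []
    for line in readelf_output.splitlines():
        # Section header rows look like "  [ 1] .note.gnu.build-i NOTE ..."
        match = re.match(r"\s*\[\s*\d+\]\s+(\S+)", line)
        if match:
            names.append(match.group(1))
    return names


def looks_prelinked(readelf_output):
    """Return True if a .gnu.prelink_undo section is listed."""
    return ".gnu.prelink_undo" in section_names(readelf_output)
```

A typical use would be to capture the output of `readelf -S ./libblah.so` into a string and call `looks_prelinked` on it; a `False` result only means the marker section is absent.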
(In reply to comment #3)
> The Address column is shifted by 0x3f15600000 upwards. After `prelink -u'
> there is just:
>
> [ 1] .note.gnu.build-i NOTE 0000000000000270 00000270
> 0000000000000024 0000000000000000 A 0 0 4
>
> (Be careful with prelink on your vital system libraries.)

Thanks for the clarification. On libblah.so, I see this:

  [ 1] .note.gnu.build-i NOTE             00000000000001c8  000001c8
       0000000000000024  0000000000000000   A       0     0     4

Should I be looking at section information for .text?

libblah.so:
  [12] .text             PROGBITS         0000000000000540  00000540
       0000000000000138  0000000000000000  AX       0     0     16

>
> > I did try without specifying the extra segments, but the same problem
> > persists.
>
> And have you corrected the .text address? Which one you used?
>

I used the address output by /sys/module/$mod/sections/.text. Here's some additional info:

guest# insmod ./openvswitch_mod.ko
guest# cat /sys/module/openvswitch_mod/sections/.text
0xffffffffa00ca000
guest# cat /proc/kallsyms | grep dp_process
ffffffffa00cd3c8 t dp_process_received_packet [openvswitch_mod]

(gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00ca000
add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
        .text_addr = 0xffffffffa00ca000
(y or n) y
Reading symbols from /home/jvimal/vm/openvswitch_mod.ko...done.
(gdb) info addr dp_process_received_packet
Symbol "dp_process_received_packet" is a function at address 0xffffffffa00cd3c8.

So, both gdb and the guest say that the function dp_process_received_packet is at 0xffffffffa00cd3c8. But once I set a breakpoint:

(gdb) break dp_process_received_packet
Breakpoint 1 at 0xffffffffa00cd3cc (2 locations)
                ^^^^^^^^^^^^^^^^^^ ??
(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>
1.1                         y     0xffffffffa00cd3cc <dp_process_received_packet+4>
1.2                         y     0xffffffffa00cd3e7 <dp_process_received_packet+31>

This is what I have trouble understanding... I understand that there could be prologue skipping, but why are there two breakpoints?
Now, if I run and hit the breakpoint:

(gdb) c
Continuing.

Breakpoint 1, 0xffffffffa00cd3cc in dp_process_received_packet ()
(gdb) info locals
No symbol table info available.

>
> I was talking about vmlinux itself, not about the module. You were
> adding manually also vmlinux.
>
>
> > BTW, the above commands for kernel module work correctly with gdb-7.1.
>
> If you have symbol duplication it works very randomly which symbol gets chosen
> which time, there could be more luck with this or that version of GDB.

Got it. :)
(In reply to comment #4)
> Should I be looking at section information for .text?

Yes.

> libblah.so:
> [12] .text PROGBITS 0000000000000540 00000540
> 0000000000000138 0000000000000000 AX 0 0 16

Therefore, in this case, add 0x540 to the base address you find in /proc/PID/maps.

> guest# cat /sys/module/openvswitch_mod/sections/.text
> 0xffffffffa00ca000

Not sure why they call it `.text', but there is only a probability of 1:4095 that this is really a .text address. Most probably you need to add the .text section address above (from readelf -WS ./openvswitch_mod.ko | grep '\.text').

> (gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00ca000
> add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
> .text_addr = 0xffffffffa00ca000

With probability 99.98% this is the wrong address, so the GDB behavior afterwards is bogus.

> So, both gdb and guest say that the function dp_process_received_packet is at
> 0xffffffffa00cd3c8.

I do not understand this part, but in any case some of the addresses were specified wrongly. I am not going to spend time debugging a situation which is already wrong.
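For the libblah.so case, the address arithmetic suggested above can be written out as a small Python sketch. The function names are hypothetical. The script assumes an unprelinked shared library, and it assumes that 0x7ffff7bdb000 (the value passed to add-symbol-file earlier in this report) is the library's base address from /proc/PID/maps.

```python
import re


def text_section_address(readelf_output):
    """Return the Address field of the .text row in readelf -S/-WS output."""
    for line in readelf_output.splitlines():
        # Row shape: "  [12] .text PROGBITS 0000000000000540 ..."
        match = re.match(
            r"\s*\[\s*\d+\]\s+\.text\s+\S+\s+([0-9a-fA-F]+)", line
        )
        if match:
            return int(match.group(1), 16)
    raise ValueError("no .text section found")


def text_addr_for_add_symbol_file(maps_base, readelf_output):
    """Base address of the mapping plus the .text section Address."""
    return maps_base + text_section_address(readelf_output)


# The .text row reported for libblah.so in this thread.
row = "  [12] .text PROGBITS 0000000000000540 00000540"
print(hex(text_addr_for_add_symbol_file(0x7FFFF7BDB000, row)))
# prints 0x7ffff7bdb540
```

The printed value is what would be passed to add-symbol-file in place of the bare page-aligned base. This sketch does not cover prelinked libraries, where the prelink address would also have to be subtracted.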
(In reply to comment #5)

Thanks for the offset explanation. It cleared things up.

> > guest# cat /sys/module/openvswitch_mod/sections/.text
> > 0xffffffffa00ca000
>
> Not sure why they call it `.text' but there is only a probability of 1:4095
> that this is really a .text address. Most probably you need to add the .text
> section address above (from readelf -WS ./openvswitch_mod.ko | grep '\.text').

I see this info for .text:

  Name   Type      Address          Off    Size   ES Flg Lk Inf Al
  .text  PROGBITS  0000000000000000 000064 017518 00  AX  0   0  4

So, I did:

cat /proc/modules/.../.text
0xffffffffa00f9000

(gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00f9064
add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
        .text_addr = 0xffffffffa00f9064
                                     ^^ added offset 64
(y or n) y
Reading symbols from /home/jvimal/vm/openvswitch_mod.ko...done.
(gdb) break dp_process_received_packet
Breakpoint 1 at 0xffffffffa00fc42c: file /home/nikhilh/openvswitch/datapath/linux/datapath.c, line 262. (2 locations)
(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>
1.1                         y     0xffffffffa00fc42c /home/nikhilh/openvswitch/datapath/linux/datapath.c:262
1.2                         y     0xffffffffa00fc44b /home/nikhilh/openvswitch/datapath/linux/datapath.c:262

There are still two breakpoints, which seems weird. If I hit the breakpoint, the line number and locals information is still missing.

So I tried another approach:

guest# cat /proc/kallsyms | grep dp_process
ffffffffa00d63c8 t dp_process_received_packet [openvswitch_mod]

(gdb) break *0xffffffffa00d63c8
Breakpoint 1 at 0xffffffffa00d63c8
(gdb) c
Continuing.

Breakpoint 1, 0xffffffffa00d63c8 in dp_detach_port ()

(Which is a completely different function altogether! Does this suggest that the 0x64 offset could be wrong?)
If I do not add the offset 0x64, I see this:

(gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00bf000
add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
        .text_addr = 0xffffffffa00bf000
(y or n) y
Reading symbols from /home/jvimal/vm/openvswitch_mod.ko...done.
(gdb) break *0xffffffffa00c23c8
Breakpoint 1 at 0xffffffffa00c23c8
(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0xffffffffa00c23c8 <dp_process_received_packet>
                                                   ^^^^^^^^ This seems correct
(gdb) c
Continuing.

Breakpoint 1, 0xffffffffa00c23c8 in dp_process_received_packet ()
(gdb) info args
No symbol table info available.

Here is the line number info from the kernel module.

# readelf -wil openvswitch_mod.ko
 <1><3f4e4>: Abbrev Number: 89 (DW_TAG_subprogram)
    <3f4e5>   DW_AT_external    : 1
    <3f4e6>   DW_AT_name        : (indirect string, offset: 0x1aeb8): dp_process_received_packet
    <3f4ea>   DW_AT_decl_file   : 29
    <3f4eb>   DW_AT_decl_line   : 261
    <3f4ed>   DW_AT_prototyped  : 1
    <3f4ee>   DW_AT_low_pc      : 0x33c8
    <3f4f6>   DW_AT_high_pc     : 0x3690
    <3f4fe>   DW_AT_frame_base  : 0x322c (location list)
    <3f502>   DW_AT_sibling     : <0x3f67c>
(followed by args/locals information)
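As a sanity check on the numbers quoted in this report, the page-aligned section address plus DW_AT_low_pc should give the function's runtime address. The Python sketch below only redoes that arithmetic. It assumes the function lives in .text and that, because .text has Address 0 in the .ko, DW_AT_low_pc is an offset from the start of .text. The helper name `runtime_address` is hypothetical.

```python
def runtime_address(text_load_addr, low_pc):
    """Section load address plus the function's offset within the section."""
    return text_load_addr + low_pc


# DW_AT_low_pc of dp_process_received_packet from the readelf dump above.
LOW_PC = 0x33C8

# (load address of .text, function address observed in that same session)
cases = [
    (0xFFFFFFFFA00CA000, 0xFFFFFFFFA00CD3C8),  # /proc/kallsyms and "info addr"
    (0xFFFFFFFFA00BF000, 0xFFFFFFFFA00C23C8),  # "info br" without the 0x64 offset
]

for base, observed in cases:
    computed = runtime_address(base, LOW_PC)
    verdict = "matches" if computed == observed else "MISMATCH"
    print(hex(base), "+", hex(LOW_PC), "=", hex(computed), verdict)
```

Both sessions are consistent with using the unmodified section address, without adding the 0x64 file offset.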
(In reply to comment #6)
> cat /proc/modules/.../.text
> 0xffffffffa00f9000

This is probably right, as I see now.

> (gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00f9064
> add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
> .text_addr = 0xffffffffa00f9064
> ^^ added offset 64

No... that address 0xffffffffa00f9000 was right. Sorry, I do not know the Linux kernel; the idea of loading .o files (instead of .so files) was not fortunate. Most of my comments above were probably bogus.
Also see the gdb@ thread (it starts in the previous month):
http://sourceware.org/ml/gdb/2011-11/msg00010.html

Maybe this is really:
https://bugzilla.redhat.com/show_bug.cgi?id=714824#c6
[patch] Fix overlapping objfiles with discontiguous CUs (PR 13346) http://sourceware.org/ml/gdb-patches/2011-11/msg00166.html
CVSROOT:        /cvs/src
Module name:    src
Changes by:     jkratoch@sourceware.org 2011-12-02 01:28:55

Modified files:
        gdb            : ChangeLog dwarf2read.c psympriv.h psymtab.c
        gdb/testsuite  : ChangeLog
Added files:
        gdb/testsuite/gdb.dwarf2: dw2-objfile-overlap-inner.S
                                  dw2-objfile-overlap-outer.S
                                  dw2-objfile-overlap.exp

Log message:
        gdb/
        PR breakpoints/13346
        * dwarf2read.c (process_psymtab_comp_unit): Set
        PSYMTABS_ADDRMAP_SUPPORTED.
        * psympriv.h (struct partial_symtab): Comment textlow and texthigh
        validity. New field psymtabs_addrmap_supported.
        * psymtab.c (find_pc_sect_psymtab_closer): New gdb_assert on
        psymtabs_addrmap_supported.
        (find_pc_sect_psymtab): Do not fallback to TEXTLOW and TEXTHIGH for
        !PSYMTABS_ADDRMAP_SUPPORTED.
        (dump_psymtab, maintenance_info_psymtabs): Print also
        psymtabs_addrmap_supported.

        gdb/testsuite/
        PR breakpoints/13346
        * gdb.dwarf2/dw2-objfile-overlap-inner.S: New file.
        * gdb.dwarf2/dw2-objfile-overlap-outer.S: New file.
        * gdb.dwarf2/dw2-objfile-overlap.exp: New file.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/ChangeLog.diff?cvsroot=src&r1=1.13566&r2=1.13567
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/dwarf2read.c.diff?cvsroot=src&r1=1.582&r2=1.583
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/psympriv.h.diff?cvsroot=src&r1=1.9&r2=1.10
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/psymtab.c.diff?cvsroot=src&r1=1.34&r2=1.35
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/ChangeLog.diff?cvsroot=src&r1=1.2954&r2=1.2955
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/gdb.dwarf2/dw2-objfile-overlap-inner.S.diff?cvsroot=src&r1=NONE&r2=1.1
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/gdb.dwarf2/dw2-objfile-overlap-outer.S.diff?cvsroot=src&r1=NONE&r2=1.1
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/gdb.dwarf2/dw2-objfile-overlap.exp.diff?cvsroot=src&r1=NONE&r2=1.1
Checked in.