Bug 13346 - Multiple breakpoints/losing symbol table issue
Status: RESOLVED FIXED
Product: gdb
Classification: Unclassified
Component: breakpoints
Version: HEAD
Importance: P2 normal
Assignee: Jan Kratochvil
Reported: 2011-10-25 21:55 UTC by Vimal
Modified: 2011-12-02 01:34 UTC
Last reconfirmed: 2011-11-08 00:00:00


Description Vimal 2011-10-25 21:55:27 UTC
This bug happened when I was debugging a loadable kernel module.  There are many variants of this bug and the earliest commit that causes the kernel module issue is: 2cdbbe44126601596aad7891de05cb7fc6bb21c8.

Setup:
- arch: x86_64
- qemu + kvm VM
- gdb built from master

Summary of bug:
- Load a LKM
- Get the LKM's .text address and use add-symbol-file
- A combination of the following happens:
  - breakpoint is set at 2 locations (gdb/master)
    http://pastebin.com/4PhDAHwW

  - breakpoint is set at 1 location, but all subsequent
    breakpoints lose line number information.  Breakpoints from #2
    onwards to the same function are set at slightly different
    offsets.
    http://pastie.org/2758080

  - Setting a breakpoint at the actual function address also
    loses locals.
    http://pastie.org/2758181

  - Line number/locals info is lost
    (in all the above)


tromey suggested a small .so example and here's one.  I am not sure if this is the right behaviour though.

############# app.c:
#include <stdio.h>

extern int lib_function(int);

void f() {
    printf("f()\n");
}

int main()
{
    lib_function(0); // force load
    f();             // for breakpoint
    printf("Return code is %i\n",lib_function(32));
    return 0;
}

############# lib.c:
#include <stdio.h>

int lib_function(int arg) {
    int temp = arg + 10;
    printf("lib_function(%d) called\n", arg);
    return temp;
}

############# compile.sh:
gcc -fPIC -c -g lib.c
gcc -shared -g -Wl,-soname,libblah.so -o libblah.so  lib.o

gcc app.c -L . -l blah -g -o app

############# in gdb:
http://pastie.org/2758798


Thanks,
Comment 1 Jan Kratochvil 2011-10-26 07:40:44 UTC
> (gdb) add-symbol-file ./libblah.so 0x7ffff7bdb000
> add symbol table from file "./libblah.so" at
>         .text_addr = 0x7ffff7bdb000
> (y or n) y
> # Obtained info from /proc/pid/maps ...

I bet the .text section does not start at a page boundary (0x...000).
See:
readelf -WS ./libblah.so | grep '\.text'

For ".text_addr" you need to add the section's "Address" field to the base address you see in /proc/pid/maps.  (This applies when the library's addresses start at 0, that is, when it is not prelinked.  If you have run prelink, you also need to subtract the prelink address.)
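
As a rough illustration of reading that "Address" field, here is a minimal Python sketch that pulls it out of one row of `readelf -WS' output.  The helper name section_address is made up for this example, and the sample row is the wide-format form of the libblah.so .text row reported later in this thread; the regular expression assumes the usual "[Nr] Name Type Address Off ..." column order.

```python
import re

# One row of `readelf -WS` output looks like:
#   [12] .text   PROGBITS   0000000000000540 000540 000138 00  AX  0   0 16
# The section index may be padded with a space inside the brackets ("[ 1]").
SECTION_ROW = re.compile(
    r"^\s*\[\s*\d+\]\s+(?P<name>\S+)\s+(?P<type>\S+)\s+"
    r"(?P<addr>[0-9a-fA-F]+)\s+(?P<off>[0-9a-fA-F]+)"
)


def section_address(readelf_output, section=".text"):
    """Return the Address column for `section`, or None if it is not listed."""
    for line in readelf_output.splitlines():
        match = SECTION_ROW.match(line)
        if match and match.group("name") == section:
            return int(match.group("addr"), 16)
    return None


# Sample row (hypothetical capture of `readelf -WS ./libblah.so`).
sample = (
    "  [12] .text             PROGBITS        "
    "0000000000000540 000540 000138 00  AX  0   0 16"
)
print(hex(section_address(sample)))
```

The value returned here is what gets added to the mapping's base address; a result of 0 would mean the section starts right at the base.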

Moreover, GDB had already loaded symbols for ./libblah.so, so after another "add-symbol-file" (at a different and incorrect address) you have the symbols there twice; that simply cannot work.


>   - breakpoint is set at 2 locations (gdb/master)
>     http://pastebin.com/4PhDAHwW

I do not think you need the "-s" option for "add-symbol-file"; a single offset should be sufficient.  I am not completely sure, but I am almost sure the kernel loads all .ko file segments with the same displacement.
And the .text section appears to have the wrong address here, as in the previous case.


Also, GDB had initially already loaded symbols for "vmlinux", so you should remove them first (for example with "file" on its own).  Otherwise you have the same symbol file loaded twice, at two locations.  That may work in the future with Tom Tromey's ambiguous-linespec patches, but they are not yet finished or checked in.


Please correct the GDB usage first, I do not see any GDB bugs there now.
Comment 2 Vimal 2011-10-26 14:35:44 UTC
> 
> I bet the .text section does not start at a page boundary (0x...000).
> See:
> readelf -WS ./libblah.so | grep '\.text'
> 
> You need to add the "Address" field to the base address you see in
> /proc/pid/maps as the ".text_addr" (when the library starts at 0 - it is
> unprelinked.  If you run prelink you moreover need to subtract the prelink
> address).

I did not know that; I am sorry...   How do you check if the library is prelinked?

> 
> >   - breakpoint is set at 2 locations (gdb/master)
> >     http://pastebin.com/4PhDAHwW
> 
> I do not think you need to use "-s" option for "add-symbol-file", a single
> offset should be sufficient, I am not completely sure but I am almost sure the
> kernel loads all .ko file segments with the same displacement.
> And the .text section looks to have wrong address here as in the previous case.

I did try without specifying the extra segments, but the same problem persists.   About the .text section, once I insmod, I check the address of the function via /proc/kallsyms in the guest.


> 
> 
> Also initially GDB already loaded symbols for "vmlinux" so you should remove
> them first (for example by "file" itself), otherwise you have the same symbol
> file loaded twice, at two locations, which may work in the future with Tom
> Tromey's ambiguous-linespec patches but they are not yet finished / checked in.

But how is that possible?  The kernel module is loaded dynamically, so vmlinux itself has no symbols for the function I am setting the breakpoint on.

BTW, the above commands for kernel module work correctly with gdb-7.1.

Thanks,
Comment 3 Jan Kratochvil 2011-10-26 14:49:08 UTC
(In reply to comment #2)
> > 
> > I bet the .text section does not start at a page boundary (0x...000).
> > See:
> > readelf -WS ./libblah.so | grep '\.text'
> > 
> > You need to add the "Address" field to the base address you see in
> > /proc/pid/maps as the ".text_addr" (when the library starts at 0 - it is
> > unprelinked.  If you run prelink you moreover need to subtract the prelink
> > address).
> 
> I did not know that; I am sorry...   
> How do you check if the library is prelinked?

A prelinked library has a `.gnu.prelink_undo' section in its `readelf -S' output.  Besides that, you can see an address shift there:

  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 0]                   NULL             0000000000000000  00000000
       0000000000000000  0000000000000000           0     0     0
  [ 1] .note.gnu.build-i NOTE             0000003f15600270  00000270
       0000000000000024  0000000000000000   A       0     0     4

The Address column is shifted upwards by 0x3f15600000.  After `prelink -u' there is just:

  [ 1] .note.gnu.build-i NOTE             0000000000000270  00000270
       0000000000000024  0000000000000000   A       0     0     4

(Be careful with prelink on your vital system libraries.)
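
The two checks described above can be sketched in Python as follows.  The function names are invented for this example.  The shift calculation assumes that, for the section being compared, the unprelinked Address equals the file Offset, which is true of the .note.gnu.build-i row shown above but is not guaranteed for every section.

```python
def is_prelinked(section_names):
    """A prelinked library carries a .gnu.prelink_undo section."""
    return ".gnu.prelink_undo" in section_names


def address_shift(address, file_offset):
    """How far the Address column has been moved relative to the file Offset.

    Assumption: without prelink this section's Address equals its Offset,
    as in the .note.gnu.build-i row quoted in this comment.
    """
    return address - file_offset


# Values taken from the readelf rows above.
print(hex(address_shift(0x3F15600270, 0x270)))  # prelinked library
print(hex(address_shift(0x270, 0x270)))         # after `prelink -u`

# Hypothetical section-name lists, for illustration only.
print(is_prelinked([".text", ".data", ".gnu.prelink_undo"]))
print(is_prelinked([".text", ".data"]))
```

A non-zero shift is the amount that has to be subtracted again when computing the address to pass to "add-symbol-file".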


> I did try without specifying the extra segments, but the same problem
> persists.

And have you corrected the .text address?  Which one did you use?


> But, how is that possible?  Since the kernel module is loaded dynamically;
> vmlinux itself does not have any symbols for the function I am setting
> breakpoint?

I was talking about vmlinux itself, not about the module.  You were also adding vmlinux manually.


> BTW, the above commands for kernel module work correctly with gdb-7.1.

If you have duplicated symbols, which symbol gets chosen at any given time is essentially random; you may simply have more luck with one version of GDB than with another.
Comment 4 Vimal 2011-10-26 15:05:47 UTC
(In reply to comment #3)
> The Address column is shifted by 0x3f15600000 upwards.  After `prelink -u'
> there is just:
> 
>   [ 1] .note.gnu.build-i NOTE             0000000000000270  00000270
>        0000000000000024  0000000000000000   A       0     0     4
> 
> (Be careful with prelink on your vital system libraries.)


Thanks for the clarification.   On libblah.so, I see this:

  [ 1] .note.gnu.build-i NOTE             00000000000001c8  000001c8
       0000000000000024  0000000000000000   A       0     0     4

Should I be looking at section information for .text?

libblah.so:
  [12] .text             PROGBITS         0000000000000540  00000540
       0000000000000138  0000000000000000  AX       0     0     16

> 
> 
> > I did try without specifying the extra segments, but the same problem
> > persists.
> 
> > And have you corrected the .text address?  Which one did you use?
> 

I used the address output by /sys/module/$mod/sections/.text.   Here's some additional info:

guest# insmod ./openvswitch_mod.ko

guest# cat /sys/module/openvswitch_mod/sections/.text
0xffffffffa00ca000

guest# cat /proc/kallsyms | grep dp_process
ffffffffa00cd3c8 t dp_process_received_packet   [openvswitch_mod]

(gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00ca000
add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
        .text_addr = 0xffffffffa00ca000
(y or n) y
Reading symbols from /home/jvimal/vm/openvswitch_mod.ko...done.
(gdb) info addr dp_process_received_packet
Symbol "dp_process_received_packet" is a function at address 0xffffffffa00cd3c8.

So, both gdb and guest say that the function dp_process_received_packet is at 0xffffffffa00cd3c8.

But once I set a breakpoint:

(gdb) break dp_process_received_packet 
Breakpoint 1 at 0xffffffffa00cd3cc (2 locations)
                ^^^^^^^^^^^^^^^^^^ ??

(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>         
1.1                         y     0xffffffffa00cd3cc <dp_process_received_packet+4>
1.2                         y     0xffffffffa00cd3e7 <dp_process_received_packet+31>

This is what I have trouble understanding...  I understand that there could be prologue skipping, but why are there two breakpoints?

Now, if I run and hit the breakpoint:
(gdb) c
Continuing.

Breakpoint 1, 0xffffffffa00cd3cc in dp_process_received_packet ()
(gdb) info locals
No symbol table info available.


> 
> I was talking about vmlinux itself, not about the module.  You were
> also adding vmlinux manually.
> 
> 
> > BTW, the above commands for kernel module work correctly with gdb-7.1.
> 
> If you have symbol duplication it works very randomly which symbol gets chosen
> which time, there could be more luck with this or that version of GDB.

Got it. :)
Comment 5 Jan Kratochvil 2011-10-26 15:12:28 UTC
(In reply to comment #4)
> Should I be looking at section information for .text?

Yes.


> libblah.so:
>   [12] .text             PROGBITS         0000000000000540  00000540
>        0000000000000138  0000000000000000  AX       0     0     16

Therefore, in this case, add 0x540 to whatever base address you find in /proc/PID/maps.
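
A minimal Python sketch of that calculation, for the shared-library case only.  The helper mapping_base and the maps excerpt are hypothetical (only the start address 0x7ffff7bdb000 and the 0x540 offset come from this thread), and the parsing assumes the pathname contains no spaces.

```python
def mapping_base(maps_text, library):
    """Lowest start address among the mappings whose path ends with `library`."""
    starts = []
    for line in maps_text.splitlines():
        fields = line.split()
        # /proc/PID/maps row: start-end perms offset dev inode [pathname]
        if len(fields) >= 6 and fields[5].endswith(library):
            start, _end = fields[0].split("-")
            starts.append(int(start, 16))
    if not starts:
        raise ValueError(f"{library} is not mapped")
    return min(starts)


# Hypothetical /proc/PID/maps excerpt for the example program.
maps = """\
7ffff7bdb000-7ffff7bdc000 r-xp 00000000 08:01 123456 /home/user/libblah.so
7ffff7bdc000-7ffff7ddb000 ---p 00001000 08:01 123456 /home/user/libblah.so
7ffff7ddb000-7ffff7ddc000 rw-p 00000000 08:01 123456 /home/user/libblah.so
"""

TEXT_SECTION_ADDRESS = 0x540  # Address of .text from readelf -WS ./libblah.so

text_addr = mapping_base(maps, "libblah.so") + TEXT_SECTION_ADDRESS
print(hex(text_addr))
```

With the numbers from this report the result is 0x7ffff7bdb540 rather than the page-aligned 0x7ffff7bdb000 that was originally passed to "add-symbol-file".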


> guest# cat /sys/module/openvswitch_mod/sections/.text
> 0xffffffffa00ca000

Not sure why they call it `.text' but there is only a probability of 1:4095 that this is really a .text address.  Most probably you need to add the .text section address above (from readelf -WS ./openvswitch_mod.ko | grep '\.text').


> (gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00ca000
> add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
>         .text_addr = 0xffffffffa00ca000

With 99.98% probability this is the wrong address, so GDB's behavior afterwards is bogus.


> So, both gdb and guest say that the function dp_process_received_packet is at
> 0xffffffffa00cd3c8.

I do not understand this part, but in any case you specified some of the addresses wrongly.  I am not going to spend time debugging a situation that is already wrong.
Comment 6 Vimal 2011-10-26 15:50:27 UTC
(In reply to comment #5)

Thanks for the offset explanation.  It cleared things up.

> > guest# cat /sys/module/openvswitch_mod/sections/.text
> > 0xffffffffa00ca000
> 
> Not sure why they call it `.text' but there is only a probability of 1:4095
> that this is really a .text address.  Most probably you need to add the .text
> section address above (from readelf -WS ./openvswitch_mod.ko | grep '\.text').

I see this info for .text:

    Name              Type            Address          Off
    .text             PROGBITS        0000000000000000 000064

    Size   ES Flg Lk Inf Al
    017518 00  AX  0   0  4

So, I did:

cat /proc/modules/.../.text
0xffffffffa00f9000

(gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00f9064
add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
        .text_addr = 0xffffffffa00f9064
                                     ^^ added offset 64

(y or n) y
Reading symbols from /home/jvimal/vm/openvswitch_mod.ko...done.
(gdb) break dp_process_received_packet
Breakpoint 1 at 0xffffffffa00fc42c: file /home/nikhilh/openvswitch/datapath/linux/datapath.c, line 262. (2 locations)


(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   <MULTIPLE>
1.1                         y     0xffffffffa00fc42c /home/nikhilh/openvswitch/datapath/linux/datapath.c:262
1.2                         y     0xffffffffa00fc44b /home/nikhilh/openvswitch/datapath/linux/datapath.c:262

There are still two breakpoint locations, which seems weird.

If I hit the breakpoint, the line number and locals information is still missing.

So I tried another approach:

guest# cat /proc/kallsyms | grep dp_process
ffffffffa00d63c8 t dp_process_received_packet   [openvswitch_mod]

(gdb) break *0xffffffffa00d63c8
Breakpoint 1 at 0xffffffffa00d63c8

(gdb) c
Continuing.

Breakpoint 1, 0xffffffffa00d63c8 in dp_detach_port ()
(Which is a completely different function altogether!  Does this suggest that the 0x64 offset could be wrong?)


If I do not add the offset 0x64, I see this:
(gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00bf000
add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
        .text_addr = 0xffffffffa00bf000
(y or n) y
Reading symbols from /home/jvimal/vm/openvswitch_mod.ko...done.


(gdb) break *0xffffffffa00c23c8
Breakpoint 1 at 0xffffffffa00c23c8
(gdb) info br
Num     Type           Disp Enb Address            What
1       breakpoint     keep y   0xffffffffa00c23c8 <dp_process_received_packet>
                                                    ^^^^^^^^ This seems correct

(gdb) c
Continuing.

Breakpoint 1, 0xffffffffa00c23c8 in dp_process_received_packet ()
(gdb) info args
No symbol table info available.

Here is the line number info from the kernel module.

# readelf -wil openvswitch_mod.ko

 <1><3f4e4>: Abbrev Number: 89 (DW_TAG_subprogram)
    <3f4e5>   DW_AT_external    : 1
    <3f4e6>   DW_AT_name        : (indirect string, offset: 0x1aeb8): dp_process_received_packet
    <3f4ea>   DW_AT_decl_file   : 29
    <3f4eb>   DW_AT_decl_line   : 261
    <3f4ed>   DW_AT_prototyped  : 1
    <3f4ee>   DW_AT_low_pc      : 0x33c8
    <3f4f6>   DW_AT_high_pc     : 0x3690
    <3f4fe>   DW_AT_frame_base  : 0x322c        (location list)
    <3f502>   DW_AT_sibling     : <0x3f67c>

  (followed by args/locals information)
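
The numbers quoted in this comment and in comment 4 can be cross-checked with a few lines of Python.  This is only a sketch: the helper names are made up, and it assumes that DW_AT_low_pc in the .ko is an offset from the start of .text (plausible here because readelf reports the .text Address as 0).

```python
def parse_kallsyms_line(line):
    """Split a /proc/kallsyms row into (address, type, name, module)."""
    fields = line.split()
    address = int(fields[0], 16)
    module = fields[3].strip("[]") if len(fields) > 3 else None
    return address, fields[1], fields[2], module


def expected_runtime_address(text_base, low_pc):
    # Assumption: low_pc is relative to the start of the module's .text.
    return text_base + low_pc


LOW_PC = 0x33C8  # DW_AT_low_pc of dp_process_received_packet

row = "ffffffffa00cd3c8 t dp_process_received_packet   [openvswitch_mod]"
address, _type, name, module = parse_kallsyms_line(row)

# .text base reported by /sys/module/openvswitch_mod/sections/.text
print(address == expected_runtime_address(0xFFFFFFFFA00CA000, LOW_PC))

# Same check for the later load at 0xffffffffa00bf000.
print(hex(expected_runtime_address(0xFFFFFFFFA00BF000, LOW_PC)))
```

Under that assumption the unadjusted .text base reproduces the kallsyms address in both loads, whereas a base with 0x64 added would not.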
Comment 7 Jan Kratochvil 2011-10-26 16:30:07 UTC
(In reply to comment #6)
> cat /proc/modules/.../.text
> 0xffffffffa00f9000

This is probably right, as I now see.


> (gdb) add-symbol-file ~/vm/openvswitch_mod.ko 0xffffffffa00f9064
> add symbol table from file "/home/jvimal/vm/openvswitch_mod.ko" at
>         .text_addr = 0xffffffffa00f9064
>                                      ^^ added offset 64

No... that address 0xffffffffa00f9000 was right.


Sorry, I do not know the Linux kernel; the idea of loading .o files (instead of .so files) was not fortunate.  Most of my comments above were probably bogus.
Comment 8 Tom Tromey 2011-11-02 13:38:35 UTC
Also see the gdb@ thread:
http://sourceware.org/ml/gdb/2011-11/msg00010.html
(it starts in the previous month).

Maybe this is really
https://bugzilla.redhat.com/show_bug.cgi?id=714824#c6
Comment 9 Jan Kratochvil 2011-11-08 00:33:52 UTC
[patch] Fix overlapping objfiles with discontiguous CUs (PR 13346)
http://sourceware.org/ml/gdb-patches/2011-11/msg00166.html
Comment 10 cvs-commit@gcc.gnu.org 2011-12-02 01:28:59 UTC
CVSROOT:	/cvs/src
Module name:	src
Changes by:	jkratoch@sourceware.org	2011-12-02 01:28:55

Modified files:
	gdb            : ChangeLog dwarf2read.c psympriv.h psymtab.c 
	gdb/testsuite  : ChangeLog 
Added files:
	gdb/testsuite/gdb.dwarf2: dw2-objfile-overlap-inner.S 
	                          dw2-objfile-overlap-outer.S 
	                          dw2-objfile-overlap.exp 

Log message:
	gdb/
	PR breakpoints/13346
	* dwarf2read.c (process_psymtab_comp_unit): Set
	PSYMTABS_ADDRMAP_SUPPORTED.
	* psympriv.h (struct partial_symtab): Comment textlow and texthigh
	validity.  New field psymtabs_addrmap_supported.
	* psymtab.c (find_pc_sect_psymtab_closer): New gdb_assert on
	psymtabs_addrmap_supported.
	(find_pc_sect_psymtab): Do not fallback to TEXTLOW and TEXTHIGH for
	!PSYMTABS_ADDRMAP_SUPPORTED.
	(dump_psymtab, maintenance_info_psymtabs): Print also
	psymtabs_addrmap_supported.
	
	gdb/testsuite/
	PR breakpoints/13346
	* gdb.dwarf2/dw2-objfile-overlap-inner.S: New file.
	* gdb.dwarf2/dw2-objfile-overlap-outer.S: New file.
	* gdb.dwarf2/dw2-objfile-overlap.exp: New file.

Patches:
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/ChangeLog.diff?cvsroot=src&r1=1.13566&r2=1.13567
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/dwarf2read.c.diff?cvsroot=src&r1=1.582&r2=1.583
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/psympriv.h.diff?cvsroot=src&r1=1.9&r2=1.10
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/psymtab.c.diff?cvsroot=src&r1=1.34&r2=1.35
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/ChangeLog.diff?cvsroot=src&r1=1.2954&r2=1.2955
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/gdb.dwarf2/dw2-objfile-overlap-inner.S.diff?cvsroot=src&r1=NONE&r2=1.1
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/gdb.dwarf2/dw2-objfile-overlap-outer.S.diff?cvsroot=src&r1=NONE&r2=1.1
http://sourceware.org/cgi-bin/cvsweb.cgi/src/gdb/testsuite/gdb.dwarf2/dw2-objfile-overlap.exp.diff?cvsroot=src&r1=NONE&r2=1.1
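
For readers unfamiliar with the problem class named in the patch title, the following toy Python model shows why a single bounding range (textlow to texthigh) can attribute a PC to the wrong unit when a compilation unit is discontiguous.  It is not GDB code; every name in it is hypothetical, the addresses are invented, and what the patch actually changes is only what the ChangeLog above states.

```python
from dataclasses import dataclass


@dataclass
class ToyPsymtab:
    name: str
    ranges: list  # half-open (low, high) address ranges the unit really covers

    @property
    def textlow(self):
        return min(low for low, _high in self.ranges)

    @property
    def texthigh(self):
        return max(high for _low, high in self.ranges)


def lookup_by_bounds(psymtabs, pc):
    """First unit whose bounding range [textlow, texthigh) contains pc."""
    for pst in psymtabs:
        if pst.textlow <= pc < pst.texthigh:
            return pst.name
    return None


def lookup_by_ranges(psymtabs, pc):
    """First unit with an actual address range that contains pc."""
    for pst in psymtabs:
        if any(low <= pc < high for low, high in pst.ranges):
            return pst.name
    return None


# "outer" has a hole in the middle; "inner" sits inside that hole.
outer = ToyPsymtab("outer", [(0x1000, 0x1100), (0x1300, 0x1400)])
inner = ToyPsymtab("inner", [(0x1100, 0x1300)])

pc = 0x1200
print(lookup_by_bounds([outer, inner], pc))  # bounding range picks "outer"
print(lookup_by_ranges([outer, inner], pc))  # exact ranges pick "inner"
```

In this model the bounding-range lookup returns the unit that merely surrounds the address, while the per-range lookup returns the unit that actually contains it.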
Comment 11 Jan Kratochvil 2011-12-02 01:34:29 UTC
Checked in.