[RFC/ia64-linux] problem with shared libraries when attaching to process

Fri Apr 25 02:06:00 GMT 2008

Hello,

This is something I hadn't noticed before with gdb-6.8, because there
were no visible symptoms on the machine I used for testing.
But on another machine, I get the following type of error:

        (gdb) att 31147
        Attaching to program: /taff.a/brobecke/regr/ex/foo, process 31147
        Reading symbols from /lib/tls/libc.so.6.1...done.
        Loaded symbols for /lib/tls/libc.so.6.1
        Reading symbols from /lib/ld-linux-ia64.so.2...done.
        Loaded symbols for /lib/ld-linux-ia64.so.2
        0xa000000000010641 in __kernel_syscall_via_break ()
        (gdb) c
        Continuing.
 !!! -> Can't insert breakpoint for slot numbers greater than 2.

The reason for the error message is that GDB is trying to insert
a shlib breakpoint at an address that is invalid for ia64:

    (gdb) maintenance info break
    Num     Type           Disp Enb Address            What
    -1      shlib events   keep y   0x200000000003b768 <local+168>

(the slot number is encoded in the last 4 bits of the address and
must be 0, 1, or 2; here it is 8).
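A minimal sketch of that decoding, assuming GDB's ia64 convention that an instruction address is the 16-byte bundle address with the slot number stored in the low 4 bits. The helper names are hypothetical, not GDB's own functions:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers modelling GDB's ia64 address convention:
   bits 0-3 hold the slot number, the remaining bits the bundle address.  */
static inline uint64_t ia64_bundle_addr (uint64_t addr)
{
  return addr & ~(uint64_t) 0xf;
}

static inline unsigned ia64_slot_num (uint64_t addr)
{
  return (unsigned) (addr & 0xf);
}

/* An ia64 bundle holds three instruction slots, so only 0, 1 and 2
   are valid slot numbers.  */
static inline bool ia64_valid_slot (uint64_t addr)
{
  return ia64_slot_num (addr) <= 2;
}
```

With these, 0xa000000000010641 decodes to slot 1 of bundle 0xa000000000010640, and the gdb-6.6 address 0x200000000001b720 is slot 0, while 0x200000000003b768 decodes to slot 8, which ia64_valid_slot rejects. That is the condition behind the "slot numbers greater than 2" error.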

With older versions of GDB (we were using gdb-6.6), we used to break
on one of the SOLIB_BREAK_NAMES routines, whichever was found first.
In our case, gdb-6.6 showed:

    (gdb) maintenance info break
    Num Type           Disp Enb Address            What
    -1  shlib events   keep y   0x200000000001b720 <_dl_debug_state>

I think this is the right location to break: I wrote a small C
program that calls dlopen, and I could see the breakpoint being
triggered when I "next" over the dlopen call (again, with
gdb-6.6):

    (gdb) next
    [...]
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x200000000001acd0
    infrun: BPSTAT_WHAT_CHECK_SHLIBS
    [...]
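The test program is not shown here. A minimal sketch of that kind of reproducer might look like this; the function name load_library is hypothetical, and on glibc older than 2.34 it has to be linked with -ldl:

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical reproducer: "next" over the dlopen call in GDB and the
   shlib-event breakpoint should trigger while the loader maps the library.
   LIBNAME is a placeholder; NULL only returns a handle for the main
   program.  Returns 0 on success, 1 if dlopen fails.  */
int load_library (const char *libname)
{
  void *handle = dlopen (libname, RTLD_NOW);
  if (handle == NULL)
    {
      fprintf (stderr, "dlopen failed: %s\n", dlerror ());
      return 1;
    }
  dlclose (handle);
  return 0;
}
```

Calling load_library with a real library name (for instance "libm.so.6", an arbitrary choice) from main, then doing "next" over that call under GDB, is the kind of experiment described above.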

With gdb-6.8, we now first try r_brk, and only if that doesn't work
do we fall back on the previous method.
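The two strategies can be modelled roughly as follows. This is a simplified sketch, not GDB's actual solib-svr4.c code: choose_shlib_event_addr, the sym_entry table and the name list are hypothetical, "doesn't work" is assumed to mean r_brk reads as 0, and only _dl_debug_state is taken from this report:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical symbol table entry for the dynamic linker.  */
struct sym_entry
{
  const char *name;
  uint64_t addr;
};

/* Illustrative stand-in for SOLIB_BREAK_NAMES; the real list differs.  */
static const char *const break_names[] = { "_dl_debug_state", NULL };

/* Return the address of the first break name found in SYMS, or 0.  */
static uint64_t lookup_break_name (const struct sym_entry *syms, size_t nsyms)
{
  for (const char *const *name = break_names; *name != NULL; name++)
    for (size_t i = 0; i < nsyms; i++)
      if (strcmp (syms[i].name, *name) == 0)
        return syms[i].addr;
  return 0;
}

/* gdb-6.8 order in this model: prefer r_brk, fall back on the names.
   The gdb-6.6 behaviour corresponds to always passing R_BRK as 0.  */
static uint64_t choose_shlib_event_addr (uint64_t r_brk,
                                         const struct sym_entry *syms,
                                         size_t nsyms)
{
  if (r_brk != 0)
    return r_brk;
  return lookup_break_name (syms, nsyms);
}
```

In this model, a nonzero but bogus r_brk such as 0x200000000003b768 wins over _dl_debug_state, which matches the change in behaviour between the two versions.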

The r_brk address is stored at DEBUG_BASE + lmo->r_brk_offset. For LP64,
the r_brk_offset is set to 16 bytes, so that's DEBUG_BASE + 16.
As far as I can tell, the DEBUG_BASE value that we compute is
correct: we obtain it from the DT_DEBUG dynamic tag, and the value
matches the address of the "_r_debug" symbol. So it's not obviously
wrong.
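The 16-byte figure follows from the layout of struct r_debug in glibc's <link.h> (r_version, r_map, r_brk, r_state, r_ldbase). A sketch, using a hypothetical mirror of that structure in which pointers and ElfW(Addr) fields are written as 64-bit integers and the LP64 padding is made explicit:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical LP64 mirror of glibc's struct r_debug.  Padding is explicit
   so the offsets do not depend on the host compiler's alignment rules.  */
struct r_debug_lp64
{
  int32_t r_version;   /* offset 0 */
  int32_t pad0;        /* offset 4: padding before the 8-byte r_map */
  uint64_t r_map;      /* offset 8: struct link_map * */
  uint64_t r_brk;      /* offset 16: ElfW(Addr) */
  int32_t r_state;     /* offset 24: RT_CONSISTENT, RT_ADD or RT_DELETE */
  int32_t pad1;        /* offset 28 */
  uint64_t r_ldbase;   /* offset 32 */
};

_Static_assert (offsetof (struct r_debug_lp64, r_brk) == 16,
                "r_brk sits 16 bytes past DEBUG_BASE on LP64");
```

So with DEBUG_BASE equal to the address of _r_debug (0x40a28 in the dump that follows, as printed), r_brk is read from 0x40a38, i.e. _r_debug+16, and the whole structure spans 40 bytes.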

Assuming that the DEBUG_BASE is correct, dumping the memory at
this address shows:

 40a28 <_r_debug>:   0x00000001      0x00000000      0x00040a50     0x20000000
 40a38 <_r_debug+16>:0x0003b768      0x20000000      0x00000000     0x00000000
 40a48 <_r_debug+32>:0x00000000      0x20000000      0x00000000     0x00000000
 40a58:              0x000287a0      0x20000000      0x0000d988     0x60000000
 40a68:              0x00041500      0x20000000      0x00000000     0x00000000
 40a78:              0x00040a50      0x20000000      0x00000000     0x00000000
 40a88:              0x00040ea0      0x20000000      0x00000000     0x00000000
 40a98:              0x0000d988      0x60000000      0x0000da78     0x60000000
 40aa8:              0x0000da68      0x60000000      0x0000d9f8     0x60000000
 40ab8:              0x0000da08      0x60000000      0x0000da18     0x60000000
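Since ia64 Linux is little-endian, each 64-bit field in the dump is printed as its low 32-bit word followed by its high word. A small sketch that rebuilds the r_debug fields from the first ten words (the 40 bytes of _r_debug); the helper quad_at is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

/* First ten 32-bit words of the dump, i.e. the 40 bytes of _r_debug,
   copied in the order they are printed.  */
static const uint32_t r_debug_words[10] = {
  0x00000001, 0x00000000, 0x00040a50, 0x20000000,
  0x0003b768, 0x20000000, 0x00000000, 0x00000000,
  0x00000000, 0x20000000,
};

/* Rebuild the little-endian 64-bit value at BYTE_OFFSET (a multiple of 8):
   the low word comes first, the high word second.  */
static uint64_t quad_at (const uint32_t *words, size_t byte_offset)
{
  size_t i = byte_offset / 4;
  return (uint64_t) words[i + 1] << 32 | words[i];
}
```

Read this way, r_version is 1, r_map (offset 8) is 0x2000000000040a50, r_brk (offset 16) is 0x200000000003b768, r_state is 0 (RT_CONSISTENT), and r_ldbase (offset 32) is 0x2000000000000000.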

First, the dump confirms why we determined that r_brk is at
0x200000000003b768. It also seems to show that the address of
_dl_debug_state isn't stored anywhere in that memory region.
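That second observation can be checked mechanically by scanning every aligned 64-bit value in the dump. This sketch assumes the loader was mapped at the same address as in the gdb-6.6 session, so that _dl_debug_state is still 0x200000000001b720; dump_contains is a hypothetical helper:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All forty 32-bit words of the dump, in printed order.  */
static const uint32_t dump_words[40] = {
  0x00000001, 0x00000000, 0x00040a50, 0x20000000,
  0x0003b768, 0x20000000, 0x00000000, 0x00000000,
  0x00000000, 0x20000000, 0x00000000, 0x00000000,
  0x000287a0, 0x20000000, 0x0000d988, 0x60000000,
  0x00041500, 0x20000000, 0x00000000, 0x00000000,
  0x00040a50, 0x20000000, 0x00000000, 0x00000000,
  0x00040ea0, 0x20000000, 0x00000000, 0x00000000,
  0x0000d988, 0x60000000, 0x0000da78, 0x60000000,
  0x0000da68, 0x60000000, 0x0000d9f8, 0x60000000,
  0x0000da08, 0x60000000, 0x0000da18, 0x60000000,
};

/* Return true if any 8-byte-aligned little-endian value in WORDS
   equals ADDR.  */
static bool dump_contains (const uint32_t *words, size_t nwords, uint64_t addr)
{
  for (size_t i = 0; i + 1 < nwords; i += 2)
    {
      uint64_t q = (uint64_t) words[i + 1] << 32 | words[i];
      if (q == addr)
        return true;
    }
  return false;
}
```

Under that assumption, dump_contains finds 0x200000000003b768 but not 0x200000000001b720.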

Note that even when the r_brk address we find does not cause the
error, it is still incorrect, as I don't see it being triggered
when stepping over dlopen().

Not knowing how things work underneath, I am a bit stuck. I tried to find
some documentation on how things are supposed to work, to no avail.
What do you guys think? A bug in the loader? A problem in the debugger?

Thanks,
-- 
Joel