Sometimes relocations for newly-loaded modules have not been performed when ld.so calls _dl_debug_state with RT_CONSISTENT in .r_state. This prevents a debugger user from calling subroutines in newly-loaded modules to diagnose issues with DT_INIT, etc. Here is a testcase which shows the problem using gdb.

$ cat my_lib.c
#include <stdio.h>

int sub1(int x)
{
    printf("sub1 %d\n", x);
}

$ cat my_main.c
#include <dlfcn.h>

int main()
{
    void *handle = dlopen("./my_lib.so", RTLD_LAZY);
    void (*sub1)(int) = (void (*)(int))dlsym(handle, "sub1");
    sub1(6);
    return 0;
}

$ cat Makefile
CFLAGS= -g -fPIC

bug: my_main.o my_lib.so
	gcc $(CFLAGS) -o my_main my_main.o -ldl -Wl,--dynamic-linker=/usr/local/glibc/lib/ld-linux.so.2

my_lib.so: my_lib.o
	gcc $(CFLAGS) -o my_lib.so -shared my_lib.o

$ make
cc -g -fPIC -c -o my_main.o my_main.c
cc -g -fPIC -c -o my_lib.o my_lib.c
gcc -g -fPIC -o my_main my_main.o -ldl -Wl,--dynamic-linker=/usr/local/glibc/lib/ld-linux.so.2

$ gdb my_main
GNU gdb Red Hat Linux (6.3.0.0-1.98rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...
Using host libthread_db library "/lib/libthread_db.so.1".

(gdb) set stop-on-solib-events 1    ## sets a breakpoint on _dl_debug_state
(gdb) run
Starting program: /home/jreiser/bug/my_main
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x6ed000
Stopped due to shared library event
(gdb) info shared    ## which modules are in memory now?
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
(gdb) c
Continuing.
Stopped due to shared library event
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
0x00ecac00  0x00ecbaa4  Yes         /usr/local/glibc/lib/libdl.so.2
0x005b25c0  0x0069f578  Yes         /usr/local/glibc/lib/libc.so.6
(gdb) c
Continuing.
Stopped due to shared library event
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
0x00ecac00  0x00ecbaa4  Yes         /usr/local/glibc/lib/libdl.so.2
0x005b25c0  0x0069f578  Yes         /usr/local/glibc/lib/libc.so.6
(gdb) c
Continuing.
Stopped due to shared library event
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
0x00ecac00  0x00ecbaa4  Yes         /usr/local/glibc/lib/libdl.so.2
0x005b25c0  0x0069f578  Yes         /usr/local/glibc/lib/libc.so.6
0x002d9420  0x002d9554  Yes         ./my_lib.so

## Now my_lib.so is loaded, and gdb believes that everything is ready to run.
## However, ld-linux has not performed relocations on my_lib.so,
## so there will be a SIGSEGV when the user calls sub1 in my_lib.so.

(gdb) print sub1(42)

Program received signal SIGSEGV, Segmentation fault.
0x000003f2 in ?? ()
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (sub1) will be abandoned.
(gdb) x/i $pc
0x3f2: Cannot access memory at address 0x3f2
(gdb) x/12i sub1
0x2d94ec <sub1>:     push %ebp
0x2d94ed <sub1+1>:   mov %esp,%ebp
0x2d94ef <sub1+3>:   push %ebx
0x2d94f0 <sub1+4>:   sub $0x14,%esp
0x2d94f3 <sub1+7>:   call 0x2d94e7 <__i686.get_pc_thunk.bx>
0x2d94f8 <sub1+12>:  add $0x1168,%ebx
0x2d94fe <sub1+18>:  mov 0x8(%ebp),%eax
0x2d9501 <sub1+21>:  mov %eax,0x4(%esp)
0x2d9505 <sub1+25>:  lea 0xffffef10(%ebx),%eax
0x2d950b <sub1+31>:  mov %eax,(%esp)
0x2d950e <sub1+34>:  call 0x2d93ec    ## printf@PLT
0x2d9513 <sub1+39>:  add $0x14,%esp
(gdb) x/i 0x2d93ec    ## printf@PLT
0x2d93ec: jmp *0xc(%ebx)
(gdb) x/x 0x2d94f8+0x1168+0xc
0x2da66c: 0x000003f2    ## unrelocated

[An earlier version of this report was entered at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179072
The problem was confirmed in "upstream" CVS HEAD as of 2006-02-10; elf/dl-open.c -r1.126 ]
Created attachment 1309
call .r_brk just before _dl_init

Daniel Jacobowitz (CodeSourcery) requested an explicit separate patch for this one case.
I'm not at all sure this change would be right. Someone might want to test where the RT_CONSISTENT breakpoint happens in relation to relocation on Solaris. AFAICT the RT_CONSISTENT state is purely about the state of the r_debug.r_map list and all those pointers. It says you can get the list of objects mapped in core now, that is all. It might be OK to move RT_CONSISTENT to after relocation, but we should be cautious about that. It's been as it is for a very long time. How best to make GDB and libthread_db startup work right is really another question.
(In reply to comment #2)
> Someone might want to test where the RT_CONSISTENT breakpoint happens in
> relation to relocation on Solaris.

Verified by Paul Pluzhnikov:
https://bugzilla.redhat.com/show_bug.cgi?id=179072#c15

I can't find official documentation for RT_CONSISTENT, but Solaris (from which this interface is copied) AFAICT calls _dl_debug_state() (or rather its Solaris equivalent) at (B):
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/sgs/rtld/common/setup.c#1028

[ (B) = after relocations and before initializers ]

> AFAICT the RT_CONSISTENT state is purely about the state of the r_debug.r_map
> list and all those pointers. It says you can get the list of objects mapped
> in core now, that is all.

There is just no easy way for a debugger to catch the later point, after the relocations have been resolved.
It occurs to me that now that we have STT_GNU_IFUNC, there is DSO code that can run during the relocation phase (if LD_BIND_NOW). So perhaps we really should not break after relocation, but still before. The debugger can always put a breakpoint on the DSO's initializer function if it wants. With init_array this could be a bunch of breakpoints to add, but it's doable. Nothing should prevent you inserting breakpoints before relocation, which is what this is really for. Having libthread_db be happy at certain points is really a different story and should not be misconstrued as an authentic driver for the ld.so breakpoint behavior.
"Nothing should prevent you inserting breakpoints before relocation, ..." Except that in general, being subject to future relocation *does* prevent inserting breakpoints. Particularly in the case of RISC with multi-byte breakpoint opcode (such as SPARC), if the intended location of a breakpoint is also relocated then it is essential to perform the relocation before inserting the breakpoint. Even in the case of CISC with a single-byte breakpoint opcode (such as x86 and x86_64), the "code" could be directly threaded (a list of addresses), and inserting a breakpoint [by overwriting more than one byte] would require changing something that is subject to later relocation.
(In reply to comment #5)
> "Nothing should prevent you inserting breakpoints before relocation, ..."
> Except that in general, being subject to future relocation *does* prevent
> inserting breakpoints.

The breakpoint could be added into the appropriate code in the dynamic linker (i.e., the place calling the constructor for the object). That code is already relocated.

It is wrong to change the semantics of RT_CONSISTENT. And I think it is a bad approach to try to extend this strange interface with the debugger by adding an RT_RUNNABLE state (I made up the name) for when the relocation has happened, too.

If we need more state information it is better to expose some APIs. We have libthread_db, maybe we need librtld_db as well. If we had a specification of what information about the implementation gdb currently contains and what it still wants, we could design such a library.

I guess I am waiting for gdb people to provide this information.
(In reply to comment #6)
> The breakpoint could be added into the appropriate code in the dynamic linker
> (i.e., the place calling the constructor for the object. That code is
> already relocated.

Such a breakpoint is put there by the SystemTap probes, as implemented by Gary Benson for glibc + gdb.

> It is wrong to change the semantics of RT_CONSISTENT.

This patch does not do so.

> And I think it is a bad approach trying to extend this strange interface

Nor does it do that.

> If we need more state information it is better to expose some APIs. We have
> libthread_db, maybe we need librtld_db as well.

These SystemTap probes are such an alternative interface, but the SystemTap probes posting never received a reply on the mailing list:
merging roland/systemtap branch
http://sourceware.org/ml/libc-alpha/2011-03/msg00001.html

> If we'd have a specification of what information of the implementation gdb
> currently contains and what it still wants we can design such a library.

SystemTap probes work well for this purpose.

> I guess I am waiting for gdb people to provide this information.

This patch contains the needed features.
Created attachment 5895
Proposed patch for Fedora

To elaborate on Jan's comment above, this is the fix we are proposing for Fedora. It adds a new function, _dl_debug_notify, which is called wherever _dl_debug_state was previously called, and in a couple of other places. For backwards compatibility, _dl_debug_notify calls _dl_debug_state whenever it was previously called, so applications using _dl_debug_state will see no change. The new interface is in the form of a number of SystemTap probes in _dl_debug_notify which applications can locate and set breakpoints on if necessary.

The new interface has the following advantages over the existing one:

* Applications using the new interface have much more control over the events they are notified about. They can set breakpoints or whatever on the probes of interest and ignore probes that are not of interest.

* Applications using the new interface are able to be notified about other events than the ones supported by the existing interface. Currently there are two new events, for the start and end of object relocation, but the interface is extensible in the sense that new probes can be added for other events of interest.

* Applications using the new interface can handle libraries loaded into namespaces other than the default, e.g. with dlmopen. Under the existing interface _dl_debug_state is called but the data structure for the namespace (with the list of loaded libraries) is unavailable. This allows, for example, for a fix to bug 11839.

Besides allowing new probes to be added, the new interface is also extensible in the sense that new arguments may be added to existing probes without breaking compatibility.
(In reply to comment #6)
> If we need more state information it is better to expose some APIs. We have
> libthread_db, maybe we need librtld_db as well. If we'd have a specification
> of what information of the implementation gdb currently contains and what it
> still wants we can design such a library.
>
> I guess I am waiting for gdb people to provide this information.

What gdb needs is not so much more information per se as more (and more finely-grained) notifications.

I would say as a minimum a new interface should allow debuggers to do everything that the existing interface does, so the new interface needs to be able to notify the debugger at every point the existing interface does.

Then, to be able to fix this particular issue (which incidentally prevents gdb from being able to debug programs that dlopen libpthread) we need a notification after relocation is complete -- basically in the location John Reiser proposed moving the RT_CONSISTENT notification to. In the probes-based interface I proposed I also added a notification before relocation occurs, for symmetry with all the other notifications, which all have before and after versions.

We also need to be able to enable and disable the various notifications. In the probes-based interface this is done on the gdb side: each notification has its own address, so debuggers can install breakpoints on only the set of notifications they care about. Another way to do this would be to have one breakpoint address but calls into glibc to enable/disable the various notifications individually. I didn't go down that route as it would mean debuggers altering the controlled program more than necessary, but it's not infeasible.

Also, to allow debugging of programs that use dlmopen the interface needs to provide some way for debuggers to inspect namespaces other than the default. The existing interface only exposes the initial namespace, so any others are effectively invisible.
Finally, it is desirable to minimise the amount of data that needs to flow between the inferior and the debugger, as this affects performance when debugging applications with large numbers of shared libraries, especially when debugging remotely. The probes-based interface as it stands does not do this yet, but it could be extended to do so without breaking compatibility.
The tracker for the glibc side of this is bug 14298.
The GDB side of this interface was committed here: http://cygwin.com/ml/gdb-patches/2013-06/msg00046.html
Fixed with this glibc commit: http://cygwin.com/ml/libc-alpha/2012-07/msg00557.html and this GDB commit: http://cygwin.com/ml/gdb-patches/2013-06/msg00046.html