Bug 2328 - _dl_debug_state() RT_CONSISTENT called too early
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: dynamic-link
Version: 2.3.6
Importance: P2 normal
Target Milestone: ---
Assignee: Gary Benson
Depends on: 12575 14298
Reported: 2006-02-11 19:22 UTC by John Reiser
Modified: 2016-05-12 15:10 UTC
fweimer: security-


Attachments
call .r_brk just before _dl_init (711 bytes, patch)
2006-09-20 17:38 UTC, John Reiser
Proposed patch for Fedora (3.52 KB, patch)
2011-08-11 12:11 UTC, Gary Benson

Description John Reiser 2006-02-11 19:22:40 UTC
Sometimes relocations for newly-loaded modules have not been performed when
ld.so calls _dl_debug_state with RT_CONSISTENT in .r_state.  This prevents a
debugger user from calling subroutines in newly-loaded modules to diagnose
issues with DT_INIT, etc.

Here is a testcase which shows the problem using gdb.
$ cat my_lib.c
#include <stdio.h>

void    /* matches the void (*)(int) cast in my_main.c; nothing is returned */
sub1(int x)
{
        printf("sub1 %d\n", x);
}
$ cat my_main.c
#include <dlfcn.h>

int
main()
{
        void *handle = dlopen("./my_lib.so", RTLD_LAZY);
        void (*sub1)(int) = (void (*)(int))dlsym(handle, "sub1");
        sub1(6);
        return 0;
}
$ cat Makefile
CFLAGS= -g -fPIC

bug: my_main.o my_lib.so
        gcc $(CFLAGS) -o my_main my_main.o -ldl -Wl,--dynamic-linker=/usr/local/glibc/lib/ld-linux.so.2

my_lib.so: my_lib.o
        gcc $(CFLAGS) -o my_lib.so -shared my_lib.o
$ make
cc -g -fPIC   -c -o my_main.o my_main.c
cc -g -fPIC   -c -o my_lib.o my_lib.c
gcc -g -fPIC -o my_main my_main.o -ldl -Wl,--dynamic-linker=/usr/local/glibc/lib/ld-linux.so.2

$ gdb my_main
GNU gdb Red Hat Linux (6.3.0.0-1.98rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux-gnu"...Using host libthread_db
library "/lib/libthread_db.so.1".

(gdb) set stop-on-solib-events 1   ## sets a breakpoint on _dl_debug_state
(gdb) run
Starting program: /home/jreiser/bug/my_main
Reading symbols from shared object read from target memory...done.
Loaded system supplied DSO at 0x6ed000
Stopped due to shared library event
(gdb) info shared   ## which modules are in memory now?
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
(gdb) c
Continuing.
Stopped due to shared library event
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
0x00ecac00  0x00ecbaa4  Yes         /usr/local/glibc/lib/libdl.so.2
0x005b25c0  0x0069f578  Yes         /usr/local/glibc/lib/libc.so.6
(gdb) c
Continuing.
Stopped due to shared library event
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
0x00ecac00  0x00ecbaa4  Yes         /usr/local/glibc/lib/libdl.so.2
0x005b25c0  0x0069f578  Yes         /usr/local/glibc/lib/libc.so.6
(gdb) c
Continuing.
Stopped due to shared library event
(gdb) info shared
From        To          Syms Read   Shared Object Library
0x002f77f0  0x0030c04f  Yes         /usr/local/glibc/lib/ld-linux.so.2
0x00ecac00  0x00ecbaa4  Yes         /usr/local/glibc/lib/libdl.so.2
0x005b25c0  0x0069f578  Yes         /usr/local/glibc/lib/libc.so.6
0x002d9420  0x002d9554  Yes         ./my_lib.so

  ## Now my_lib.so is loaded, and gdb believes that everything is ready to run.
  ## However, ld-linux has not performed relocations on my_lib.so,
  ## so there will be a SIGSEGV when the user calls sub1 in my_lib.so.

(gdb) print sub1(42)

Program received signal SIGSEGV, Segmentation fault.
0x000003f2 in ?? ()
The program being debugged was signaled while in a function called from GDB.
GDB remains in the frame where the signal was received.
To change this behavior use "set unwindonsignal on"
Evaluation of the expression containing the function (sub1) will be abandoned.
(gdb) x/i $pc
0x3f2:  Cannot access memory at address 0x3f2
(gdb) x/12i sub1
0x2d94ec <sub1>:        push   %ebp
0x2d94ed <sub1+1>:      mov    %esp,%ebp
0x2d94ef <sub1+3>:      push   %ebx
0x2d94f0 <sub1+4>:      sub    $0x14,%esp
0x2d94f3 <sub1+7>:      call   0x2d94e7 <__i686.get_pc_thunk.bx>
0x2d94f8 <sub1+12>:     add    $0x1168,%ebx
0x2d94fe <sub1+18>:     mov    0x8(%ebp),%eax
0x2d9501 <sub1+21>:     mov    %eax,0x4(%esp)
0x2d9505 <sub1+25>:     lea    0xffffef10(%ebx),%eax
0x2d950b <sub1+31>:     mov    %eax,(%esp)
0x2d950e <sub1+34>:     call   0x2d93ec   ## printf@PLT
0x2d9513 <sub1+39>:     add    $0x14,%esp
(gdb) x/i 0x2d93ec   ## printf@PLT
0x2d93ec:       jmp    *0xc(%ebx)
(gdb) x/x 0x2d94f8+0x1168+0xc
0x2da66c:       0x000003f2   ## unrelocated


[An earlier version of this report was entered at
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=179072
The problem was confirmed in "upstream" CVS HEAD as of 2006-02-10;
elf/dl-open.c -r1.126 ]
Comment 1 John Reiser 2006-09-20 17:38:48 UTC
Created attachment 1309
call .r_brk just before _dl_init

Daniel Jacobowitz (CodeSourcery) requested an explicit separate patch for this
one case.
Comment 2 Roland McGrath 2009-11-19 00:30:54 UTC
I'm not at all sure this change would be right.  Someone might want to test
where the RT_CONSISTENT breakpoint happens in relation to relocation on Solaris.

AFAICT the RT_CONSISTENT state is purely about the state of the r_debug.r_map
list and all those pointers.  It says you can get the list of objects mapped in
core now, that is all.

It might be OK to move RT_CONSISTENT to after relocation, but we should be
cautious about that.  It's been as it is for a very long time.

How best to make GDB and libthread_db startup work right is really another question.
Comment 3 Jan Kratochvil 2009-11-19 09:56:38 UTC
(In reply to comment #2)
> Someone might want to test where the RT_CONSISTENT breakpoint happens in
> relation to relocation on Solaris.

Verified by Paul Pluzhnikov
https://bugzilla.redhat.com/show_bug.cgi?id=179072#c15
I can't find official documentation for RT_CONSISTENT,
but Solaris (from which this interface is copied) AFAICT calls 
_dl_debug_state() (or rather its Solaris equivalent) at (B):
http://src.opensolaris.org/source/xref/onnv/onnv-gate/usr/src/cmd/sgs/rtld/common/setup.c#1028
[ (B) = after relocations and before initializers ]


> AFAICT the RT_CONSISTENT state is purely about the state of the r_debug.r_map
> list and all those pointers.  It says you can get the list of objects mapped
> in core now, that is all.

There is simply no easy way for a debugger to catch the later point, after the
relocations have been resolved.
Comment 4 Roland McGrath 2009-11-21 00:26:10 UTC
It occurs to me that now that we have STT_GNU_IFUNC, there is DSO code that can
run during the relocation phase (if LD_BIND_NOW).  So perhaps we really should
not break after relocation, but still before.

The debugger can always put a breakpoint on the DSO's initializer function if it
wants.  With init_array this could be a bunch of breakpoints to add, but it's
doable.  Nothing should prevent you inserting breakpoints before relocation,
which is what this is really for.

Having libthread_db be happy at certain points is really a different story and
should not be misconstrued as an authentic driver for the ld.so breakpoint behavior.
Comment 5 John Reiser 2009-11-21 03:10:38 UTC
"Nothing should prevent you inserting breakpoints before relocation, ..." 
Except that in general, being subject to future relocation *does* prevent
inserting breakpoints.  Particularly in the case of RISC with multi-byte
breakpoint opcode (such as SPARC), if the intended location of a breakpoint is
also relocated then it is essential to perform the relocation before inserting
the breakpoint.  Even in the case of CISC with a single-byte breakpoint opcode
(such as x86 and x86_64), the "code" could be directly threaded (a list of
addresses), and inserting a breakpoint [by overwriting more than one byte] would
require changing something that is subject to later relocation.
Comment 6 Ulrich Drepper 2010-04-30 16:03:35 UTC
(In reply to comment #5)
> "Nothing should prevent you inserting breakpoints before relocation, ..." 
> Except that in general, being subject to future relocation *does* prevent
> inserting breakpoints.

The breakpoint could be added into the appropriate code in the dynamic linker
(i.e., the place calling the constructor for the object).  That code is already
relocated.

It is wrong to change the semantics of RT_CONSISTENT.  And I think it is a bad
approach trying to extend this strange interface with the debugger by adding an
RT_RUNNABLE state (I made up the name) for when the relocation happened, too. 
If we need more state information it is better to expose some APIs.  We have
libthread_db, maybe we need librtld_db as well.  If we'd have a specification of
what information of the implementation gdb currently contains and what it still
wants we can design such a library.

I guess I am waiting for gdb people to provide this information.
Comment 7 Jan Kratochvil 2011-08-11 11:38:42 UTC
(In reply to comment #6)
> The breakpoint could be added into the appropriate code in the dynamic linker
> (i.e., the place calling the constructor for the object).  That code is
> already relocated.

Such a breakpoint is put there by the SystemTap probes, as implemented by Gary Benson for glibc and gdb.

> It is wrong to change the semantics of RT_CONSISTENT.

This patch does not do so.

> And I think it is a bad approach trying to extend this strange interface

Nor does it do this.

> If we need more state information it is better to expose some APIs.  We have
> libthread_db, maybe we need librtld_db as well.

These SystemTap probes are such an alternative interface, but the SystemTap
probes posting never received a reply on the mailing list:
        merging roland/systemtap branch
        http://sourceware.org/ml/libc-alpha/2011-03/msg00001.html

> If we'd have a specification of what information of the implementation gdb
> currently contains and what it still wants we can design such a library.

SystemTap probes work well for this purpose.

> I guess I am waiting for gdb people to provide this information.

This patch contains the needed features.
Comment 8 Gary Benson 2011-08-11 12:11:47 UTC
Created attachment 5895
Proposed patch for Fedora

To elaborate on Jan's comment above, this is the fix we are proposing for Fedora.  It adds a new function, _dl_debug_notify, which is called wherever _dl_debug_state was previously called, and in a couple of other places.  For backwards compatibility, _dl_debug_notify calls _dl_debug_state whenever it was previously called, so applications using _dl_debug_state will see no change. The new interface is in the form of a number of SystemTap probes in _dl_debug_notify which applications can locate and set breakpoints on if necessary.  The new interface has the following advantages over the existing one:

* Applications using the new interface have much more control over the events they are notified about.  They can set breakpoints or whatever on the probes of interest and ignore probes that are not of interest.

* Applications using the new interface are able to be notified about other events than the ones supported by the existing interface.  Currently there are two new events, for the start and end of object relocation, but the interface is extensible in the sense that new probes can be added for other events of interest.

* Applications using the new interface can handle libraries loaded into namespaces other than the default, e.g. with dlmopen.  Under the existing interface _dl_debug_state is called but the data structure for the namespace (with the list of loaded libraries) is unavailable.  This allows, for example, for a fix to bug 11839.

Besides allowing new probes to be added, the new interface is also extensible in the sense that new arguments may be added to existing probes without breaking compatibility.
Comment 9 Gary Benson 2011-08-16 15:10:13 UTC
(In reply to comment #6)
> If we need more state information it is better to expose some APIs.  We have
> libthread_db, maybe we need librtld_db as well.  If we'd have a specification
> of what information of the implementation gdb currently contains and what it
> still wants we can design such a library.
> 
> I guess I am waiting for gdb people to provide this information.

What gdb needs is not so much more information per se as more (and more finely-grained) notifications.  I would say that, as a minimum, a new interface should allow debuggers to do everything that the existing interface does, so the new interface needs to be able to notify the debugger at every point the existing interface does.  Then, to be able to fix this particular issue (which incidentally prevents gdb from being able to debug programs that dlopen libpthread), we need a notification after relocation is complete: basically in the location to which John Reiser proposed moving the RT_CONSISTENT notification.  In the probes-based interface I proposed, I also added a notification before relocation occurs, for symmetry with the other notifications, which all have before and after versions.

We also need the ability to enable and disable the various notifications.  In the probes-based interface this is done on the gdb side: each notification has its own address, so debuggers can install breakpoints on only the set of notifications they care about.  Another way to do this would be to have one breakpoint address but calls into glibc to enable/disable the various notifications individually.  I didn't go down that route as it would mean debuggers altering the controlled program more than necessary, but it's not infeasible.

Also, to allow debugging of programs that use dlmopen the interface needs to provide some way for debuggers to inspect namespaces other than the default.  The existing interface only exposes the initial namespace so any others are effectively invisible.

Finally, it is desirable to minimise the amount of data that needs to flow between the inferior and the debugger, as this affects performance when debugging applications with large numbers of shared libraries, especially when debugging remotely.  The probes-based interface as it stands does not do this yet, but it could be extended to do so without breaking compatibility.
Comment 10 Gary Benson 2012-06-27 10:03:01 UTC
The tracker for the glibc side of this is bug 14298.
Comment 11 Gary Benson 2013-06-06 08:39:22 UTC
The GDB side of this interface was committed here:
http://cygwin.com/ml/gdb-patches/2013-06/msg00046.html
Comment 12 Gary Benson 2013-06-13 09:51:01 UTC
Fixed with this glibc commit:
http://cygwin.com/ml/libc-alpha/2012-07/msg00557.html

and this GDB commit:
http://cygwin.com/ml/gdb-patches/2013-06/msg00046.html