This is the mail archive of the libc-help@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

A possible libc/dlmopen/pthreads bug


Hello -

As I've posted here before, I'm working on a segregated-dynamic-linking
mechanism based on dlmopen (to allow applications to link against libraries
without sharing their dependencies).

While testing some recent builds, I think I may have tripped over a
possible dlmopen/pthreads bug. It is entirely possible that I'm doing
something wrong, or failing to do something - a lot of this is new
territory for me.

Either way, I'd appreciate some feedback/insight into what's going on.

A bit of background before I proceed: libcapsule is the helper library
I'm working on, based on dlmopen, that allows me to create "shim"
libraries that expose the symbols from their immediate target,
but don't expose any further dependencies.

  +-----------------------------+ +----------------------------+
  | Runtime filesystem          | | Host filesystem            |
  |                             | |                            |
  | +------------+              | |                            |
  | | Executable |              | |                            |
  | ++------+----+              | |                            |
  |  |      |                   | |                            |
  |  |   +--+---------------+   | |   +------------------+     |
  |  |   | shim libX11      | <-----> | real libX11      |-+   |
  |  |   +--+---------------+   | |   +----------+-------+ |   |
  |  |      |                   | |              |         |   |
  |  |      |                   | |          +---+-------+ |   |
  | ++------+-----+             | |          | libA      | |   |
  | | libc        |             | |          +-----------+ |   |
  | +-------------+             | |          +-----------+ |   |
  |                             | |          | libB      |-+   |
  |                             | |          +-----------+     |
  +-----------------------------+ +----------------------------+

The libraries on the right hand side from the host filesystem are
in a dlmopen namespace of their own (as an implementation detail
they have the same libc as the DSOs on the left, but it is a new
copy opened with dlmopen)

The shim libX11 takes care of making sure the executable gets symbols
from the real libX11 instead of the shim, but the bug we're going
to look is not (I think) related to that, so I'm not going to discuss
that process here.

The visible symptom is that when I launch pulseaudio with a few shimmed
libraries (the first case I stumbled upon, and easy to reproduce) it seems
to deadlock very early in its life.

A bit of digging with strace and gdb shows that when it locks up
it does so inside setresuid. A bit more digging indicates that the
code is infinite looping here:

__nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
+list
1103 1104 /* Now the list with threads using user-allocated stacks. */
1105      list_for_each (runp, &__stack_user)
1106        {
1107          struct pthread *t = list_entry (runp, struct pthread, list);
1108          if (t == self)
1109            continue;
1110 1111 setxid_mark_thread (cmdp, t);
1112        }

For some reason, list_for_each never terminates.

If I disable encapsulation (remove the shim libraries from the library
path) then the following holds at that point in the code:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
1105	  list_for_each (runp, &__stack_user)
+bt
#0  __nptl_setxid (cmdp=0xffffd9e8) at allocatestack.c:1105
#1  0xf7b96162 in __GI___setresuid (ruid=1000, euid=1000, suid=1000)
      at ../sysdeps/unix/sysv/linux/i386/setresuid.c:29
#2  0x5655b7f0 in pa_drop_root ()
#3  0x56558a6e in main ()

Digging into __stack_user:

+p __stack_user
$1 = {next = 0xf73a48a0, prev = 0xf73a48a0}

+p &__stack_user
$2 = (list_t *) 0xf7d1d1a4 <__stack_user>

+p (&__stack_user)->next
$3 = (struct list_head *) 0xf73a48a0

+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf7d1d1a4 <__stack_user>

+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf73a48a0

We find a circular linked list, which contains a pointer to __stack_user.
Since list_for_each is invoked as list_for_each(…, &__stack_user),
this means the for loop it implements will terminate, allowing setresuid
to proceed.

// ============================================================================
Note: The definition of list_for_each is this:

# define list_for_each(pos, head) \
  for (pos = (head)->next; pos != (head); pos = pos->next)
// ============================================================================

Now let's examine the same case with the shim in place:

Breakpoint 6, __nptl_setxid (cmdp=0xffffd9d8) at allocatestack.c:1105
1105	  list_for_each (runp, &__stack_user)
 ⋮
+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}

+p &__stack_user
$2 = (list_t *) 0xf7d8f1a4 <__stack_user>

+p (&__stack_user)->next
$3 = (struct list_head *) 0xf76eeb60

+p (&__stack_user)->next->next
$4 = (struct list_head *) 0xf71391a4

+p (&__stack_user)->next->next->next
$5 = (struct list_head *) 0xf76eeb60

We can see we have a circular linked list, as before, but it does
_not_ contain the element supplied as the head to list_for_each:
We're going to loop forever.

============================================================================

Next let's try and figure out when/where this happens.
Setting various breakpoints and watches we uncover the following:

+run
Starting program: /usr/bin/pulseaudio --start
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/i386-linux-gnu/libthread_db.so.1".

Breakpoint 1, __pthread_initialize_minimal_internal () at nptl-init.c:290
290	{
+break allocatestack.c:1105
Breakpoint 6 at 0xf7d78b2c: file allocatestack.c, line 1105.
+watch __stack_user
Hardware watchpoint 7: __stack_user
+watch __stack_user.next
Hardware watchpoint 8: __stack_user.next
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0x0, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0x0
New value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
__pthread_initialize_minimal_internal () at nptl-init.c:377
377	  list_add (&pd->list, &__stack_user);
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0x0}
New value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
list_add (head=<optimized out>, newp=0xf76eeb60) at ../include/list.h:64
64	  head->next = newp;
+cont
Continuing.

Hardware watchpoint 7: __stack_user

Old value = {next = 0xf7d8f1a4 <__stack_user>, prev = 0xf76eeb60}
New value = {next = 0xf76eeb60, prev = 0xf76eeb60}

Hardware watchpoint 8: __stack_user.next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf76eeb60
__pthread_initialize_minimal_internal () at nptl-init.c:381
381	  THREAD_SETMEM (pd, report_events, __nptl_initial_report_events);
+cont
Continuing.

Breakpoint 2, __pthread_init_static_tls (map=0x5657e040) at allocatestack.c:1210
1210	{

// ============================================================================
// At this point we step to the end of __pthread_init_static_tls and set
// an extra watch point on the address currently holding &__stack_user
// ============================================================================

+p __stack_user.next
$1 = (struct list_head *) 0xf76eeb60

+p __stack_user.next->next
$2 = (struct list_head *) 0xf7d8f1a4 <__stack_user>  ← STILL GOOD

+watch __stack_user.next->next
Hardware watchpoint 9: __stack_user.next->next
+s


// And here it is: Hardware watchpoint 9: __stack_user.next->next

Old value = (struct list_head *) 0xf7d8f1a4 <__stack_user>
New value = (struct list_head *) 0xf71391a4 ← >>>>> GONE WRONG HERE <<<<<
0xf7121c83 in ?? ()

// Hm, an unknown address scribbling on __stack_user.

+call calloc(1, sizeof(Dl_info))
$3 = (void *) 0x56574d18
+call dladdr(0xf7121c83, $3)
$4 = 1

+p *(Dl_info *)$3
$5 = {dli_fname = 0x565755b8 "/lib/i386-linux-gnu/libpthread.so.0",
      dli_fbase = 0xf711d000,
      dli_sname = 0xf711f617 "__pthread_initialize_minimal",
      dli_saddr = 0xf7121be0}

// Well that can't be right, can it? gdb should have figured out the name
// of 0xf7121c83, not said ?? - let's work out the address in the other
// direction:

+p __pthread_initialize_minimal
$6 = {<text variable, no debug info>} 0xf7d77be0 <__pthread_initialize_minimal_internal>

+call dladdr(0xf7d77be0, $3)
$8 = 1

+p *(Dl_info *)$3
$10 = {dli_fname = 0xf7fd4d70 "/lib/i386-linux-gnu/libpthread.so.0",
       dli_fbase = 0xf7d73000,
       dli_sname = 0xf7d75617 "__pthread_initialize_minimal",
       dli_saddr = 0xf7d77be0 <__pthread_initialize_minimal_internal>}

// ============================================================================

Aha! Same DSO, different base address. So the ?? instance of
__pthread_initialize_minimal_internal was from the _other_ copy of libc,
inside the dlmopen namespace - the one gdb doesn't know how to inspect.


PS: for completeness, I went back and followed the __stack_user linked list
at the "GONE WRONG HERE" point, just to be sure:

+p __stack_user
$1 = {next = 0xf76eeb60, prev = 0xf76eeb60}

+p __stack_user.next
$2 = (struct list_head *) 0xf76eeb60

+p __stack_user.next->next
$3 = (struct list_head *) 0xf71391a4

+p __stack_user.next->next->next
$4 = (struct list_head *) 0xf71391a4

+p __stack_user.next->next->next->next
$5 = (struct list_head *) 0xf71391a4

So the linked list definitely doesn't contain &__stack_user any more.

// ============================================================================

Apologies for the exegesis: It seems to me that the copy of libc in the
private namespace has somehow managed to scribble on the linked list
pointed to by __stack_user, overwriting a key address.

Is my analysis correct? Is there something I could or should have done to
avoid this?

A while ago (https://sourceware.org/ml/libc-help/2018-01/msg00002.html)
I suggested a dlmopen flag RTLD_UNIQUE or similar which would cause the
existing mapping of the target library in the main namespace/link-map to be
re-used instead of creating a new one: I believe this would prevent this
problem (and others detailed in that message) from occurring - any thoughts?

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]