This is the mail archive of the glibc-bugs@sourceware.org mailing list for the glibc project.



[Bug nptl/10184] New: setuid functions can stall pthread exit code


It appears that calling any of the setuid functions from one thread while
another thread is exiting can sometimes cause the exiting thread to get stuck.
The stuck thread is not visible to gdb but does appear in the output of ps. If
another thread is trying to join the stuck thread, it will wait forever.

It can become stuck either waiting for a futex wake or busy looping. In both
cases the location is in start_thread, from pthread_create.c (around line 388 in
the git trunk code), in the following loop:

      do
	lll_futex_wait (&pd->setxid_futex, 0, LLL_PRIVATE);
      while (pd->cancelhandling & SETXID_BITMASK);

From our investigation, it happens when a tgkill from setxid_signal_thread
fails. This causes the SETXID_BIT to be set in the target thread, but no signal
is sent and no other thread is actually waiting on it. It seems like a naive fix
would be to change the last statement in that function from:

  if (!INTERNAL_SYSCALL_ERROR_P (val, err))
    atomic_increment (&cmdp->cntr);

To:

  if (INTERNAL_SYSCALL_ERROR_P (val, err))
    t->cancelhandling &= ~SETXID_BITMASK;
  else
    atomic_increment (&cmdp->cntr);

This change appears to correct the problem on our machines.
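
One caveat with the naive fix: a plain `&=` is a non-atomic read-modify-write,
so if another thread changes other bits of the same word at the same moment, one
of the two updates could be lost. The following is only a standalone sketch of
clearing the bit atomically with C11 atomics. The names `struct fake_thread`,
`DEMO_SETXID_BITMASK` and `clear_setxid_bit` are hypothetical stand-ins for
illustration, and the mask value is an assumption; none of this is glibc code.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for the real mask; the value is an assumption. */
#define DEMO_SETXID_BITMASK 0x40

/* Hypothetical model of a thread descriptor with a shared flag word. */
struct fake_thread {
    atomic_int cancelhandling;
};

/* Atomically clear the SETXID bit, leaving all other bits untouched.
   Returns the flag word as it was before the bit was cleared. */
static int clear_setxid_bit(struct fake_thread *t)
{
    return atomic_fetch_and(&t->cancelhandling, ~DEMO_SETXID_BITMASK);
}
```

Inside glibc the equivalent would presumably be written with its own internal
atomic macros rather than <stdatomic.h>.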

On our machines (Ubuntu 9.04 with 2 or 4 way SMP) the following program
replicates the problem within a few minutes (just run it and watch for the
output to change to a continuous stream of the letter u):

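#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>     /* rand() */

/* Short-lived thread: sleep for up to 10 ms, then exit. */
void *noop(void *arg) {
        usleep(rand() % 10000);
        return 0;
}

/* Create and join short-lived threads forever, printing "c" before each
   create and "j" before each join. */
void *spawner(void *arg) {
        pthread_t t;

        for (;;) {
                fprintf(stderr, "c");
                pthread_create(&t, 0, noop, 0);
                fprintf(stderr, "j");
                pthread_join(t, 0);
        }
}

int main(void) {
        pthread_t spawner_id;

        pthread_create(&spawner_id, 0, spawner, 0);
        /* Call setuid every 10 ms while the spawner runs. Once the spawner
           is stuck in pthread_join, only "u" is printed. */
        for(;;) {
                fprintf(stderr, "u");
                setuid(getuid());
                usleep(10000);
        }
        return 0;
}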
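
Instead of watching the output by eye, the stall could be detected with a
progress counter. This is a rough sketch, not part of the program we actually
ran. The names `joins_completed`, `note_join`, `struct stall_watch` and
`spawner_stalled` are hypothetical. It assumes the spawner calls note_join()
after every pthread_join and the main loop calls spawner_stalled() once per
iteration.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Number of joins the spawner has completed so far. */
static atomic_ulong joins_completed;

/* Call from the spawner thread right after pthread_join returns. */
static void note_join(void)
{
    atomic_fetch_add(&joins_completed, 1);
}

struct stall_watch {
    unsigned long last_seen;   /* counter value at the previous check */
    unsigned idle_checks;      /* consecutive checks without progress */
};

/* Call from the main loop once per iteration. Returns true once the
   counter has not moved for `limit` consecutive checks. */
static bool spawner_stalled(struct stall_watch *w, unsigned limit)
{
    unsigned long now = atomic_load(&joins_completed);

    if (now != w->last_seen) {
        w->last_seen = now;
        w->idle_checks = 0;
        return false;
    }
    return ++w->idle_checks >= limit;
}
```

With the 10 ms sleep in the main loop, a limit of 500 corresponds to roughly
five seconds without a completed join, at which point the program could print a
message and call abort() to leave a core file for inspection.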

We've run the test code on several Red Hat machines, and the bug occurs on
machines with glibc-2.3.4-2.41, glibc-2.5-18.el5_1.1 or glibc-2.5-24. For some
reason it takes several minutes to fail on 8-way SMP machines.

-- 
           Summary: setuid functions can stall pthread exit code
           Product: glibc
           Version: 2.9
            Status: NEW
          Severity: normal
          Priority: P2
         Component: nptl
        AssignedTo: drepper at redhat dot com
        ReportedBy: samandbernie at guarana dot org
                CC: glibc-bugs at sources dot redhat dot com


http://sourceware.org/bugzilla/show_bug.cgi?id=10184


