This is the mail archive of the
glibc-bugs@sourceware.org
mailing list for the glibc project.
[Bug nptl/10184] New: setuid functions can stall pthread exit code
- From: "samandbernie at guarana dot org" <sourceware-bugzilla at sourceware dot org>
- To: glibc-bugs at sources dot redhat dot com
- Date: 22 May 2009 06:27:11 -0000
- Subject: [Bug nptl/10184] New: setuid functions can stall pthread exit code
- Reply-to: sourceware-bugzilla at sourceware dot org
It appears that calling any of the setuid functions from one thread while
another thread is exiting can sometimes cause the exiting thread to get stuck.
The stuck thread is not visible to gdb but does appear in the output of ps. If
another thread is trying to join the stuck thread, it will wait forever.
It can become stuck either waiting for a futex wake or busy looping. In both
cases the location is in start_thread, in pthread_create.c (around line 388 in
the git trunk code), as follows:
  do
    lll_futex_wait (&pd->setxid_futex, 0, LLL_PRIVATE);
  while (pd->cancelhandling & SETXID_BITMASK);
From our investigation, it happens when a tgkill from setxid_signal_thread
fails. This causes the SETXID_BIT to be set in the target thread, but no signal
is sent and no other thread is actually waiting on it. It seems like a naive fix
would be to change the last statement in that function from:
  if (!INTERNAL_SYSCALL_ERROR_P (val, err))
    atomic_increment (&cmdp->cntr);
To:
  if (INTERNAL_SYSCALL_ERROR_P (val, err))
    t->cancelhandling &= ~SETXID_BITMASK;
  else
    atomic_increment (&cmdp->cntr);
This change appears to correct the problem on our machines.
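
The control flow of the proposed change can be sketched outside glibc as a
small model. Everything here is a hypothetical stand-in, not glibc code: the
model_* names, the MODEL_SETXID_BIT value, and the signal_ok parameter (which
takes the place of the tgkill result) are assumptions made for illustration.
The sketch also uses an atomic AND where the patch above uses a plain "&=";
that is a choice made here, not part of the report.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for SETXID_BITMASK; the value is illustrative. */
#define MODEL_SETXID_BIT 0x40

struct model_thread {
    atomic_int cancelhandling;   /* models pd->cancelhandling */
};

struct model_cmd {
    atomic_int cntr;             /* models cmdp->cntr */
};

/* Models the end of setxid_signal_thread with the proposed change applied.
   signal_ok stands in for "tgkill succeeded". */
static void model_signal_thread(struct model_cmd *cmd,
                                struct model_thread *t, bool signal_ok)
{
    assert(cmd != NULL && t != NULL);

    /* Mark the target as having a pending setxid request. */
    atomic_fetch_or(&t->cancelhandling, MODEL_SETXID_BIT);

    if (!signal_ok)
        /* No signal was delivered, so nothing else will clear the bit. */
        atomic_fetch_and(&t->cancelhandling, ~MODEL_SETXID_BIT);
    else
        /* Signal delivered: count the thread as one that must respond. */
        atomic_fetch_add(&cmd->cntr, 1);
}

/* True if an exiting thread would enter the wait loop quoted earlier. */
static bool model_exit_would_wait(struct model_thread *t)
{
    assert(t != NULL);
    return (atomic_load(&t->cancelhandling) & MODEL_SETXID_BIT) != 0;
}
```

In this model a failed signal leaves the bit clear and the counter untouched,
so the exiting thread has nothing to wait for. Without the failure branch the
bit would stay set with no one left to clear it, which matches the hang
described above.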
On our machines (Ubuntu 9.04 with 2 or 4 way SMP) the following program
replicates the problem within a few minutes (just run it and watch for the
output to change to a continuous stream of the letter u):
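#include <sys/types.h>
#include <unistd.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>   /* rand */

/* Short-lived thread: sleep for up to 10 ms, then exit. */
void *noop(void *arg) {
  usleep(rand() % 10000);
  return 0;
}

/* Create and join threads in a loop, printing "c" before each create
   and "j" before each join. */
void *spawner(void *arg) {
  pthread_t t;
  for (;;) {
    fprintf(stderr, "c");
    pthread_create(&t, 0, noop, 0);
    fprintf(stderr, "j");
    pthread_join(t, 0);
  }
}

/* Call setuid every 10 ms while the spawner runs, printing "u" each time. */
int main(void) {
  pthread_t spawner_id;
  pthread_create(&spawner_id, 0, spawner, 0);
  for (;;) {
    fprintf(stderr, "u");
    setuid(getuid());
    usleep(10000);
  }
  return 0;
}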
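
Rather than watching the output by eye, the hang could be detected by giving
the join a deadline. The following is a minimal sketch, assuming the GNU
extension pthread_timedjoin_np is available. The helper names
join_with_timeout, sleeper and spawn_and_join are hypothetical and not part of
the test program above.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: join t, giving up once `seconds` have elapsed.
   Returns 0 if the thread was joined, ETIMEDOUT if it had not terminated
   by the deadline (it then remains joinable), or another error number. */
static int join_with_timeout(pthread_t t, time_t seconds)
{
    struct timespec deadline;

    assert(seconds >= 0);
    /* pthread_timedjoin_np takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += seconds;
    return pthread_timedjoin_np(t, NULL, &deadline);
}

/* Same shape as noop above, but the sleep length (in microseconds)
   is passed in through arg. */
static void *sleeper(void *arg)
{
    usleep((useconds_t)(uintptr_t)arg);
    return NULL;
}

/* Create one sleeper thread and join it with a timeout. */
static int spawn_and_join(useconds_t sleep_us, time_t timeout_s)
{
    pthread_t t;
    int rc = pthread_create(&t, NULL, sleeper, (void *)(uintptr_t)sleep_us);

    if (rc != 0)
        return rc;
    return join_with_timeout(t, timeout_s);
}
```

In the spawner loop, replacing pthread_join with a call such as
join_with_timeout(t, 5) and reporting any ETIMEDOUT result would flag a stuck
thread, since noop itself never sleeps longer than 10 ms. The 5 second value
is an arbitrary choice.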
We've run the test code on several RedHat machines, with the bug happening on
machines with glibc-2.3.4-2.41, glibc-2.5-18.el5_1.1 or glibc-2.5-24. For some
reason it takes several minutes to fail on 8-way SMP machines.
--
Summary: setuid functions can stall pthread exit code
Product: glibc
Version: 2.9
Status: NEW
Severity: normal
Priority: P2
Component: nptl
AssignedTo: drepper at redhat dot com
ReportedBy: samandbernie at guarana dot org
CC: glibc-bugs at sources dot redhat dot com
http://sourceware.org/bugzilla/show_bug.cgi?id=10184