This is the mail archive of the
gdb-patches@sources.redhat.com
mailing list for the GDB project.
Re: [RFC]: fix for recycled thread ids
Daniel Jacobowitz wrote:
On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:
The following patch fixes a problem when a user application creates a
thread shortly after another thread has completed. For nptl, thread ids
are addresses. If a thread completes/dies, the tid is available for reuse
by a new thread.
Does NPTL re-use the TID quickly, or cycle around the way LT did so
that we only see this under high thread pressure?
I can't say for sure as I don't maintain libthread_db. The test case in
question does create high thread pressure, but I think it would be a mistake to
generalize and think that this couldn't happen in an existing application.
On RH9 and RHEL3, nptl threads do not have exit events associated with
them. I have already discussed this with Daniel J. who feels that the
kernels are not doing the right thing, but regardless, the current and
previous RH nptl kernels are behaving this way and gdb needs to handle it.
As such, when a new thread is created, if it is reusing the tid of a
previous thread that gdb hasn't figured out isn't around any more, gdb
ignores the create event and the new thread is not added. Ignoring the
event is done because it is possible for gdb to find out about the thread
before it's creation event is reported and so the create event can be
redundant information.
What I haven't seen a good explanation of is what problem this causes.
If a thread goes away, and then a new thread using the same ID is
created, and then we stop, what do we lose besides the cosmetic fact
that there is no [New Thread] message? Does anything go wrong?
Also, I would like the issue of whether or not it is a kernel bug
resolved before we discuss working around it in GDB.
The problem is if a global signal is passed on to the inferior program when
there are threads we have not attached to, the process terminates. A Ctrl-C is
such a signal. In the example program, we only attach to the first 100 threads
and when the Ctrl-C is issued, we get:
ptrace: No such process.
thread_db_get_info: cannot get thread info: generic error
The end-user is cooked.
Regarding resolving this issue as a kernel error, any fix for RHEL3 won't get
shipped until Update 3. I know of no scheduled update for RH9 and this would
not qualify as a security update.
If I read your subsequent comments right, you wouldn't have a problem with me
just making the lwp compare directly in the thread-db.c code. This would mean
the change would be all of 6 lines of code. Does it make a lot of sense to
withhold this change in light of the time-frame a kernel fix could be issued and
the simplicity of the change itself?
The following fix removes the problem by also checking if the original lwp
given to the old thread in the list matches the lwp for the new thread.
IIRC, there is at least one platform that allows threads to change their
lwps dynamically. I do not believe that platform uses the thread-db layer,
but to be safe, I did not use a direct compare between the original lwp and
the new lwp. Instead, I added a target vector routine that by default
never returns an unequal comparison. For linux, the comparison routine
points to a new routine in lin-lwp.c which does the expected comparison.
The comparison is based on C compare routines whereby 0 means equal and
non-zero means unequal.
At least one such "platform" was NGPT, which runs on Linux (and worked
OK with thread-db.c, too). It's obsolete nowadays.
Now, if a thread id for a new thread is found to already be on the list,
the target comparison is made between the original lwp stored for the old
thread and the lwp for the new thread. If the comparison is unequal, the
old thread is deleted and then the new thread is added. Otherwise, the new
thread event is ignored as before.
If we aren't going to support migration, then why not just store the
LWP ID in the thread PTID structure? That's why it has three fields,
I'd guess.
I don't think that the target hook makes much sense, since thread-db.c
is basically Linux-specific at this point and since the contents being
passed to the hook are thread-db-specific.
Would it make sense to rename thread-db.c to lin-thread-db.c?
-- Jeff J.
#include <signal.h>
#include <pthread.h>
#include <unistd.h>
#include <string.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>
/**
* Compile with:
* gcc -g -Wall -lpthread -o lphello lphello.c
*
* Author Magnus Ihse, ihse@bea.com
*/
#define THREAD_COUNT 100
#define ITER_COUNT 500000
static volatile int finishedArray[THREAD_COUNT];
static int pKey;
void* setup_altstack(void) {
stack_t ss;
ss.ss_sp = malloc(20*1024);
if (ss.ss_sp == 0) {
return NULL;
}
ss.ss_size = 20*1024;
ss.ss_flags = 0;
if (sigaltstack(&ss, NULL) == -1) {
//perror("sigaltstack");
return NULL;
}
return ss.ss_sp;
}
void takedown_altstack(void* stack) {
struct sigaltstack ss;
int result;
if (stack == NULL) {
return;
}
ss.ss_flags = SS_DISABLE;
ss.ss_sp = (void*)47; // This value should be ignored when ss_flags is SS_DISABLE
ss.ss_size = 29; // This value should be ignored when ss_flags is SS_DISABLE
{
result = sigaltstack(&ss, NULL);
free(stack);
}
}
void *threadfunc(void *arg) {
int mypos = (int)(size_t)arg;
int i;
long square = 1;
void* altstack = setup_altstack();
pthread_setspecific(pKey, arg);
for (i=0; i < 1000; i++) {
square = i*i + square*mypos;
}
finishedArray[mypos] = 1;
takedown_altstack(altstack);
return NULL;
}
int main(int argc, char ** argv) {
pthread_t threads[THREAD_COUNT];
pthread_attr_t attr;
int result;
int i;
int iteration;
int finished;
pthread_attr_init(&attr);
pthread_attr_setstacksize(&attr, 128*1024);
pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
pthread_key_create(&pKey, NULL);
for (iteration = 0; iteration < ITER_COUNT; iteration++) {
if ((iteration % 100) == 0) {
printf("\nStarting run series %i: ", iteration);
}
if ((iteration % 10) == 0) {
printf(".");
fflush(stdout);
}
// Clear array
for (i = 0; i< THREAD_COUNT; i++) {
finishedArray[i] = 0;
}
// Start threads
for (i = 0; i< THREAD_COUNT; i++) {
result = pthread_create(&threads[i], &attr, threadfunc, (void*)(size_t)i);
if (result != 0) {
perror("pthread_create");
exit(1);
}
}
// Join threads
for (i = 0; i< THREAD_COUNT; i++) {
result = pthread_join(threads[i], NULL);
if (result != 0) {
perror("pthread_join");
exit(EXIT_FAILURE);
}
}
// Spin waiting for results
finished = 1;
do {
struct timespec req, rem;
req.tv_sec = 0;
req.tv_nsec = 10 * 1000 * 1000;
nanosleep(&req, &rem);
for (i = 0; i< THREAD_COUNT; i++) {
if (finishedArray[i] != 1) {
finished = 0;
break;
}
}
} while (!finished);
}
return EXIT_SUCCESS;
}