This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [RFC]: fix for recycled thread ids


Daniel Jacobowitz wrote:
On Thu, Mar 18, 2004 at 07:36:25PM -0500, Jeff Johnston wrote:

The following patch fixes a problem when a user application creates a thread shortly after another thread has completed. For nptl, thread ids are addresses. If a thread completes/dies, the tid is available for reuse by a new thread.


Does NPTL re-use the TID quickly, or cycle around the way LT did so
that we only see this under high thread pressure?


I can't say for sure as I don't maintain libthread_db. The test case in question does create high thread pressure, but I think it would be a mistake to generalize and think that this couldn't happen in an existing application.



On RH9 and RHEL3, nptl threads do not have exit events associated with them. I have already discussed this with Daniel J. who feels that the kernels are not doing the right thing, but regardless, the current and previous RH nptl kernels are behaving this way and gdb needs to handle it. As such, when a new thread is created, if it is reusing the tid of a previous thread that gdb hasn't figured out isn't around any more, gdb ignores the create event and the new thread is not added. Ignoring the event is done because it is possible for gdb to find out about the thread before it's creation event is reported and so the create event can be redundant information.


What I haven't seen a good explanation of is what problem this causes. If a thread goes away, and then a new thread using the same ID is
created, and then we stop, what do we lose besides the cosmetic fact
that there is no [New Thread] message? Does anything go wrong?


Also, I would like the issue of whether or not it is a kernel bug
resolved before we discuss working around it in GDB.


The problem is if a global signal is passed on to the inferior program when there are threads we have not attached to, the process terminates. A Ctrl-C is such a signal. In the example program, we only attach to the first 100 threads and when the Ctrl-C is issued, we get:


ptrace: No such process.
thread_db_get_info: cannot get thread info: generic error

The end-user is cooked.

Regarding resolving this issue as a kernel error, any fix for RHEL3 won't get shipped until Update 3. I know of no scheduled update for RH9 and this would not qualify as a security update.

If I read your subsequent comments right, you wouldn't have a problem with me just making the lwp compare directly in the thread-db.c code. This would mean the change would be all of 6 lines of code. Does it make a lot of sense to withhold this change in light of the time-frame a kernel fix could be issued and the simplicity of the change itself?


The following fix removes the problem by also checking if the original lwp given to the old thread in the list matches the lwp for the new thread. IIRC, there is at least one platform that allows threads to change their lwps dynamically. I do not believe that platform uses the thread-db layer, but to be safe, I did not use a direct compare between the original lwp and the new lwp. Instead, I added a target vector routine that by default never returns an unequal comparison. For linux, the comparison routine points to a new routine in lin-lwp.c which does the expected comparison. The comparison is based on C compare routines whereby 0 means equal and non-zero means unequal.


At least one such "platform" was NGPT, which runs on Linux (and worked
OK with thread-db.c, too).  It's obsolete nowadays.


Now, if a thread id for a new thread is found to already be on the list, the target comparison is made between the original lwp stored for the old thread and the lwp for the new thread. If the comparison is unequal, the old thread is deleted and then the new thread is added. Otherwise, the new thread event is ignored as before.


If we aren't going to support migration, then why not just store the
LWP ID in the thread PTID structure?  That's why it has three fields,
I'd guess.

I don't think that the target hook makes much sense, since thread-db.c
is basically Linux-specific at this point and since the contents being
passed to the hook are thread-db-specific.


Would it make sense to rename thread-db.c to lin-thread-db.c?


-- Jeff J.
#include <signal.h>

#include <pthread.h>
#include <unistd.h>
#include <string.h>
#include <netdb.h>
#include <stdio.h>
#include <stdlib.h>

/**
 * Compile with: 
 * gcc -g -Wall -lpthread -o lphello lphello.c
 *
 * Author Magnus Ihse, ihse@bea.com 
 */
#define THREAD_COUNT 100
#define ITER_COUNT 500000

static volatile int finishedArray[THREAD_COUNT];
static int pKey;

void* setup_altstack(void) {
   stack_t ss;

   ss.ss_sp = malloc(20*1024);
   if (ss.ss_sp == 0) {
      return NULL;
   }
   ss.ss_size = 20*1024;
   ss.ss_flags = 0;
       
   if (sigaltstack(&ss, NULL) == -1) {
      //perror("sigaltstack");
      return NULL;
   }
   return ss.ss_sp;
}

void takedown_altstack(void* stack) {
   struct sigaltstack ss;
   int result;

   if (stack == NULL) {
      return;
   }
   
   ss.ss_flags = SS_DISABLE;
   ss.ss_sp = (void*)47;  // This value should be ignored when ss_flags is SS_DISABLE
   ss.ss_size = 29;       // This value should be ignored when ss_flags is SS_DISABLE
   
   {
      result = sigaltstack(&ss, NULL);
      free(stack);
   }
}

void *threadfunc(void *arg) {
   int mypos = (int)(size_t)arg;
   int i;
   long square = 1;
   void* altstack = setup_altstack();
   
   pthread_setspecific(pKey, arg);
   for (i=0; i < 1000; i++) {
      square = i*i + square*mypos;
   }
   finishedArray[mypos] = 1;
   takedown_altstack(altstack);
   
   return NULL;
}


int main(int argc, char ** argv) {
   pthread_t threads[THREAD_COUNT];
   pthread_attr_t attr;
   int result;
   int i;
   int iteration;
   int finished;

   pthread_attr_init(&attr);
   pthread_attr_setstacksize(&attr, 128*1024);

   pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);
  
   pthread_key_create(&pKey, NULL);

   for (iteration = 0; iteration < ITER_COUNT; iteration++) {
      if ((iteration % 100) == 0) {
         printf("\nStarting run series %i: ", iteration);
      }

      if ((iteration % 10) == 0) {
         printf(".");
         fflush(stdout);
      }

      // Clear array
      for (i = 0; i< THREAD_COUNT; i++) {
	 finishedArray[i] = 0;
      }

      // Start threads
      for (i = 0; i< THREAD_COUNT; i++) {
	 result = pthread_create(&threads[i], &attr, threadfunc, (void*)(size_t)i);
	 if (result != 0) {
	    perror("pthread_create");
	    exit(1);
	 }
      }

      // Join threads
      for (i = 0; i< THREAD_COUNT; i++) {
         result = pthread_join(threads[i], NULL);
         if (result != 0) {
            perror("pthread_join");
            exit(EXIT_FAILURE);
         }
      }
      
      // Spin waiting for results
      finished = 1;
      do {
         struct timespec req, rem;

         req.tv_sec = 0;
         req.tv_nsec = 10 * 1000 * 1000;

         nanosleep(&req, &rem);
         
         for (i = 0; i< THREAD_COUNT; i++) {
            if (finishedArray[i] != 1) {
               finished = 0;
               break;
            }
         }
      } while (!finished);

   }

   return EXIT_SUCCESS;
}

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]