RFC: nptl threading patch for linux

J. Johnston jjohnstn@redhat.com
Wed Jun 4 20:52:00 GMT 2003


Thanks Michael.  Patch checked in.  I deleted the ChangeLog entry
for thread-db.c which was already made into a separate patch.

-- Jeff J.

Michael Snyder wrote:
> After reviewing the on-list discussion, it sounds like this patch
> should go in.  Jeff, would you do the honors?
> 
> Michael
> 
> "J. Johnston" wrote:
> 
>>The following is the last part of my revised nptl patch that has
>>been broken up per Daniel J.'s suggestion.  There are no generated
>>files included in the patch.
>>
>>A bit of background is needed.  First of all, in the nptl model,
>>lwps and pids are distinct entities.  If you perform a ps on a
>>multithreaded application, you will not see its child threads show
>>up.  This differs from the linuxthreads model whereby an lwp was
>>actually a pid.  In the nptl model, you cannot issue a kill to
>>a child thread via its lwp.  If you want to do this, you need to use
>>the tkill syscall instead.  The action of sending a signal to a specific
>>lwp is used commonly in lin-lwp.c.  The first part of the nptl change is to
>>determine at configuration time if we have the tkill syscall and then at run-time,
>>if we can use it.  A new function kill_lwp has been added which either calls the
>>tkill syscall or the regular kill function as appropriate.
>>
>>Another key difference in behavior between nptl and linuxthreads is what happens
>>when a thread exits.  In the linuxthreads model, each thread that exits causes
>>an exit event to occur.  The main thread can exit before all of its children.
>>In the nptl model, only the main thread generates the exit event and it only
>>does so after all child threads have exited.  So, to determine when an lwp
>>has exited, we have to constantly check its status when we are able.  When
>>we get an exit event we have to determine if we are exiting the program or
>>just one of the threads.  Some additional logic has been added to the exit
>>checking code such that if we have (lwp == pid), then we stop all active
>>threads and check if they have already exited.  If they have not exited,
>>we resume these threads, otherwise, we delete them.
>>
>>Daniel J. brought up a good point regarding my previous attempt at this logic
>>whereby I stopped all threads and resumed all threads.  That old logic is wrong
>>when scheduler_locking is set on.  The new logic addresses this because
>>it only stops/resumes threads which are not already stopped.  Currently, if we lock
>>into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
>>because we will field the exit event and end up with no running threads.  Under nptl, we
>>never get notified of the thread exiting so for this scenario, we end up hung.
>>In this instance, there is nothing we can do without a kernel change.  Daniel J.
>>is looking into a kernel change.  For user threads, we could activate the death event
>>in thread-db.c and make a low-level call to notify the lwp layer, but this does not
>>handle lwps created directly by the end-user.  This issue needs to be discussed
>>further outside of this patch because I want to propose changing the assertion to
>>allow the user to resume the other threads.  I have tested the change with such a
>>scenario (locking a thread and running until exit).  For linuxthreads we trigger the
>>assertion as before.  For nptl, we hang as expected as we never get the exit event.
>>A bug can be opened for this if the patch is accepted.
>>
>>The last piece of logic has to do with a new interface needed by the nptl libthread_db
>>library to get the thread area.  I have grouped this together with the lin-lwp changes
>>because the lin-lwp changes cannot work without this crucial change.
>>
>>This patch has been tested for both linuxthreads and with an nptl kernel.
>>
>>Ok to commit?
>>
>>-- Jeff J.
>>
>>2003-04-24  Jeff Johnston  <jjohnstn@redhat.com>
>>
>>        * acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
>>        * config.in: Regenerated.
>>        * configure.in: Add test for syscall function and check for
>>        __NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
>>        * configure: Regenerated.
>>        * thread-db.c (check_event): For create/death event breakpoints,
>>        loop through all messages to ensure that we read the message
>>        corresponding to the breakpoint we are at.
>>        * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
>>        <sys/syscall.h>.
>>        (kill_lwp): New function that uses tkill syscall or
>>        uses kill, depending on whether threading model is nptl or not.
>>        All callers of kill() changed to use kill_lwp().
>>        (lin_lwp_wait): Make special check when WIFEXITED occurs to
>>        see if all threads have already exited in the nptl model.
>>        (stop_and_resume_callback): New callback function used by the
>>        lin_lwp_wait thread exit handling code.
>>        (stop_wait_callback): Check for threads already having exited and
>>        delete such threads fromt the lwp list when discovered.
>>        (stop_callback): Don't assert retcode of kill call.
>>
>>        Roland McGrath  <roland@redhat.com>
>>        * i386-linux-nat.c (ps_get_thread_area): New function needed by
>>        nptl libthread_db.
>>
>>  -------------------------------------------------------------------------------
>>Index: lin-lwp.c
>>===================================================================
>>RCS file: /cvs/src/src/gdb/lin-lwp.c,v
>>retrieving revision 1.43
>>diff -u -r1.43 lin-lwp.c
>>--- lin-lwp.c   28 Mar 2003 21:42:41 -0000      1.43
>>+++ lin-lwp.c   23 Apr 2003 22:52:44 -0000
>>@@ -24,6 +24,10 @@
>> #include "gdb_string.h"
>> #include <errno.h>
>> #include <signal.h>
>>+#ifdef HAVE_TKILL_SYSCALL
>>+#include <unistd.h>
>>+#include <sys/syscall.h>
>>+#endif
>> #include <sys/ptrace.h>
>> #include "gdb_wait.h"
>>
>>@@ -156,6 +160,7 @@
>>
>> /* Prototypes for local functions.  */
>> static int stop_wait_callback (struct lwp_info *lp, void *data);
>>+static int lin_lwp_thread_alive (ptid_t ptid);
>>
>> /* Convert wait status STATUS to a string.  Used for printing debug
>>    messages only.  */
>>@@ -627,6 +632,32 @@
>> }
>>
>>
>>+/* Issue kill to specified lwp.  */
>>+
>>+static int tkill_failed;
>>+
>>+static int
>>+kill_lwp (int lwpid, int signo)
>>+{
>>+  errno = 0;
>>+
>>+/* Use tkill, if possible, in case we are using nptl threads.  If tkill
>>+   fails, then we are not using nptl threads and we should be using kill.  */
>>+
>>+#ifdef HAVE_TKILL_SYSCALL
>>+  if (!tkill_failed)
>>+    {
>>+      int ret = syscall (__NR_tkill, lwpid, signo);
>>+      if (errno != ENOSYS)
>>+       return ret;
>>+      errno = 0;
>>+      tkill_failed = 1;
>>+    }
>>+#endif
>>+
>>+  return kill (lwpid, signo);
>>+}
>>+
>> /* Send a SIGSTOP to LP.  */
>>
>> static int
>>@@ -642,8 +673,15 @@
>>                              "SC:  kill %s **<SIGSTOP>**\n",
>>                              target_pid_to_str (lp->ptid));
>>        }
>>-      ret = kill (GET_LWP (lp->ptid), SIGSTOP);
>>-      gdb_assert (ret == 0);
>>+      errno = 0;
>>+      ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
>>+      if (debug_lin_lwp)
>>+       {
>>+         fprintf_unfiltered (gdb_stdlog,
>>+                             "SC:  lwp kill %d %s\n",
>>+                             ret,
>>+                             errno ? safe_strerror (errno) : "ERRNO-OK");
>>+       }
>>
>>       lp->signalled = 1;
>>       gdb_assert (lp->status == 0);
>>@@ -667,11 +705,23 @@
>>
>>       gdb_assert (lp->status == 0);
>>
>>-      pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
>>+      pid = waitpid (GET_LWP (lp->ptid), &status, 0);
>>       if (pid == -1 && errno == ECHILD)
>>-       /* OK, the proccess has disappeared.  We'll catch the actual
>>-          exit event in lin_lwp_wait.  */
>>-       return 0;
>>+       {
>>+         pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
>>+         if (pid == -1 && errno == ECHILD)
>>+           {
>>+             /* The thread has previously exited.  We need to delete it now
>>+                because in the case of nptl threads, there won't be an
>>+                exit event unless it is the main thread.  */
>>+             if (debug_lin_lwp)
>>+               fprintf_unfiltered (gdb_stdlog,
>>+                                   "SWC: %s exited.\n",
>>+                                   target_pid_to_str (lp->ptid));
>>+             delete_lwp (lp->ptid);
>>+             return 0;
>>+           }
>>+       }
>>
>>       gdb_assert (pid == GET_LWP (lp->ptid));
>>
>>@@ -683,6 +733,7 @@
>>                              status_to_str (status));
>>        }
>>
>>+      /* Check if the thread has exited.  */
>>       if (WIFEXITED (status) || WIFSIGNALED (status))
>>        {
>>          gdb_assert (num_lwps > 1);
>>@@ -697,7 +748,31 @@
>>                                 target_pid_to_str (lp->ptid));
>>            }
>>          if (debug_lin_lwp)
>>-           fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
>>+           fprintf_unfiltered (gdb_stdlog,
>>+                               "SWC: %s exited.\n",
>>+                               target_pid_to_str (lp->ptid));
>>+
>>+         delete_lwp (lp->ptid);
>>+         return 0;
>>+       }
>>+
>>+      /* Check if the current LWP has previously exited.  For nptl threads,
>>+         there is no exit signal issued for LWPs that are not the
>>+         main thread so we should check whenever the thread is stopped.  */
>>+      if (!lin_lwp_thread_alive (lp->ptid))
>>+       {
>>+         if (in_thread_list (lp->ptid))
>>+           {
>>+             /* Core GDB cannot deal with us deleting the current
>>+                thread.  */
>>+             if (!ptid_equal (lp->ptid, inferior_ptid))
>>+               delete_thread (lp->ptid);
>>+             printf_unfiltered ("[%s exited]\n",
>>+                                target_pid_to_str (lp->ptid));
>>+           }
>>+         if (debug_lin_lwp)
>>+           fprintf_unfiltered (gdb_stdlog,
>>+                               "SWC: %s already exited.\n",
>>                                target_pid_to_str (lp->ptid));
>>
>>          delete_lwp (lp->ptid);
>>@@ -756,7 +831,14 @@
>>              /* If there's another event, throw it back into the queue. */
>>              if (lp->status)
>>                {
>>-                 kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>>+                 if (debug_lin_lwp)
>>+                   {
>>+                     fprintf_unfiltered (gdb_stdlog,
>>+                                         "SWC: kill %s, %s\n",
>>+                                         target_pid_to_str (lp->ptid),
>>+                                         status_to_str ((int) status));
>>+                   }
>>+                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>>                }
>>              /* Save the sigtrap event. */
>>              lp->status = status;
>>@@ -800,7 +882,7 @@
>>                                          target_pid_to_str (lp->ptid),
>>                                          status_to_str ((int) status));
>>                    }
>>-                 kill (GET_LWP (lp->ptid), WSTOPSIG (status));
>>+                 kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
>>                }
>>              return 0;
>>            }
>>@@ -1049,6 +1131,25 @@
>>
>> #endif
>>
>>+/* Stop an active thread, verify it still exists, then resume it.  */
>>+
>>+static int
>>+stop_and_resume_callback (struct lwp_info *lp, void *data)
>>+{
>>+  struct lwp_info *ptr;
>>+
>>+  if (!lp->stopped && !lp->signalled)
>>+    {
>>+      stop_callback (lp, NULL);
>>+      stop_wait_callback (lp, NULL);
>>+      /* Resume if the lwp still exists.  */
>>+      for (ptr = lwp_list; ptr; ptr = ptr->next)
>>+       if (lp == ptr)
>>+         resume_callback (lp, NULL);
>>+    }
>>+  return 0;
>>+}
>>+
>> static ptid_t
>> lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
>> {
>>@@ -1206,10 +1307,61 @@
>>                }
>>            }
>>
>>-         /* Make sure we don't report a TARGET_WAITKIND_EXITED or
>>-            TARGET_WAITKIND_SIGNALLED event if there are still LWP's
>>-            left in the process.  */
>>+         /* Check if the thread has exited.  */
>>          if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
>>+           {
>>+             if (in_thread_list (lp->ptid))
>>+               {
>>+                 /* Core GDB cannot deal with us deleting the current
>>+                    thread.  */
>>+                 if (!ptid_equal (lp->ptid, inferior_ptid))
>>+                   delete_thread (lp->ptid);
>>+                 printf_unfiltered ("[%s exited]\n",
>>+                                    target_pid_to_str (lp->ptid));
>>+               }
>>+
>>+             /* If this is the main thread, we must stop all threads and
>>+                verify if they are still alive.  This is because in the nptl
>>+                thread model, there is no signal issued for exiting LWPs
>>+                other than the main thread.  We only get the main thread
>>+                exit signal once all child threads have already exited.
>>+                If we stop all the threads and use the stop_wait_callback
>>+                to check if they have exited we can determine whether this
>>+                signal should be ignored or whether it means the end of the
>>+                debugged application, regardless of which threading model
>>+                is being used.  */
>>+             if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
>>+               {
>>+                 lp->stopped = 1;
>>+                 iterate_over_lwps (stop_and_resume_callback, NULL);
>>+               }
>>+
>>+             if (debug_lin_lwp)
>>+               fprintf_unfiltered (gdb_stdlog,
>>+                                   "LLW: %s exited.\n",
>>+                                   target_pid_to_str (lp->ptid));
>>+
>>+             delete_lwp (lp->ptid);
>>+
>>+             /* If there is at least one more LWP, then the exit signal
>>+                was not the end of the debugged application and should be
>>+                ignored.  */
>>+             if (num_lwps > 0)
>>+               {
>>+                 /* Make sure there is at least one thread running.  */
>>+                 gdb_assert (iterate_over_lwps (running_callback, NULL));
>>+
>>+                 /* Discard the event.  */
>>+                 status = 0;
>>+                 continue;
>>+               }
>>+           }
>>+
>>+         /* Check if the current LWP has previously exited.  In the nptl
>>+            thread model, LWPs other than the main thread do not issue
>>+            signals when they exit so we must check whenever the thread
>>+            has stopped.  A similar check is made in stop_wait_callback().  */
>>+         if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
>>            {
>>              if (in_thread_list (lp->ptid))
>>                {
>>Index: acconfig.h
>>===================================================================
>>RCS file: /cvs/src/src/gdb/acconfig.h,v
>>retrieving revision 1.24
>>diff -u -r1.24 acconfig.h
>>--- acconfig.h  4 Jan 2003 00:34:42 -0000       1.24
>>+++ acconfig.h  23 Apr 2003 22:52:44 -0000
>>@@ -95,6 +95,9 @@
>> /* Define if using Solaris thread debugging.  */
>> #undef HAVE_THREAD_DB_LIB
>>
>>+/* Define if you support the tkill syscall.  */
>>+#undef HAVE_TKILL_SYSCALL
>>+
>> /* Define on a GNU/Linux system to work around problems in sys/procfs.h.  */
>> #undef START_INFERIOR_TRAPS_EXPECTED
>> #undef sys_quotactl
>>Index: configure.in
>>===================================================================
>>RCS file: /cvs/src/src/gdb/configure.in,v
>>retrieving revision 1.126
>>diff -u -r1.126 configure.in
>>--- configure.in        26 Feb 2003 15:10:47 -0000      1.126
>>+++ configure.in        23 Apr 2003 22:52:44 -0000
>>@@ -384,6 +384,7 @@
>> AC_CHECK_FUNCS(setpgid setpgrp)
>> AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
>> AC_CHECK_FUNCS(socketpair)
>>+AC_CHECK_FUNCS(syscall)
>>
>> dnl AC_FUNC_SETPGRP does not work when cross compiling
>> dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
>>@@ -909,6 +910,24 @@
>> if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
>>   AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
>>             [Define if <thread_db.h> has the TD_NOTALLOC error code.])
>>+fi
>>+
>>+dnl See if we have a sys/syscall header file that has __NR_tkill.
>>+if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
>>+   AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
>>+                  gdb_cv_sys_syscall_h_has_tkill,
>>+     AC_TRY_COMPILE(
>>+       [#include <sys/syscall.h>],
>>+       [int i = __NR_tkill;],
>>+       gdb_cv_sys_syscall_h_has_tkill=yes,
>>+       gdb_cv_sys_syscall_h_has_tkill=no
>>+     )
>>+   )
>>+fi
>>+dnl See if we can issue tkill syscall.
>>+if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
>>+  AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
>>+            [Define if we can use the tkill syscall.])
>> fi
>>
>> dnl Handle optional features that can be enabled.
>>Index: i386-linux-nat.c
>>===================================================================
>>RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
>>retrieving revision 1.44
>>diff -u -r1.44 i386-linux-nat.c
>>--- i386-linux-nat.c    16 Apr 2003 15:22:02 -0000      1.44
>>+++ i386-linux-nat.c    23 Apr 2003 22:52:44 -0000
>>@@ -70,6 +70,9 @@
>> /* Defines I386_LINUX_ORIG_EAX_REGNUM.  */
>> #include "i386-linux-tdep.h"
>>
>>+/* Defines ps_err_e, struct ps_prochandle.  */
>>+#include "gdb_proc_service.h"
>>+
>> /* Prototypes for local functions.  */
>> static void dummy_sse_values (void);
>>
>>@@ -682,6 +685,21 @@
>>          offsetof (struct user, u_debugreg[regnum]), value);
>>   if (errno != 0)
>>     perror_with_name ("Couldn't write debug register");
>>+}
>>+
>>+extern ps_err_e
>>+ps_get_thread_area(const struct ps_prochandle *ph,
>>+                  lwpid_t lwpid, int idx, void **base)
>>+{
>>+  unsigned long int desc[3];
>>+#define PTRACE_GET_THREAD_AREA 25
>>+
>>+  if  (ptrace (PTRACE_GET_THREAD_AREA,
>>+              lwpid, (void *) idx, (unsigned long) &desc) < 0)
>>+    return PS_ERR;
>>+
>>+  *(int *)base = desc[1];
>>+  return PS_OK;
>> }
>>
>> void
> 
> 




More information about the Gdb-patches mailing list