RFC: nptl threading patch for linux
J. Johnston
jjohnstn@redhat.com
Wed Jun 4 20:52:00 GMT 2003
Thanks Michael. Patch checked in. I deleted the ChangeLog entry
for thread-db.c which was already made into a separate patch.
-- Jeff J.
Michael Snyder wrote:
> After reviewing the on-list discussion, it sounds like this patch
> should go in. Jeff, would you do the honors?
>
> Michael
>
> "J. Johnston" wrote:
>
>>The following is the last part of my revised nptl patch that has
>>been broken up per Daniel J.'s suggestion. There are no generated
>>files included in the patch.
>>
>>A bit of background is needed. First of all, in the nptl model,
>>lwps and pids are distinct entities. If you perform a ps on a
>>multithreaded application, you will not see its child threads show
>>up. This differs from the linuxthreads model whereby an lwp was
>>actually a pid. In the nptl model, you cannot issue a kill to
>>a child thread via its lwp. If you want to do this, you need to use
>>the tkill syscall instead. The action of sending a signal to a specific
>>lwp is used commonly in lin-lwp.c. The first part of the nptl change is to
>>determine at configuration time if we have the tkill syscall and then at run-time,
>>if we can use it. A new function kill_lwp has been added which either calls the
>>tkill syscall or the regular kill function as appropriate.
>>
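[Editorial sketch] The tkill-then-kill fallback described above can be illustrated in isolation. This is a minimal sketch, not the patch's kill_lwp: the names send_signal_to_lwp and tkill_unavailable are hypothetical, and it assumes a Linux system whose <sys/syscall.h> may or may not define SYS_tkill.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Set once tkill has reported ENOSYS, so later calls skip it. */
static int tkill_unavailable;

/* Hypothetical helper (not GDB's kill_lwp): deliver SIGNO to the single
   thread LWPID, falling back to kill() on kernels without tkill. */
static int
send_signal_to_lwp (pid_t lwpid, int signo)
{
#ifdef SYS_tkill
  if (!tkill_unavailable)
    {
      long ret = syscall (SYS_tkill, lwpid, signo);

      /* Any result other than "syscall not implemented" is final. */
      if (ret == 0 || errno != ENOSYS)
        return (int) ret;
      tkill_unavailable = 1;
    }
#endif
  /* Pre-nptl kernels: an lwp is a pid, so plain kill() reaches it. */
  return kill (lwpid, signo);
}
```

The compile-time guard plays the role of the configure check, and the static flag plays the role of the run-time check, so the ENOSYS probe is paid at most once.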
>>Another key difference in behavior between nptl and linuxthreads is what happens
>>when a thread exits. In the linuxthreads model, each thread that exits causes
>>an exit event to occur. The main thread can exit before all of its children.
>>In the nptl model, only the main thread generates the exit event and it only
>>does so after all child threads have exited. So, to determine when an lwp
>>has exited, we have to check its status whenever we get the chance, that is,
>>whenever the lwp is stopped. When we get an exit event we have to determine
>>if we are exiting the program or just one of the threads. Some additional
>>logic has been added to the exit checking code such that if we have
>>(lwp == pid), then we stop all active threads and check if they have already
>>exited. If they have not exited, we resume them; otherwise, we delete them.
>>
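[Editorial sketch] The exit check described above can be modelled without ptrace. This is a toy simulation under stated assumptions: struct toy_lwp and sweep_after_main_exit are hypothetical names, and the "alive" flag stands in for what a real stop-and-wait probe would discover. As in the patch's stop_and_resume_callback, lwps that are already stopped are left untouched.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for GDB's struct lwp_info. */
struct toy_lwp
{
  int lwp;       /* kernel thread id */
  bool stopped;  /* already stopped by the debugger */
  bool alive;    /* what a stop-and-wait probe would find */
  bool deleted;  /* removed from the lwp list */
  int resumes;   /* times this sweep resumed the lwp */
};

/* Sweep run when the main thread (lwp == pid) reports an exit.
   Returns the number of lwps that remain afterwards. */
static size_t
sweep_after_main_exit (struct toy_lwp *lwps, size_t n, int pid)
{
  size_t remaining = 0;

  for (size_t i = 0; i < n; i++)
    {
      struct toy_lwp *lp = &lwps[i];

      if (lp->lwp == pid)
        {
          lp->deleted = true;   /* the lwp that reported the exit */
          continue;
        }
      if (!lp->stopped)
        {
          /* Stop and probe; resume survivors, delete the rest. */
          if (lp->alive)
            lp->resumes++;
          else
            lp->deleted = true;
        }
      if (!lp->deleted)
        remaining++;
    }
  return remaining;
}
```

A non-zero return corresponds to the case where the exit event is discarded because other lwps are still around; zero corresponds to the end of the debugged program.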
>>Daniel J. brought up a good point regarding my previous attempt at this logic
>>whereby I stopped all threads and resumed all threads. That old logic is wrong
>>when scheduler-locking is on, because it would resume threads that are meant
>>to stay stopped. The new logic addresses this because it only
>>stops/resumes threads which are not already stopped. Currently, if we lock
>>into a linuxthreads thread and we run it until exit, we will cause a gdb_assert() to trigger
>>because we will field the exit event and end up with no running threads. Under nptl, we
>>never get notified of the thread exiting so for this scenario, we end up hung.
>>In this instance, there is nothing we can do without a kernel change. Daniel J.
>>is looking into a kernel change. For user threads, we could activate the death event
>>in thread-db.c and make a low-level call to notify the lwp layer, but this does not
>>handle lwps created directly by the end-user. This issue needs to be discussed
>>further outside of this patch because I want to propose changing the assertion to
>>allow the user to resume the other threads. I have tested the change with such a
>>scenario (locking a thread and running until exit). For linuxthreads we trigger the
>>assertion as before. For nptl, we hang as expected as we never get the exit event.
>>A bug can be opened for this if the patch is accepted.
>>
>>The last piece of logic has to do with a new interface needed by the nptl libthread_db
>>library to get the thread area. I have grouped this together with the lin-lwp changes
>>because the lin-lwp changes cannot work without this crucial change.
>>
>>This patch has been tested for both linuxthreads and with an nptl kernel.
>>
>>Ok to commit?
>>
>>-- Jeff J.
>>
>>2003-04-24 Jeff Johnston <jjohnstn@redhat.com>
>>
>> * acconfig.h: Add HAVE_TKILL_SYSCALL definition check.
>> * config.in: Regenerated.
>> * configure.in: Add test for syscall function and check for
>> __NR_tkill macro in <syscall.h> to set HAVE_TKILL_SYSCALL.
>> * configure: Regenerated.
>> * thread-db.c (check_event): For create/death event breakpoints,
>> loop through all messages to ensure that we read the message
>> corresponding to the breakpoint we are at.
>> * lin-lwp.c [HAVE_TKILL_SYSCALL]: Include <unistd.h> and
>> <sys/syscall.h>.
>> (kill_lwp): New function that uses tkill syscall or
>> uses kill, depending on whether threading model is nptl or not.
>> All callers of kill() changed to use kill_lwp().
>> (lin_lwp_wait): Make special check when WIFEXITED occurs to
>> see if all threads have already exited in the nptl model.
>> (stop_and_resume_callback): New callback function used by the
>> lin_lwp_wait thread exit handling code.
>> (stop_wait_callback): Check for threads already having exited and
>> delete such threads from the lwp list when discovered.
>> (stop_callback): Don't assert retcode of kill call.
>>
>> Roland McGrath <roland@redhat.com>
>> * i386-linux-nat.c (ps_get_thread_area): New function needed by
>> nptl libthread_db.
>>
>> -------------------------------------------------------------------------------
>>Index: lin-lwp.c
>>===================================================================
>>RCS file: /cvs/src/src/gdb/lin-lwp.c,v
>>retrieving revision 1.43
>>diff -u -r1.43 lin-lwp.c
>>--- lin-lwp.c 28 Mar 2003 21:42:41 -0000 1.43
>>+++ lin-lwp.c 23 Apr 2003 22:52:44 -0000
>>@@ -24,6 +24,10 @@
>> #include "gdb_string.h"
>> #include <errno.h>
>> #include <signal.h>
>>+#ifdef HAVE_TKILL_SYSCALL
>>+#include <unistd.h>
>>+#include <sys/syscall.h>
>>+#endif
>> #include <sys/ptrace.h>
>> #include "gdb_wait.h"
>>
>>@@ -156,6 +160,7 @@
>>
>> /* Prototypes for local functions. */
>> static int stop_wait_callback (struct lwp_info *lp, void *data);
>>+static int lin_lwp_thread_alive (ptid_t ptid);
>>
>> /* Convert wait status STATUS to a string. Used for printing debug
>> messages only. */
>>@@ -627,6 +632,32 @@
>> }
>>
>>
>>+/* Issue kill to specified lwp. */
>>+
>>+static int tkill_failed;
>>+
>>+static int
>>+kill_lwp (int lwpid, int signo)
>>+{
>>+ errno = 0;
>>+
>>+/* Use tkill, if possible, in case we are using nptl threads. If tkill
>>+ fails, then we are not using nptl threads and we should be using kill. */
>>+
>>+#ifdef HAVE_TKILL_SYSCALL
>>+ if (!tkill_failed)
>>+ {
>>+ int ret = syscall (__NR_tkill, lwpid, signo);
>>+ if (errno != ENOSYS)
>>+ return ret;
>>+ errno = 0;
>>+ tkill_failed = 1;
>>+ }
>>+#endif
>>+
>>+ return kill (lwpid, signo);
>>+}
>>+
>> /* Send a SIGSTOP to LP. */
>>
>> static int
>>@@ -642,8 +673,15 @@
>> "SC: kill %s **<SIGSTOP>**\n",
>> target_pid_to_str (lp->ptid));
>> }
>>- ret = kill (GET_LWP (lp->ptid), SIGSTOP);
>>- gdb_assert (ret == 0);
>>+ errno = 0;
>>+ ret = kill_lwp (GET_LWP (lp->ptid), SIGSTOP);
>>+ if (debug_lin_lwp)
>>+ {
>>+ fprintf_unfiltered (gdb_stdlog,
>>+ "SC: lwp kill %d %s\n",
>>+ ret,
>>+ errno ? safe_strerror (errno) : "ERRNO-OK");
>>+ }
>>
>> lp->signalled = 1;
>> gdb_assert (lp->status == 0);
>>@@ -667,11 +705,23 @@
>>
>> gdb_assert (lp->status == 0);
>>
>>- pid = waitpid (GET_LWP (lp->ptid), &status, lp->cloned ? __WCLONE : 0);
>>+ pid = waitpid (GET_LWP (lp->ptid), &status, 0);
>> if (pid == -1 && errno == ECHILD)
>>- /* OK, the proccess has disappeared. We'll catch the actual
>>- exit event in lin_lwp_wait. */
>>- return 0;
>>+ {
>>+ pid = waitpid (GET_LWP (lp->ptid), &status, __WCLONE);
>>+ if (pid == -1 && errno == ECHILD)
>>+ {
>>+ /* The thread has previously exited. We need to delete it now
>>+ because in the case of nptl threads, there won't be an
>>+ exit event unless it is the main thread. */
>>+ if (debug_lin_lwp)
>>+ fprintf_unfiltered (gdb_stdlog,
>>+ "SWC: %s exited.\n",
>>+ target_pid_to_str (lp->ptid));
>>+ delete_lwp (lp->ptid);
>>+ return 0;
>>+ }
>>+ }
>>
>> gdb_assert (pid == GET_LWP (lp->ptid));
>>
>>@@ -683,6 +733,7 @@
>> status_to_str (status));
>> }
>>
>>+ /* Check if the thread has exited. */
>> if (WIFEXITED (status) || WIFSIGNALED (status))
>> {
>> gdb_assert (num_lwps > 1);
>>@@ -697,7 +748,31 @@
>> target_pid_to_str (lp->ptid));
>> }
>> if (debug_lin_lwp)
>>- fprintf_unfiltered (gdb_stdlog, "SWC: %s exited.\n",
>>+ fprintf_unfiltered (gdb_stdlog,
>>+ "SWC: %s exited.\n",
>>+ target_pid_to_str (lp->ptid));
>>+
>>+ delete_lwp (lp->ptid);
>>+ return 0;
>>+ }
>>+
>>+ /* Check if the current LWP has previously exited. For nptl threads,
>>+ there is no exit signal issued for LWPs that are not the
>>+ main thread so we should check whenever the thread is stopped. */
>>+ if (!lin_lwp_thread_alive (lp->ptid))
>>+ {
>>+ if (in_thread_list (lp->ptid))
>>+ {
>>+ /* Core GDB cannot deal with us deleting the current
>>+ thread. */
>>+ if (!ptid_equal (lp->ptid, inferior_ptid))
>>+ delete_thread (lp->ptid);
>>+ printf_unfiltered ("[%s exited]\n",
>>+ target_pid_to_str (lp->ptid));
>>+ }
>>+ if (debug_lin_lwp)
>>+ fprintf_unfiltered (gdb_stdlog,
>>+ "SWC: %s already exited.\n",
>> target_pid_to_str (lp->ptid));
>>
>> delete_lwp (lp->ptid);
>>@@ -756,7 +831,14 @@
>> /* If there's another event, throw it back into the queue. */
>> if (lp->status)
>> {
>>- kill (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>>+ if (debug_lin_lwp)
>>+ {
>>+ fprintf_unfiltered (gdb_stdlog,
>>+ "SWC: kill %s, %s\n",
>>+ target_pid_to_str (lp->ptid),
>>+ status_to_str ((int) status));
>>+ }
>>+ kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (lp->status));
>> }
>> /* Save the sigtrap event. */
>> lp->status = status;
>>@@ -800,7 +882,7 @@
>> target_pid_to_str (lp->ptid),
>> status_to_str ((int) status));
>> }
>>- kill (GET_LWP (lp->ptid), WSTOPSIG (status));
>>+ kill_lwp (GET_LWP (lp->ptid), WSTOPSIG (status));
>> }
>> return 0;
>> }
>>@@ -1049,6 +1131,25 @@
>>
>> #endif
>>
>>+/* Stop an active thread, verify it still exists, then resume it. */
>>+
>>+static int
>>+stop_and_resume_callback (struct lwp_info *lp, void *data)
>>+{
>>+ struct lwp_info *ptr;
>>+
>>+ if (!lp->stopped && !lp->signalled)
>>+ {
>>+ stop_callback (lp, NULL);
>>+ stop_wait_callback (lp, NULL);
>>+ /* Resume if the lwp still exists. */
>>+ for (ptr = lwp_list; ptr; ptr = ptr->next)
>>+ if (lp == ptr)
>>+ resume_callback (lp, NULL);
>>+ }
>>+ return 0;
>>+}
>>+
>> static ptid_t
>> lin_lwp_wait (ptid_t ptid, struct target_waitstatus *ourstatus)
>> {
>>@@ -1206,10 +1307,61 @@
>> }
>> }
>>
>>- /* Make sure we don't report a TARGET_WAITKIND_EXITED or
>>- TARGET_WAITKIND_SIGNALLED event if there are still LWP's
>>- left in the process. */
>>+ /* Check if the thread has exited. */
>> if ((WIFEXITED (status) || WIFSIGNALED (status)) && num_lwps > 1)
>>+ {
>>+ if (in_thread_list (lp->ptid))
>>+ {
>>+ /* Core GDB cannot deal with us deleting the current
>>+ thread. */
>>+ if (!ptid_equal (lp->ptid, inferior_ptid))
>>+ delete_thread (lp->ptid);
>>+ printf_unfiltered ("[%s exited]\n",
>>+ target_pid_to_str (lp->ptid));
>>+ }
>>+
>>+ /* If this is the main thread, we must stop all threads and
>>+ verify if they are still alive. This is because in the nptl
>>+ thread model, there is no signal issued for exiting LWPs
>>+ other than the main thread. We only get the main thread
>>+ exit signal once all child threads have already exited.
>>+ If we stop all the threads and use the stop_wait_callback
>>+ to check if they have exited we can determine whether this
>>+ signal should be ignored or whether it means the end of the
>>+ debugged application, regardless of which threading model
>>+ is being used. */
>>+ if (GET_PID (lp->ptid) == GET_LWP (lp->ptid))
>>+ {
>>+ lp->stopped = 1;
>>+ iterate_over_lwps (stop_and_resume_callback, NULL);
>>+ }
>>+
>>+ if (debug_lin_lwp)
>>+ fprintf_unfiltered (gdb_stdlog,
>>+ "LLW: %s exited.\n",
>>+ target_pid_to_str (lp->ptid));
>>+
>>+ delete_lwp (lp->ptid);
>>+
>>+ /* If there is at least one more LWP, then the exit signal
>>+ was not the end of the debugged application and should be
>>+ ignored. */
>>+ if (num_lwps > 0)
>>+ {
>>+ /* Make sure there is at least one thread running. */
>>+ gdb_assert (iterate_over_lwps (running_callback, NULL));
>>+
>>+ /* Discard the event. */
>>+ status = 0;
>>+ continue;
>>+ }
>>+ }
>>+
>>+ /* Check if the current LWP has previously exited. In the nptl
>>+ thread model, LWPs other than the main thread do not issue
>>+ signals when they exit so we must check whenever the thread
>>+ has stopped. A similar check is made in stop_wait_callback(). */
>>+ if (num_lwps > 1 && !lin_lwp_thread_alive (lp->ptid))
>> {
>> if (in_thread_list (lp->ptid))
>> {
>>Index: acconfig.h
>>===================================================================
>>RCS file: /cvs/src/src/gdb/acconfig.h,v
>>retrieving revision 1.24
>>diff -u -r1.24 acconfig.h
>>--- acconfig.h 4 Jan 2003 00:34:42 -0000 1.24
>>+++ acconfig.h 23 Apr 2003 22:52:44 -0000
>>@@ -95,6 +95,9 @@
>> /* Define if using Solaris thread debugging. */
>> #undef HAVE_THREAD_DB_LIB
>>
>>+/* Define if you support the tkill syscall. */
>>+#undef HAVE_TKILL_SYSCALL
>>+
>> /* Define on a GNU/Linux system to work around problems in sys/procfs.h. */
>> #undef START_INFERIOR_TRAPS_EXPECTED
>> #undef sys_quotactl
>>Index: configure.in
>>===================================================================
>>RCS file: /cvs/src/src/gdb/configure.in,v
>>retrieving revision 1.126
>>diff -u -r1.126 configure.in
>>--- configure.in 26 Feb 2003 15:10:47 -0000 1.126
>>+++ configure.in 23 Apr 2003 22:52:44 -0000
>>@@ -384,6 +384,7 @@
>> AC_CHECK_FUNCS(setpgid setpgrp)
>> AC_CHECK_FUNCS(sigaction sigprocmask sigsetmask)
>> AC_CHECK_FUNCS(socketpair)
>>+AC_CHECK_FUNCS(syscall)
>>
>> dnl AC_FUNC_SETPGRP does not work when cross compiling
>> dnl Instead, assume we will have a prototype for setpgrp if cross compiling.
>>@@ -909,6 +910,24 @@
>> if test "x$gdb_cv_thread_db_h_has_td_notalloc" = "xyes"; then
>> AC_DEFINE(THREAD_DB_HAS_TD_NOTALLOC, 1,
>> [Define if <thread_db.h> has the TD_NOTALLOC error code.])
>>+fi
>>+
>>+dnl See if we have a sys/syscall header file that has __NR_tkill.
>>+if test "x$ac_cv_header_sys_syscall_h" = "xyes"; then
>>+ AC_CACHE_CHECK([whether <sys/syscall.h> has __NR_tkill],
>>+ gdb_cv_sys_syscall_h_has_tkill,
>>+ AC_TRY_COMPILE(
>>+ [#include <sys/syscall.h>],
>>+ [int i = __NR_tkill;],
>>+ gdb_cv_sys_syscall_h_has_tkill=yes,
>>+ gdb_cv_sys_syscall_h_has_tkill=no
>>+ )
>>+ )
>>+fi
>>+dnl See if we can issue tkill syscall.
>>+if test "x$gdb_cv_sys_syscall_h_has_tkill" = "xyes" && test "x$ac_cv_func_syscall" = "xyes"; then
>>+ AC_DEFINE(HAVE_TKILL_SYSCALL, 1,
>>+ [Define if we can use the tkill syscall.])
>> fi
>>
>> dnl Handle optional features that can be enabled.
>>Index: i386-linux-nat.c
>>===================================================================
>>RCS file: /cvs/src/src/gdb/i386-linux-nat.c,v
>>retrieving revision 1.44
>>diff -u -r1.44 i386-linux-nat.c
>>--- i386-linux-nat.c 16 Apr 2003 15:22:02 -0000 1.44
>>+++ i386-linux-nat.c 23 Apr 2003 22:52:44 -0000
>>@@ -70,6 +70,9 @@
>> /* Defines I386_LINUX_ORIG_EAX_REGNUM. */
>> #include "i386-linux-tdep.h"
>>
>>+/* Defines ps_err_e, struct ps_prochandle. */
>>+#include "gdb_proc_service.h"
>>+
>> /* Prototypes for local functions. */
>> static void dummy_sse_values (void);
>>
>>@@ -682,6 +685,21 @@
>> offsetof (struct user, u_debugreg[regnum]), value);
>> if (errno != 0)
>> perror_with_name ("Couldn't write debug register");
>>+}
>>+
>>+extern ps_err_e
>>+ps_get_thread_area(const struct ps_prochandle *ph,
>>+ lwpid_t lwpid, int idx, void **base)
>>+{
>>+ unsigned long int desc[3];
>>+#define PTRACE_GET_THREAD_AREA 25
>>+
>>+ if (ptrace (PTRACE_GET_THREAD_AREA,
>>+ lwpid, (void *) idx, (unsigned long) &desc) < 0)
>>+ return PS_ERR;
>>+
>>+ *(int *)base = desc[1];
>>+ return PS_OK;
>> }
>>
>> void
>
>