This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Race condition in sol-thread.c


Mike,

We have encountered difficulties with multithreaded programs on Solaris 
when an LWP terminates immediately after a 'continue'.  Apparently, the 
problem occurs during a call to p_td_ta_map_id2thr (we can 
nondeterministically encounter either correct behavior, an infinite loop, or
one of a couple of error messages, such as "no thread can be found to
satisfy request").  I believe I know why and solicit your comments.

The following code appears in sol_thread_wait:

  inferior_ptid = thread_to_lwp (inferior_ptid, PIDGET (main_ph.ptid));
  if (PIDGET (inferior_ptid) == -1)
    inferior_ptid = procfs_first_available ();

  if (PIDGET (ptid) != -1)
    {
      ptid_t save_ptid = ptid;

      ptid = thread_to_lwp (ptid, -2);
      if (PIDGET (ptid) == -2)		/* Inactive thread */
	error ("This version of Solaris can't start inactive threads.");
      if (info_verbose && PIDGET (ptid) == -1)
	warning ("Specified thread %ld seems to have terminated",
		 GET_THREAD (save_ptid));
    }

  rtnval = procfs_ops.to_wait (ptid, ourstatus);

Now thread_to_lwp eventually calls p_td_ta_map_id2thr, which then
eventually calls ps_pdread.  The problem is (or appears to be) that
these calls happen before the inferior is known to be stopped (given
the placement of the to_wait call).  Comments in sol-thread suggest
that the thread_db library expects the inferior to be stopped during its
queries, but that since GDB is supposed to guarantee this state, the
routines ps_pstop, etc., are nops.  

A little instrumentation inserted in sol-thread.c and procfs.c showed that,
sure enough, as a result of the code above, the thread_db package makes
a call to ps_pstop and then to ps_pdread before there is any attempt in
procfs to stop the process.  This suggests strongly that the precondition
on ps_pdread is being violated.  

On examining the code, it seems as if all the lines quoted above, except
for the last, are unnecessary (it is suspicious that they refer to STARTING
a thread, and are a copy of lines from sol_thread_resume).  I am not
sure, because all these global variables confuse me.  In any case, removing
those lines fixed our problem and apparently introduced no regressions.

There is definitely a race condition that needs fixed.  What's YOUR take on
the right approach? 

Paul Hilfinger

----------------------------------------------------------------------

2002-07-05  Paul N. Hilfinger  <hilfingr@otisco.mckusick.com>

	* sol-thread.c (sol_thread_wait): Remove code (apparently duplicated
	from sol_thread_resume) that indirectly causes a race condition, 
	because thread_to_lwp reads the inferior's memory while the 
	inferior is not guaranteed to be stopped.


Index: gdb5_0.27/gdb/sol-thread.c
--- gdb5_0.27/gdb/sol-thread.c Thu, 24 Jan 2002 00:35:07 -0800 hilfingr (Gdb/S/50_sol-thread 1.1.1.1.1.1.1.2.1.1.1.3.1.2.1.1.1.2 644)
+++ ada-gdb5-0-merge.117(w)/gdb/sol-thread.c Fri, 05 Jul 2002 01:34:00 -0700 hilfingr (Gdb/S/50_sol-thread 1.1.1.1.1.1.1.2.1.1.1.3.1.2.1.1.1.5 644)
@@ -493,22 +504,6 @@ sol_thread_wait (ptid_t ptid, struct tar
 
   save_ptid = inferior_ptid;
   old_chain = save_inferior_ptid ();
-
-  inferior_ptid = thread_to_lwp (inferior_ptid, PIDGET (main_ph.ptid));
-  if (PIDGET (inferior_ptid) == -1)
-    inferior_ptid = procfs_first_available ();
-
-  if (PIDGET (ptid) != -1)
-    {
-      ptid_t save_ptid = ptid;
-
-      ptid = thread_to_lwp (ptid, -2);
-      if (PIDGET (ptid) == -2)		/* Inactive thread */
-	error ("This version of Solaris can't start inactive threads.");
-      if (info_verbose && PIDGET (ptid) == -1)
-	warning ("Specified thread %ld seems to have terminated",
-		 GET_THREAD (save_ptid));
-    }
 
   rtnval = procfs_ops.to_wait (ptid, ourstatus);
 


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]