This is the mail archive of the gdb-patches@sources.redhat.com mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Race condition in sol-thread.c


"Paul N. Hilfinger" wrote:
> 
> Mike,
> 
> We have encountered difficulties with multithreaded programs on Solaris
> when an LWP terminates immediately after a 'continue'.  Apparently, the
> problem occurs during a call to p_td_ta_map_id2thr (we can
> nondeterministically encounter either correct behavior, an infinite loop, or
> one of a couple of error messages, such as "no thread can be found to
> satisfy request").  I believe I know why and solicit your comments.

Paul, first of all, please forgive me for taking so long
to get back to you.  I've been a little bit snowed under.


> The following code appears in sol_thread_wait:
> 
>   inferior_ptid = thread_to_lwp (inferior_ptid, PIDGET (main_ph.ptid));
>   if (PIDGET (inferior_ptid) == -1)
>     inferior_ptid = procfs_first_available ();
> 
>   if (PIDGET (ptid) != -1)
>     {
>       ptid_t save_ptid = ptid;
> 
>       ptid = thread_to_lwp (ptid, -2);
>       if (PIDGET (ptid) == -2)          /* Inactive thread */
>         error ("This version of Solaris can't start inactive threads.");
>       if (info_verbose && PIDGET (ptid) == -1)
>         warning ("Specified thread %ld seems to have terminated",
>                  GET_THREAD (save_ptid));
>     }
> 
>   rtnval = procfs_ops.to_wait (ptid, ourstatus);
> 
> Now thread_to_lwp eventually calls p_td_ta_map_id2thr, which then
> eventually calls ps_pdread.  The problem is (or appears to be) that
> these calls happen before the inferior is known to be stopped (given
> the placement of the to_wait call).  Comments in sol-thread suggest
> that the thread_db library expects the inferior to be stopped during its
> queries, but that since GDB is supposed to guarantee this state, the
> routines ps_pstop, etc., are nops.

Yep -- you're right.  Calling thread_to_lwp is potentially
(though not certainly) going to read memory, and we're not
allowed to read memory until the target is stopped.  This is
a bug.

The weird thing is, it's a bug that's been in there since 
version 1.1 of sol-thread.c, written back in 1996.  I can't
say it's NEVER bitten anybody, but it must require a special
circumstance.

The problem is that the code you want to remove is there
for a reason.  I actually added that code in 1997, and you're
probably right that I cribbed it from sol_thread_resume.
Its purpose was specifically to deal with a thread that had
exited.  There's probably a more correct way to do it, but
removing it entirely isn't it.

Michael


> A little instrumentation inserted in sol-thread.c and procfs.c showed that,
> sure enough, as a result of the code above, the thread_db package makes
> a call to ps_pstop and then to ps_pdread before there is any attempt in
> procfs to stop the process.  This suggests strongly that the precondition
> on ps_pdread is being violated.
> 
> On examining the code, it seems as if all the lines quoted above, except
> for the last, are unnecessary (it is suspicious that they refer to STARTING
> a thread, and are a copy of lines from sol_thread_resume).  I am not
> sure, because all these global variables confuse me.  In any case, removing
> those lines fixed our problem and apparently introduced no regressions.
> 
> There is definitely a race condition that needs fixed.  What's YOUR take on
> the right approach?
> 
> Paul Hilfinger
> 
> ----------------------------------------------------------------------
> 
> 2002-07-05  Paul N. Hilfinger  <hilfingr@otisco.mckusick.com>
> 
>         * sol-thread.c (sol_thread_wait): Remove code (apparently duplicated
>         from sol_thread_resume) that indirectly causes a race condition,
>         because thread_to_lwp reads the inferior's memory while the
>         inferior is not guaranteed to be stopped.
> 
> Index: gdb5_0.27/gdb/sol-thread.c
> --- gdb5_0.27/gdb/sol-thread.c Thu, 24 Jan 2002 00:35:07 -0800 hilfingr (Gdb/S/50_sol-thread 1.1.1.1.1.1.1.2.1.1.1.3.1.2.1.1.1.2 644)
> +++ ada-gdb5-0-merge.117(w)/gdb/sol-thread.c Fri, 05 Jul 2002 01:34:00 -0700 hilfingr (Gdb/S/50_sol-thread 1.1.1.1.1.1.1.2.1.1.1.3.1.2.1.1.1.5 644)
> @@ -493,22 +504,6 @@ sol_thread_wait (ptid_t ptid, struct tar
> 
>    save_ptid = inferior_ptid;
>    old_chain = save_inferior_ptid ();
> -
> -  inferior_ptid = thread_to_lwp (inferior_ptid, PIDGET (main_ph.ptid));
> -  if (PIDGET (inferior_ptid) == -1)
> -    inferior_ptid = procfs_first_available ();
> -
> -  if (PIDGET (ptid) != -1)
> -    {
> -      ptid_t save_ptid = ptid;
> -
> -      ptid = thread_to_lwp (ptid, -2);
> -      if (PIDGET (ptid) == -2)         /* Inactive thread */
> -       error ("This version of Solaris can't start inactive threads.");
> -      if (info_verbose && PIDGET (ptid) == -1)
> -       warning ("Specified thread %ld seems to have terminated",
> -                GET_THREAD (save_ptid));
> -    }
> 
>    rtnval = procfs_ops.to_wait (ptid, ourstatus);
>


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]