[PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/

Simon Marchi simon.marchi@ericsson.com
Tue Dec 16 20:52:00 GMT 2014


On 2014-12-16 11:53 AM, Pedro Alves wrote:
> ... instead of relying on libthread_db.
> 
> I wrote a test that attaches to a program that constantly spawns
> short-lived threads, which exposed several issues.  This is one of
> them.
> 
> On Linux, we need to attach to all threads of a process (thread group)
> individually.  We currently rely on libthread_db to list the threads,
> but that is problematic, because libthread_db relies on reading data
> structures out of the inferior (which may well be corrupted).  If
> threads are being created or exiting just while we try to attach, we
> may trip on inconsistencies in the inferior's thread list.  To work
> around that, when we see a seemingly corrupt list, we currently retry
> a few times:
> 
>  static void
>  thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
>  {
>  ...
>    if (until_no_new)
>      {
>        /* Require 4 successive iterations which do not find any new threads.
> 	  The 4 is a heuristic: there is an inherent race here, and I have
> 	  seen that 2 iterations in a row are not always sufficient to
> 	  "capture" all threads.  */
>  ...
> 
> That heuristic may well fail, and when it does, we end up with threads
> in the program that aren't under GDB's control.  That's obviously bad
> and results in quite mistifying failures, like e.g., the process dying
> for seeminly no reason when a thread that wasn't attached trips on a
> breakpoint.
> 
> There's really no reason to rely on libthread_db for this nowadays
> when we have /proc mounted.  In that case, which is the usual case, we
> can list the LWPs from /proc/PID/task/.  In fact, GDBserver is already
> doing this.  The patch factors out that code that knows to walk the
> task/ directory out of GDBserver, and makes GDB use it too.
> 
> Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
> them to stop immediately.  Instead, we just tag the LWP as having an
> expected stop.  Because we can only set the ptrace options when the
> thread stops, we need a new flag in the lwp structure to keep track of
> whether we've already set the ptrace options, just like in GDBserver.
> Note that nothing issues any ptrace command to the threads between the
> PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
> described in gdbserver's linux-low.c).
> 
> When we attach to a program that has threads exiting while we attach,
> it's easy to race with a thread just exiting as we try to attach to
> it, like:
> 
>   #1 - get current list of threads
>   #2 - attach to each listed thread
>   #3 - ooops, attach failed, thread is already gone
> 
> As this is pretty normal, we shouldn't be issuing a scary warning in
> step #3.
> 
> When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
> we'll see EPERM as well.  That happens when the kernel still has the
> kernel in its task list, but the thread is marked as dead.

"still has the kernel" -> "still has the thread"

> Unfortunately, EPERM is ambiguous and we'll get it also on other
> scenarios where the thread isn't dead, and in those cases, it's useful
> to get a warning.  To distiguish the cases, when we get an EPERM
> failure, we open /proc/PID/status, and check the thread's state -- if
> the /proc file no longer exists, or the state is "Z (Zombie)" or "X
> (Dead)", we ignore the EPERM error silently; otherwise, we'll warn.
> Unfortunately, there seems to be a kernel race here.  Sometimes I get
> EPERM, and then the /proc state still indicates "R (Running)"...  If
> we wait a bit and retry, we do end up seeing X or Z state, or get an
> ESRCH.  I thought of making GDB retry the attach a few times, but even
> with a 500ms wait and 4 retries, I still see the warning sometimes.  I
> haven't been able to identify the kernel path that causes this yet,
> but in any case, it looks like a kernel bug to me.  As this just
> results failure to suppress a warning that we've been printing since
> about forever anyway, I'm just making the test cope with it, and issue
> an XFAIL.
> 
> gdb/gdbserver/
> 2014-12-16  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-low.c (linux_attach_fail_reason_string): Move to
> 	nat/linux-ptrace.c, and rename.
> 	(linux_attach_lwp): Update comment.
> 	(attach_proc_task_lwp_callback): New function.
> 	(linux_attach): Adjus to rename and use

Adjus -> Adjust

> 	linux_proc_attach_tgid_threads.
> 	(linux_attach_fail_reason_string): Delete declaration.
> 
> gdb/
> 2014-12-16  Pedro Alves  <palves@redhat.com>
> 
> 	* linux-nat.c (attach_proc_task_lwp_callback): New function.
> 	(linux_nat_attach): Use linux_proc_attach_tgid_threads.
> 	(wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's
> 	ptrace option flags.
> 	* linux-nat.h (struct lwp_info) <must_set_ptrace_flags>: New
> 	field.
> 	* nat/linux-procfs.c: Include <dirent.h>.
> 	(linux_proc_get_int): New parameter "warn".  Handle it.
> 	(linux_proc_get_tgid): Adjust.
> 	(linux_proc_get_tracerpid): Rename to ...
> 	(linux_proc_get_tracerpid_nowarn): ... this.
> 	(linux_proc_pid_get_state): New function, factored out from
> 	(linux_proc_pid_has_state): ... this.  Add new parameter "warn"
> 	and handle it.
> 	(linux_proc_pid_is_gone): New function.
> 	(linux_proc_pid_is_stopped): Adjust.
> 	(linux_proc_pid_is_zombie_maybe_warn)
> 	(linux_proc_pid_is_zombie_nowarn): New functions.
> 	(linux_proc_pid_is_zombie): Use
> 	linux_proc_pid_is_zombie_maybe_warn.
> 	(linux_proc_attach_tgid_threads): New function.
> 	* nat/linux-procfs.h (linux_proc_get_tgid): Update comment.
> 	(linux_proc_get_tracerpid): Rename to ...
> 	(linux_proc_get_tracerpid_nowarn): ... this, and update comment.
> 	(linux_proc_pid_is_gone): New declaration.
> 	(linux_proc_pid_is_zombie): Update comment.
> 	(linux_proc_pid_is_zombie_nowarn): New declaration.
> 	(linux_proc_attach_lwp_func): New typedef.
> 	(linux_proc_attach_tgid_threads): New declaration.
> 	* nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to
> 	use nowarn functions.
> 	(linux_ptrace_attach_fail_reason_string): Move here from
> 	gdbserver/linux-low.c and rename.
> 	(ptrace_supports_feature): If the current ptrace options are not
> 	known yet, check them now, instead of asserting.
> 	* nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string):
> 	Declare.

I think it makes sense, not that I know anything about it.

Simon

-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4079 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://sourceware.org/pipermail/gdb-patches/attachments/20141216/f8158f4a/attachment.p7s>


More information about the Gdb-patches mailing list