[PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/
Simon Marchi
simon.marchi@ericsson.com
Tue Dec 16 20:52:00 GMT 2014
On 2014-12-16 11:53 AM, Pedro Alves wrote:
> ... instead of relying on libthread_db.
>
> I wrote a test that attaches to a program that constantly spawns
> short-lived threads, which exposed several issues. This is one of
> them.
>
> On Linux, we need to attach to all threads of a process (thread group)
> individually. We currently rely on libthread_db to list the threads,
> but that is problematic, because libthread_db relies on reading data
> structures out of the inferior (which may well be corrupted). If
> threads are being created or exiting just while we try to attach, we
> may trip on inconsistencies in the inferior's thread list. To work
> around that, when we see a seemingly corrupt list, we currently retry
> a few times:
>
> static void
> thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
> {
> ...
> if (until_no_new)
> {
> /* Require 4 successive iterations which do not find any new threads.
> The 4 is a heuristic: there is an inherent race here, and I have
> seen that 2 iterations in a row are not always sufficient to
> "capture" all threads. */
> ...
>
> That heuristic may well fail, and when it does, we end up with threads
> in the program that aren't under GDB's control. That's obviously bad
> and results in quite mistifying failures, like e.g., the process dying
> for seeminly no reason when a thread that wasn't attached trips on a
> breakpoint.
>
> There's really no reason to rely on libthread_db for this nowadays
> when we have /proc mounted. In that case, which is the usual case, we
> can list the LWPs from /proc/PID/task/. In fact, GDBserver is already
> doing this. The patch factors out that code that knows to walk the
> task/ directory out of GDBserver, and makes GDB use it too.
>
> Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
> them to stop immediately. Instead, we just tag the LWP as having an
> expected stop. Because we can only set the ptrace options when the
> thread stops, we need a new flag in the lwp structure to keep track of
> whether we've already set the ptrace options, just like in GDBserver.
> Note that nothing issues any ptrace command to the threads between the
> PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
> described in gdbserver's linux-low.c).
>
> When we attach to a program that has threads exiting while we attach,
> it's easy to race with a thread just exiting as we try to attach to
> it, like:
>
> #1 - get current list of threads
> #2 - attach to each listed thread
> #3 - ooops, attach failed, thread is already gone
>
> As this is pretty normal, we shouldn't be issuing a scary warning in
> step #3.
>
> When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
> we'll see EPERM as well. That happens when the kernel still has the
> kernel in its task list, but the thread is marked as dead.
"still has the kernel" -> "still has the thread"
> Unfortunately, EPERM is ambiguous and we'll get it also on other
> scenarios where the thread isn't dead, and in those cases, it's useful
> to get a warning. To distiguish the cases, when we get an EPERM
> failure, we open /proc/PID/status, and check the thread's state -- if
> the /proc file no longer exists, or the state is "Z (Zombie)" or "X
> (Dead)", we ignore the EPERM error silently; otherwise, we'll warn.
> Unfortunately, there seems to be a kernel race here. Sometimes I get
> EPERM, and then the /proc state still indicates "R (Running)"... If
> we wait a bit and retry, we do end up seeing X or Z state, or get an
> ESRCH. I thought of making GDB retry the attach a few times, but even
> with a 500ms wait and 4 retries, I still see the warning sometimes. I
> haven't been able to identify the kernel path that causes this yet,
> but in any case, it looks like a kernel bug to me. As this just
> results failure to suppress a warning that we've been printing since
> about forever anyway, I'm just making the test cope with it, and issue
> an XFAIL.
>
> gdb/gdbserver/
> 2014-12-16 Pedro Alves <palves@redhat.com>
>
> * linux-low.c (linux_attach_fail_reason_string): Move to
> nat/linux-ptrace.c, and rename.
> (linux_attach_lwp): Update comment.
> (attach_proc_task_lwp_callback): New function.
> (linux_attach): Adjus to rename and use
Adjus -> Adjust
> linux_proc_attach_tgid_threads.
> (linux_attach_fail_reason_string): Delete declaration.
>
> gdb/
> 2014-12-16 Pedro Alves <palves@redhat.com>
>
> * linux-nat.c (attach_proc_task_lwp_callback): New function.
> (linux_nat_attach): Use linux_proc_attach_tgid_threads.
> (wait_lwp, linux_nat_filter_event): If not set yet, set the lwp's
> ptrace option flags.
> * linux-nat.h (struct lwp_info) <must_set_ptrace_flags>: New
> field.
> * nat/linux-procfs.c: Include <dirent.h>.
> (linux_proc_get_int): New parameter "warn". Handle it.
> (linux_proc_get_tgid): Adjust.
> (linux_proc_get_tracerpid): Rename to ...
> (linux_proc_get_tracerpid_nowarn): ... this.
> (linux_proc_pid_get_state): New function, factored out from
> (linux_proc_pid_has_state): ... this. Add new parameter "warn"
> and handle it.
> (linux_proc_pid_is_gone): New function.
> (linux_proc_pid_is_stopped): Adjust.
> (linux_proc_pid_is_zombie_maybe_warn)
> (linux_proc_pid_is_zombie_nowarn): New functions.
> (linux_proc_pid_is_zombie): Use
> linux_proc_pid_is_zombie_maybe_warn.
> (linux_proc_attach_tgid_threads): New function.
> * nat/linux-procfs.h (linux_proc_get_tgid): Update comment.
> (linux_proc_get_tracerpid): Rename to ...
> (linux_proc_get_tracerpid_nowarn): ... this, and update comment.
> (linux_proc_pid_is_gone): New declaration.
> (linux_proc_pid_is_zombie): Update comment.
> (linux_proc_pid_is_zombie_nowarn): New declaration.
> (linux_proc_attach_lwp_func): New typedef.
> (linux_proc_attach_tgid_threads): New declaration.
> * nat/linux-ptrace.c (linux_ptrace_attach_fail_reason): Adjust to
> use nowarn functions.
> (linux_ptrace_attach_fail_reason_string): Move here from
> gdbserver/linux-low.c and rename.
> (ptrace_supports_feature): If the current ptrace options are not
> known yet, check them now, instead of asserting.
> * nat/linux-ptrace.h (linux_ptrace_attach_fail_reason_string):
> Declare.
I think it makes sense, not that I know anything about it.
Simon
-------------- next part --------------
A non-text attachment was scrubbed...
Name: smime.p7s
Type: application/pkcs7-signature
Size: 4079 bytes
Desc: S/MIME Cryptographic Signature
URL: <http://sourceware.org/pipermail/gdb-patches/attachments/20141216/f8158f4a/attachment.p7s>
More information about the Gdb-patches
mailing list