This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/
- From: Pedro Alves <palves at redhat dot com>
- To: Simon Marchi <simon dot marchi at ericsson dot com>, gdb-patches at sourceware dot org
- Date: Wed, 17 Dec 2014 13:35:37 +0000
- Subject: Re: [PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/
- Authentication-results: sourceware.org; auth=none
- References: <1418748834-27545-1-git-send-email-palves at redhat dot com> <1418748834-27545-3-git-send-email-palves at redhat dot com> <54909B99 dot 5070806 at ericsson dot com>

On 12/16/2014 08:52 PM, Simon Marchi wrote:
> On 2014-12-16 11:53 AM, Pedro Alves wrote:
>> ... instead of relying on libthread_db.
>>
>> I wrote a test that attaches to a program that constantly spawns
>> short-lived threads, which exposed several issues. This is one of
>> them.
>>
>> On Linux, we need to attach to all threads of a process (thread group)
>> individually. We currently rely on libthread_db to list the threads,
>> but that is problematic, because libthread_db relies on reading data
>> structures out of the inferior (which may well be corrupted). If
>> threads are being created or exiting just while we try to attach, we
>> may trip on inconsistencies in the inferior's thread list. To work
>> around that, when we see a seemingly corrupt list, we currently retry
>> a few times:
>>
>> static void
>> thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
>> {
>> ...
>>   if (until_no_new)
>>     {
>>       /* Require 4 successive iterations which do not find any new threads.
>>          The 4 is a heuristic: there is an inherent race here, and I have
>>          seen that 2 iterations in a row are not always sufficient to
>>          "capture" all threads.  */
>> ...
>>
>> That heuristic may well fail, and when it does, we end up with threads
>> in the program that aren't under GDB's control. That's obviously bad
>> and results in quite mystifying failures, like e.g., the process dying
>> for seemingly no reason when a thread that wasn't attached trips on a
>> breakpoint.
>>
>> There's really no reason to rely on libthread_db for this nowadays
>> when we have /proc mounted. In that case, which is the usual case, we
>> can list the LWPs from /proc/PID/task/. In fact, GDBserver is already
>> doing this. The patch factors out that code that knows to walk the
>> task/ directory out of GDBserver, and makes GDB use it too.
>>
>> Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
>> them to stop immediately. Instead, we just tag the LWP as having an
>> expected stop. Because we can only set the ptrace options when the
>> thread stops, we need a new flag in the lwp structure to keep track of
>> whether we've already set the ptrace options, just like in GDBserver.
>> Note that nothing issues any ptrace command to the threads between the
>> PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
>> described in gdbserver's linux-low.c).
>>
>> When we attach to a program that has threads exiting while we attach,
>> it's easy to race with a thread just exiting as we try to attach to
>> it, like:
>>
>> #1 - get current list of threads
>> #2 - attach to each listed thread
>> #3 - ooops, attach failed, thread is already gone
>>
>> As this is pretty normal, we shouldn't be issuing a scary warning in
>> step #3.
>>
>> When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
>> we'll see EPERM as well. That happens when the kernel still has the
>> kernel in its task list, but the thread is marked as dead.
>
> "still has the kernel" -> "still has the thread"

Indeed. Fixed locally.

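For readers following the thread, here is a minimal sketch in C of the two ideas in the quoted description: listing a process's LWPs from /proc/PID/task/, and recognizing the attach errors that are consistent with a thread having exited in the meantime. This is not the code from the patch. The names list_lwp_ids and attach_failure_may_mean_exited are hypothetical, and treating EPERM like ESRCH is a simplifying assumption; a real caller would likely want to confirm the thread is actually dead before staying quiet.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not GDB's code): collect the LWP ids listed
   under /proc/PID/task/.  Returns the number of ids stored in OUT
   (at most MAX), or -1 if the directory cannot be opened.  */
static int
list_lwp_ids (pid_t pid, pid_t *out, int max)
{
  char path[64];
  DIR *dir;
  struct dirent *ent;
  int n = 0;

  assert (out != NULL && max > 0);
  snprintf (path, sizeof path, "/proc/%ld/task", (long) pid);
  dir = opendir (path);
  if (dir == NULL)
    return -1;

  while (n < max && (ent = readdir (dir)) != NULL)
    {
      char *end;
      long lwp = strtol (ent->d_name, &end, 10);

      /* Skip "." and ".." and anything that is not a plain number.  */
      if (end == ent->d_name || *end != '\0' || lwp <= 0)
        continue;
      out[n++] = (pid_t) lwp;
    }
  closedir (dir);
  return n;
}

/* Hypothetical classifier: nonzero if a failed PTRACE_ATTACH with
   errno ERR is consistent with the thread having exited between the
   listing and the attach.  ESRCH is the usual case; EPERM is only a
   hint (assumption: the caller double-checks before ignoring it).  */
static int
attach_failure_may_mean_exited (int err)
{
  return err == ESRCH || err == EPERM;
}
```

A caller would walk the ids returned by list_lwp_ids, try PTRACE_ATTACH on each, and only warn when the failure is not one of the "thread already gone" cases. Since threads can also be created during the walk, one listing is not necessarily complete.
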
>> (linux_attach): Adjus to rename and use
>
> Adjus -> Adjust
>
Fixed.
>
> I think it makes sense, not that I know anything about it.

Thanks for the review.

Thanks,
Pedro Alves