[PATCH 2/5] Linux: on attach, attach to lwps listed under /proc/$pid/task/

Pedro Alves palves@redhat.com
Wed Dec 17 13:35:00 GMT 2014


On 12/16/2014 08:52 PM, Simon Marchi wrote:
> On 2014-12-16 11:53 AM, Pedro Alves wrote:
>> ... instead of relying on libthread_db.
>>
>> I wrote a test that attaches to a program that constantly spawns
>> short-lived threads, which exposed several issues.  This is one of
>> them.
>>
>> On Linux, we need to attach to all threads of a process (thread group)
>> individually.  We currently rely on libthread_db to list the threads,
>> but that is problematic, because libthread_db relies on reading data
>> structures out of the inferior (which may well be corrupted).  If
>> threads are being created or exiting just while we try to attach, we
>> may trip on inconsistencies in the inferior's thread list.  To work
>> around that, when we see a seemingly corrupt list, we currently retry
>> a few times:
>>
>>  static void
>>  thread_db_find_new_threads_2 (ptid_t ptid, int until_no_new)
>>  {
>>  ...
>>    if (until_no_new)
>>      {
>>        /* Require 4 successive iterations which do not find any new threads.
>> 	  The 4 is a heuristic: there is an inherent race here, and I have
>> 	  seen that 2 iterations in a row are not always sufficient to
>> 	  "capture" all threads.  */
>>  ...
>>
>> That heuristic may well fail, and when it does, we end up with threads
>> in the program that aren't under GDB's control.  That's obviously bad
>> and results in quite mystifying failures, e.g., the process dying
>> for seemingly no reason when a thread that wasn't attached trips on a
>> breakpoint.
>>
>> There's really no reason to rely on libthread_db for this nowadays
>> when we have /proc mounted.  In that case, which is the usual case, we
>> can list the LWPs from /proc/PID/task/.  In fact, GDBserver is already
>> doing this.  The patch factors out that code that knows to walk the
>> task/ directory out of GDBserver, and makes GDB use it too.
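
To illustrate the /proc/PID/task/ walk described above, here is a
minimal standalone sketch.  It is not the code the patch factors out
of GDBserver; the function name list_lwps and its array-based
interface are hypothetical, made up for this example.  It assumes
/proc is mounted.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper (not GDB's API): store up to MAX LWP ids of
   process PID in OUT, as listed under /proc/PID/task/.  Returns the
   total number of LWPs seen, or -1 if the directory could not be
   opened (e.g., /proc is not mounted or the process is gone).  */
static int
list_lwps (pid_t pid, pid_t *out, int max)
{
  char path[64];
  DIR *dir;
  struct dirent *dp;
  int count = 0;

  snprintf (path, sizeof path, "/proc/%ld/task", (long) pid);
  dir = opendir (path);
  if (dir == NULL)
    return -1;

  while ((dp = readdir (dir)) != NULL)
    {
      char *end;
      long lwp = strtol (dp->d_name, &end, 10);

      /* Skip ".", ".." and anything that is not a plain number.  */
      if (end == dp->d_name || *end != '\0' || lwp <= 0)
        continue;

      if (count < max)
        out[count] = (pid_t) lwp;
      count++;
    }

  closedir (dir);
  return count;
}
```

A single pass like this can still miss threads created while the
directory is being read, so a caller would presumably need to repeat
the walk.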
>>
>> Like GDBserver, the patch makes GDB attach to LWPs and _not_ wait for
>> them to stop immediately.  Instead, we just tag the LWP as having an
>> expected stop.  Because we can only set the ptrace options when the
>> thread stops, we need a new flag in the lwp structure to keep track of
>> whether we've already set the ptrace options, just like in GDBserver.
>> Note that nothing issues any ptrace command to the threads between the
>> PTRACE_ATTACH and the stop, so this is safe (unlike one scenario
>> described in gdbserver's linux-low.c).
>>
>> When we attach to a program that has threads exiting while we attach,
>> it's easy to race with a thread just exiting as we try to attach to
>> it, like:
>>
>>   #1 - get current list of threads
>>   #2 - attach to each listed thread
>>   #3 - ooops, attach failed, thread is already gone
>>
>> As this is pretty normal, we shouldn't be issuing a scary warning in
>> step #3.
>>
>> When #3 happens, PTRACE_ATTACH usually fails with ESRCH, but sometimes
>> we'll see EPERM as well.  That happens when the kernel still has the
>> kernel in its task list, but the thread is marked as dead.
> 
> "still has the kernel" -> "still has the thread"

Indeed.  Fixed locally.
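
For the ESRCH/EPERM case above, one way to decide whether a failed
PTRACE_ATTACH deserves a warning could look like the sketch below.
This is only an illustration, not what the patch does: both function
names are hypothetical, and checking the "State:" line of
/proc/LWP/status for Z (zombie) or X (dead) is my assumption about
how to detect a thread that is marked as dead.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: return 1 if /proc/LWP/status cannot be
   opened, or if it reports the zombie (Z) or dead (X) state.  */
static int
lwp_looks_gone (pid_t lwp)
{
  char path[64], line[256];
  FILE *f;
  int gone = 0;

  snprintf (path, sizeof path, "/proc/%ld/status", (long) lwp);
  f = fopen (path, "r");
  if (f == NULL)
    return 1;

  while (fgets (line, sizeof line, f) != NULL)
    {
      char state;

      /* Only the "State:" line matches this format.  */
      if (sscanf (line, "State: %c", &state) == 1)
        {
          gone = (state == 'Z' || state == 'X');
          break;
        }
    }

  fclose (f);
  return gone;
}

/* Hypothetical helper: given the errno ERR of a failed
   PTRACE_ATTACH on LWP, return 1 if the failure is the ordinary
   "thread exited under us" race and should not produce a warning.  */
static int
attach_failure_is_benign (pid_t lwp, int err)
{
  if (err == ESRCH)
    return 1;
  if (err == EPERM)
    return lwp_looks_gone (lwp);
  return 0;
}
```

With something like this, an EPERM on a live thread (a real
permission problem) would still be reported, while the exit race
stays quiet.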

>> 	(linux_attach): Adjus to rename and use
> 
> Adjus -> Adjust
> 

Fixed.

> 
> I think it makes sense, not that I know anything about it.

Thanks for the review.

Thanks,
Pedro Alves


