This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH 0/4 v3] Exec events in gdbserver on Linux


This patch series is an update to the gdbserver Linux exec event patches based on review comments for the previous version.  The changes from the previous version are summarized below.

Patch 1 (gdbserver exit events):
         Fixed up comments and whitespace
         Moved test for extended exit events from linux_test_for_tracefork
	   into new function, linux_test_for_traceexit, called from
	   linux_check_ptrace_features.
	 Moved predicate functions is_extended_* into common/linux-ptrace.c
	   renamed to linux_is_extended_* and made new function
	   linux_is_extended_fork.  Replaced shift operations with calls to
	   these functions in gdbserver.
	 Eliminated the call to linux_enable_event_reporting that I had
	   added to gdbserver's CLONE event handling.
	 Wrote explanation of why extended exit events are required.

Patch 2 (gdbserver exec events):
         Moved linux_child_pid_to_exec_file from linux-nat.c and linux-low.c
           to common/linux-procfs.c and renamed it linux_proc_pid_to_exec_file.
	   Gdbserver has to take "struct target_ops *" argument now.
	 Implemented new predicate linux_is_extended_exec in common file.
	 Parenthesize condition in return stmt in extended_event_reported.
         Delete extra text from debug_printf.

Patch 3 (gdbserver exec event documentation):
         Added NEWS entries for gdbserver exec event support.

Patch 4 (exec event tests):
         No changes from the previous version.

This patch series implements support for exec events in gdbserver on GNU/Linux.  This is a step towards the "follow fork/exec" item in the local/remote debugging feature parity project: (https://sourceware.org/gdb/wiki/LocalRemoteFeatureParity).  Luis had started work on this last year, and handed it off to me at the end of January.  (BTW, thanks to Luis for his advice on this, although any problems with this patch are entirely mine.)

Follow-exec-mode and rerun behave as expected in multiprocess mode (target extended-remote), where follow-exec-mode maintains the specified inferiors and rerun runs the last executable file to be exec'd.  Catchpoints for exec are not implemented in this patch series, since this will be easier to do once fork and vfork events are also supported.

- Patch 1/4 implements support for the extended ptrace event PTRACE_EVENT_EXIT, which is a prerequisite for the exec event support.
- Patch 2/4 implements the exec event support.
- Patch 3/4 adds documentation for the new RSP "Stop Reply" message.
- Patch 4/4 extends the tests gdb.threads/non-ldr-exc-[1-4].exp to test in non-stop mode as well as in all-stop mode.

There are a couple of significant aspects to this patch.  First, it uses the ptrace extension PTRACE_EVENT_EXIT to detect thread exit, in particular the exit of a thread group leader when a non-leader calls exec.  Use of this event was necessary due to a race condition in the two-thread case.  A detailed description of the race condition can be found at the bottom of this message.  Exit events are only used internally and are not exposed to the user, and exit processing was changed as little as possible.

Second, it sends a signal to each lwp after an exec event in order to identify and clean up the lwp entry for the exec'ing thread.  This happens in the normal course of things for all-stop mode, but in non-stop mode gdbserver uses SIGSTOP to stop, then resume all of the non-leader threads to accomplish this.

My intent is to follow up with implementations of remote fork/vfork events
and associated catchpoints.

RACE CONDITION

This section explains why the existing techniques for detecting thread exit weren't sufficient for gdbserver exec events, necessitating the use of PTRACE_EVENT_EXIT.  The short answer is that there is a race condition in the current implementation that can leave a dangling entry in the lwp list (an entry that doesn't have a corresponding actual lwp).  In this case gdbserver will hang waiting for the non-existent lwp to stop.  Using the exit events eliminates this race condition. 

The same race may exist in the native implementation, since the two implementations are similar, but I haven't verified that.  It may be difficult to concoct a test case that demonstrates the race since the window is so small.

Now for the long answer: in my testing I ran into a race condition in check_zombie_leaders, which detects when a thread group leader has exited and other threads still exist.  On the Linux kernel, ptrace/waitpid don't allow reaping the leader thread until all other threads in the group are reaped.  When the leader exits, it goes zombie, but waitpid will not return an exit status until the other threads are gone.  When a non-leader thread calls exec, all other non-leader threads are destroyed, the leader becomes a zombie, and once the "other" threads have been reaped, the execing thread takes over the leader's pid (tgid) and appears to vanish.  In order to handle this situation, check_zombie_leaders polls the process state in /proc and deletes thread group leaders that are in a zombie state.  The replacement is added to the lwp list when the exec event is reported.

See https://sourceware.org/ml/gdb-patches/2011-10/msg00704.html for a more detailed explanation of how this works.

Here is the relevant part of check_zombie_leaders:

if (leader_lp != NULL
          /* Check if there are other threads in the group, as we may
             have raced with the inferior simply exiting.  */
          && !last_thread_of_process_p (leader_pid)
          && linux_proc_pid_is_zombie (leader_pid))
        {
          /* ...large informative comment block... */
          delete_lwp (leader_lp);

The race occurred when there were two threads in the program, and the non-leader thread called exec.  In this case the leader thread passed through a very brief zombie state before being replaced by the exec'ing thread as the thread group leader.  This state transition was asynchronous, with no dependency on anything gdbserver did.  Because there were no other threads, there were no thread exit events, and thus there was no synchronization with the leader passing through the zombie state and the exec completing.  If there had been more threads, the leader would remain in the zombie state until they were waited for.  In the two-thread case, sometimes the leader exit was detected and sometimes it wasn't. (Recall that check_zombie_leaders is polling the state, via linux_proc_pid_is_zombie.  The race is between the leader thread passing through the zombie state and check_zombie_leaders testing for zombie state.)  If leader exit wasn't detected, gdbserver would end up with a dangl
 ing lwp entry that didn't correspond to any real lwp, and would hang waiting for that lwp to stop.  Using PTRACE_EVENT_EXIT guarantees that the leader exit will be detected.

Note that check_zombie_leaders works just fine for the scenarios where the leader thread exits and the other threads continue to run, with no exec calls.  It is required for systems that don't support the extended ptrace events.

The sequence of events resulting in the race condition was this:

    1) In the program, a CLONE event for a new thread occurs.

    2) In the program, both threads are resumed once gdbserver has
       completed the new thread processing.

    3) In gdbserver, the function linux_wait_for_event_filtered loops until
       waitpid returns "no more events" for the SIGCHLD generated by the
       CLONE event.  Then linux_wait_for_event_filtered calls
       check_zombie_leaders.  

    4) In the program, the new thread is doing the exec.  During the exec
       the leader thread will pass through a transitory zombie state.  If
       there were more than two threads, the leader thread would remain a
       zombie until all the non-leader, non-exec'ing threads were reaped by
       gdbserver.  Since there are no such threads to reap, the leader just
       becomes a zombie and is replaced by the exec'ing thread on-the-fly.
       (Note that it appears that the leader thread is a zombie just for a
       very brief instant.)

    5) In gdbserver, check_zombie_leaders checks whether an lwp entry
       corresponds to a zombie leader thread, and if so, deletes it.  Here
       is the race: in (4) above, the leader may or may not be in the
       transitory zombie state.  In the case where a zombie isn't detected,
       delete_lwp is not called.

    6) In gdbserver, an EXEC event is detected and processed.  When it gets
       ready to report the event to GDB, it calls stop_all_lwps, which sends
       a SIGSTOP to each lwp in the list and the waits until all the lwps in
       the list have reported a stop event.  If the zombie leader wasn't
       detected and processed in step (5), gdbserver blocks forever in
       linux_wait_for_event_filtered, waiting for the undeleted lwp to be
       stopped, which will never happen.

Regards,
--Don

 gdb/NEWS                                    |    6 +
 gdb/common/linux-procfs.c                   |   18 ++
 gdb/common/linux-procfs.h                   |    4 +
 gdb/common/linux-ptrace.c                   |   75 +++++++++
 gdb/common/linux-ptrace.h                   |    4 +
 gdb/doc/gdb.texinfo                         |    6 +
 gdb/gdbserver/linux-low.c                   |  229 +++++++++++++++++++++++----
 gdb/gdbserver/linux-low.h                   |    5 +
 gdb/gdbserver/remote-utils.c                |   28 +++-
 gdb/linux-nat.c                             |   22 +---
 gdb/remote.c                                |   27 +++-
 gdb/testsuite/gdb.threads/non-ldr-exc-1.exp |   20 ++-
 gdb/testsuite/gdb.threads/non-ldr-exc-2.exp |   36 ++++-
 gdb/testsuite/gdb.threads/non-ldr-exc-3.exp |   36 ++++-
 gdb/testsuite/gdb.threads/non-ldr-exc-4.exp |   20 ++-
 15 files changed, 466 insertions(+), 70 deletions(-)


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]