i386_linux_resume calls target_read() which calls QUIT;. But linux_nat_resume does not expect called linux_ops->to_resume may throw and in such case it will leave lwp_info as if the inferior resumed. But the real inferior was not yet resumed by i386_linux_resume. This leads to lock-up as GDB then tries to stop the inferior which is already stopped and so no new waitpid event gets generated.
https://sourceware.org/ml/gdb-patches/2014-05/msg00473.html
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "gdb and binutils". The branch, master has been updated via 8817a6f225766029787b5e2c1300a342b328909e (commit) from 251bde03baf93dbb44d3785e09e03179916143e3 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8817a6f225766029787b5e2c1300a342b328909e commit 8817a6f225766029787b5e2c1300a342b328909e Author: Pedro Alves <palves@redhat.com> Date: Thu May 29 12:50:48 2014 +0100 PR gdb/15713 - errors from i386_linux_resume lead to lock-up linux_nat_resume is not considering that linux_ops->to_resume may throw: /* Mark LWP as not stopped to prevent it from being continued by linux_nat_resume_callback. */ lp->stopped = 0; if (resume_many) iterate_over_lwps (ptid, linux_nat_resume_callback, NULL); If something within linux_nat_resume_callback throws, GDB leaves the lwp_info as if the inferior was resumed, while it actually wasn't. A couple examples, there are possibly others: - i386_linux_resume calls target_read which calls QUIT. - if the actual ptrace resumption fails in inf_ptrace_resume, perror_with_name is called. If the user tries to kill the inferior at this point (or quit, which offers to kill), GDB locks up trying to stop the lwp -- if it is already stopped no new waitpid event gets generated for it. Fix this by setting the stopped flag earlier, as soon as we collect a stop event with waitpid, and clearing it always only after resuming the lwp successfully. Tested on x86_64 Fedora 20. Confirmed the lock-up disappears using a local hack that forces an error in inf_ptrace_resume. Also fixes a little "set debug lin-lwp" annoyance. Currently we always see: Continuing. LLR: Preparing to resume process 6802, 0, inferior_ptid Thread 0x7ffff7fc7740 (LWP 6802) ^^^^^^^^ RC: Resuming sibling Thread 0x7ffff77c5700 (LWP 6807), 0, resume RC: Resuming sibling Thread 0x7ffff7fc6700 (LWP 6806), 0, resume RC: Not resuming sibling Thread 0x7ffff7fc7740 (LWP 6802) (not stopped) ^^^^^^^^^^^^^^^^^^^^^^^ LLR: PTRACE_CONT process 6802, 0 (resume event thread) This patch gets rid of the "Not resuming sibling" line. 2014-05-29 Pedro Alves <palves@redhat.com> PR gdb/15713 * linux-nat.c (linux_nat_resume_callback): Rename the second parameter to 'except'. Skip LP if it points to EXCEPT. (linux_nat_resume): Don't mark the event lwp as not stopped before resuming sibling lwps. Instead ask linux_nat_resume_callback to skip the event lwp. Mark it as not stopped after actually resuming it. (linux_handle_syscall_trap): Mark the lwp as not stopped after resuming it. (wait_lwp): Mark the lwp as stopped here. (stop_wait_callback): Mark the lwp as not stopped right after resuming it. Don't mark lwps as stopped here. (linux_nat_filter_event): Mark the lwp as stopped earlier. (linux_nat_wait_1): Don't mark dead lwps as stopped here. ----------------------------------------------------------------------- Summary of changes: gdb/ChangeLog | 17 +++++++++++++++++ gdb/linux-nat.c | 41 ++++++++++++++++++----------------------- 2 files changed, 35 insertions(+), 23 deletions(-)
Fixed.