Bug 15713 - i386_linux_resume calls QUIT -> lock-up
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: tdep
Version: HEAD
Importance: P2 normal
Target Milestone: 7.8
Assignee: Pedro Alves
Reported: 2013-07-02 19:57 UTC by Jan Kratochvil
Modified: 2014-05-29 11:52 UTC
Target: i386-unknown-linux-gnu
Description Jan Kratochvil 2013-07-02 19:57:23 UTC
i386_linux_resume calls target_read(), which calls QUIT.

However, linux_nat_resume does not expect that the linux_ops->to_resume method it calls may throw. When it does, lwp_info is left marked as if the inferior had been resumed, even though i386_linux_resume never actually resumed it.

This leads to a lock-up: GDB later tries to stop the inferior, but it is already stopped, so no new waitpid event is generated.
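
The failure mode described above can be sketched in a few lines of standalone C. This is only an illustration under stated assumptions, not GDB code: struct lwp, low_level_resume, resume_flag_first and state_is_inconsistent_after_resume are hypothetical names, and setjmp/longjmp stands in for GDB's exception mechanism. It shows how clearing the flag before a resume that can throw leaves the bookkeeping out of step with the real state.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for GDB's lwp_info. */
struct lwp {
    bool stopped;        /* what the debugger believes */
    bool really_stopped; /* what the kernel would report */
};

/* Stand-in for GDB's exception machinery (an assumption of this sketch). */
static jmp_buf error_handler;

/* Stand-in for linux_ops->to_resume: may "throw" before resuming anything. */
static void low_level_resume(struct lwp *lp, bool fail)
{
    if (fail)
        longjmp(error_handler, 1);  /* models QUIT or an error call */
    lp->really_stopped = false;     /* the thread actually runs again */
}

/* Problematic ordering: the flag is cleared before the resume is attempted. */
static void resume_flag_first(struct lwp *lp, bool fail)
{
    assert(lp->stopped);            /* only a stopped lwp gets resumed */
    lp->stopped = false;
    low_level_resume(lp, fail);
}

/* Returns true when the flag and the real state disagree after the attempt. */
bool state_is_inconsistent_after_resume(bool fail)
{
    /* Static storage, so the values are well defined after longjmp. */
    static struct lwp lp;
    lp.stopped = true;
    lp.really_stopped = true;

    if (setjmp(error_handler) == 0)
        resume_flag_first(&lp, fail);

    return lp.stopped != lp.really_stopped;
}
```

Calling state_is_inconsistent_after_resume(true) models the throwing case: the flag says "running" while the thread is still stopped, which is the state in which a later stop request would wait for an event that never arrives.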
Comment 2 Sourceware Commits 2014-05-29 11:51:56 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  8817a6f225766029787b5e2c1300a342b328909e (commit)
      from  251bde03baf93dbb44d3785e09e03179916143e3 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=8817a6f225766029787b5e2c1300a342b328909e

commit 8817a6f225766029787b5e2c1300a342b328909e
Author: Pedro Alves <palves@redhat.com>
Date:   Thu May 29 12:50:48 2014 +0100

    PR gdb/15713 - errors from i386_linux_resume lead to lock-up
    
    linux_nat_resume is not considering that linux_ops->to_resume may throw:
    
      /* Mark LWP as not stopped to prevent it from being continued by
         linux_nat_resume_callback.  */
      lp->stopped = 0;
    
      if (resume_many)
        iterate_over_lwps (ptid, linux_nat_resume_callback, NULL);
    
    If something within linux_nat_resume_callback throws, GDB leaves the
    lwp_info as if the inferior was resumed, while it actually wasn't.
    
    A couple examples, there are possibly others:
    
     - i386_linux_resume calls target_read which calls QUIT.
     - if the actual ptrace resumption fails in inf_ptrace_resume,
       perror_with_name is called.
    
    If the user tries to kill the inferior at this point (or quit, which
    offers to kill), GDB locks up trying to stop the lwp -- if it is
    already stopped no new waitpid event gets generated for it.
    
    Fix this by setting the stopped flag earlier, as soon as we collect a
    stop event with waitpid, and clearing it always only after resuming
    the lwp successfully.
    
    Tested on x86_64 Fedora 20.  Confirmed the lock-up disappears using a
    local hack that forces an error in inf_ptrace_resume.
    
    Also fixes a little "set debug lin-lwp" annoyance.  Currently we always see:
    
     Continuing.
     LLR: Preparing to resume process 6802, 0, inferior_ptid Thread 0x7ffff7fc7740 (LWP 6802)
                                                                                    ^^^^^^^^
     RC: Resuming sibling Thread 0x7ffff77c5700 (LWP 6807), 0, resume
     RC: Resuming sibling Thread 0x7ffff7fc6700 (LWP 6806), 0, resume
     RC: Not resuming sibling Thread 0x7ffff7fc7740 (LWP 6802) (not stopped)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^
     LLR: PTRACE_CONT process 6802, 0 (resume event thread)
    
    This patch gets rid of the "Not resuming sibling" line.
    
    2014-05-29  Pedro Alves  <palves@redhat.com>
    
    	PR gdb/15713
    	* linux-nat.c (linux_nat_resume_callback): Rename the second
    	parameter to 'except'.  Skip LP if it points to EXCEPT.
    	(linux_nat_resume): Don't mark the event lwp as not stopped
    	before resuming sibling lwps.  Instead ask
    	linux_nat_resume_callback to skip the event lwp.  Mark it as not
    	stopped after actually resuming it.
    	(linux_handle_syscall_trap): Mark the lwp as not stopped after
    	resuming it.
    	(wait_lwp): Mark the lwp as stopped here.
    	(stop_wait_callback): Mark the lwp as not stopped right after
    	resuming it.  Don't mark lwps as stopped here.
    	(linux_nat_filter_event): Mark the lwp as stopped earlier.
    	(linux_nat_wait_1): Don't mark dead lwps as stopped here.

-----------------------------------------------------------------------

Summary of changes:
 gdb/ChangeLog   |   17 +++++++++++++++++
 gdb/linux-nat.c |   41 ++++++++++++++++++-----------------------
 2 files changed, 35 insertions(+), 23 deletions(-)
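
The ordering that the commit message describes (mark the lwp as stopped as soon as the stop event is collected, and clear the flag only after a successful resume) might be sketched as follows. This is a minimal standalone model, not the actual linux-nat.c change: struct lwp_state, collect_stop_event, try_low_level_resume, resume_then_clear_flag and flag_matches_reality_after_resume are hypothetical names, and setjmp/longjmp is used as a stand-in for GDB's exceptions.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for GDB's lwp_info. */
struct lwp_state {
    bool stopped;        /* what the debugger believes */
    bool really_stopped; /* what the kernel would report */
};

/* Stand-in for GDB's exception machinery (an assumption of this sketch). */
static jmp_buf resume_error;

/* Models collecting a stop event with waitpid: set the flag right away. */
static void collect_stop_event(struct lwp_state *lp)
{
    lp->really_stopped = true;
    lp->stopped = true;
}

/* Stand-in for the low-level resume; may "throw" before resuming. */
static void try_low_level_resume(struct lwp_state *lp, bool fail)
{
    if (fail)
        longjmp(resume_error, 1);
    lp->really_stopped = false;
}

/* Ordering after the fix: clear the flag only once the resume succeeded. */
static void resume_then_clear_flag(struct lwp_state *lp, bool fail)
{
    assert(lp->stopped);
    try_low_level_resume(lp, fail);
    lp->stopped = false;  /* not reached if the resume threw */
}

/* Returns true when the flag agrees with the real state after the attempt. */
bool flag_matches_reality_after_resume(bool fail)
{
    /* Static storage, so the values are well defined after longjmp. */
    static struct lwp_state lp;
    collect_stop_event(&lp);

    if (setjmp(resume_error) == 0)
        resume_then_clear_flag(&lp, fail);

    return lp.stopped == lp.really_stopped;
}
```

In this model the flag and the real state agree whether or not the resume throws, so a later attempt to stop or kill the thread would see it as already stopped instead of waiting for a new event.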
Comment 3 Pedro Alves 2014-05-29 11:52:45 UTC
Fixed.