Bug 17511 - Program received signal SIGTRAP, after step to signal handler -> step inside handler -> continue
Summary: Program received signal SIGTRAP, after step to signal handler -> step inside ...
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: HEAD
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Duplicates: 18063
Depends on:
Blocks:
Reported: 2014-10-26 22:15 UTC by Pedro Alves
Modified: 2016-08-10 22:32 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:


Description Pedro Alves 2014-10-26 22:15:10 UTC
While writing a GDB test, I noticed that if I have a signal pending/queued, single-step into the signal handler, and then issue another step inside the handler, subsequent continues result in spurious SIGTRAPs.

The problem is that the trap flag ($eflags.TF) is left stuck set.
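For reference, GDB's $eflags display decodes the raw x86 EFLAGS register, in which TF (the trap flag) is bit 8 and bit 1 is reserved and always reads as 1. So "[ PF ZF IF ]" in the session below is the raw value 0x246, and "[ PF ZF TF IF ]" is 0x346: the two differ only in TF. The sketch below, assuming the standard x86 bit layout, decodes the low flag bits in the same style; format_eflags() and tf_is_set() are hypothetical helpers for illustration, not GDB's flag printer.

~~~~c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* EFLAGS bit 8 is the trap flag (TF).  While it is set, the CPU raises
   a debug exception after each instruction, which Linux reports to the
   tracer as SIGTRAP.  */
#define EFLAGS_TF (1UL << 8)

struct eflags_bit
{
  unsigned int bit;
  const char *name;
};

/* Low EFLAGS bits, in bit order.  Bit 1 is reserved (always 1) and is
   left unnamed, so it never appears in the output.  */
static const struct eflags_bit eflags_bits[] = {
  { 0, "CF" }, { 2, "PF" }, { 4, "AF" }, { 6, "ZF" }, { 7, "SF" },
  { 8, "TF" }, { 9, "IF" }, { 10, "DF" }, { 11, "OF" },
};

/* Return true if the trap flag is set in EFLAGS.  */
bool
tf_is_set (unsigned long eflags)
{
  return (eflags & EFLAGS_TF) != 0;
}

/* Write EFLAGS into BUF as e.g. "[ PF ZF IF ]".  BUF must hold at least
   32 bytes: "[ " + 9 names of "XX " + "]" + NUL = 31.  */
void
format_eflags (unsigned long eflags, char *buf, size_t len)
{
  size_t i;

  if (len < 32)
    {
      if (len > 0)
        buf[0] = '\0';
      return;
    }

  strcpy (buf, "[ ");
  for (i = 0; i < sizeof eflags_bits / sizeof eflags_bits[0]; i++)
    if (eflags & (1UL << eflags_bits[i].bit))
      {
        strcat (buf, eflags_bits[i].name);
        strcat (buf, " ");
      }
  strcat (buf, "]");
}
~~~~

For example, format_eflags (0x346, buf, sizeof buf) produces "[ PF ZF TF IF ]", while 0x246 produces "[ PF ZF IF ]".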

I'm on Fedora 20 (Linux 3.16.4-200.fc20.x86_64).

Here's what it looks like:

 (gdb) start
 Temporary breakpoint 1, main () at si-handler.c:48
 48        setup ();
 (gdb) next
 50        global = 0; /* set break here */

Let's queue a signal, so we can step into the handler:

 (gdb) handle SIGUSR1
 Signal        Stop      Print   Pass to program Description
 SIGUSR1       Yes       Yes     Yes             User defined signal 1
 (gdb) info inferiors
   Num  Description       Executable
 * 1    process 29953     si-handler
 (gdb) shell kill -SIGUSR1 29953
 (gdb) c
 Continuing.

 Program received signal SIGUSR1, User defined signal 1.
 main () at si-handler.c:50
 50        global = 0; /* set break here */
 (gdb) display $eflags
 1: $eflags = [ PF ZF IF ]

(With mainline GDB, you can instead just do "queue-signal SIGUSR1".)

Now step into the handler -- "si" does PTRACE_SINGLESTEP+SIGUSR1:

 (gdb) si
 sigusr1_handler (sig=0) at si-handler.c:31
 31      {
 1: $eflags = [ PF ZF IF ]

Looks fine so far.  But another single-step...

 (gdb) si
 0x0000000000400621      31      {
 1: $eflags = [ PF ZF TF IF ]

... ends up with TF left set.  This results in PTRACE_CONT trapping
after each instruction is executed:

 (gdb) c
 Continuing.

 Program received signal SIGTRAP, Trace/breakpoint trap.
 0x0000000000400624 in sigusr1_handler (sig=0) at si-handler.c:31
 31      {
 1: $eflags = [ PF ZF TF IF ]

 (gdb) c
 Continuing.

 Program received signal SIGTRAP, Trace/breakpoint trap.
 sigusr1_handler (sig=10) at si-handler.c:32
 32        global = 0;
 1: $eflags = [ PF ZF TF IF ]
 (gdb)

Note that even another PTRACE_SINGLESTEP does not fix it:

 (gdb) si
 33      }
 1: $eflags = [ PF ZF TF IF ]
 (gdb)

Eventually, it gets "fixed" by the rt_sigreturn syscall, when returning
out of the handler:

 (gdb) bt
 #0  sigusr1_handler (sig=10) at si-handler.c:33
 #1  <signal handler called>
 #2  main () at si-handler.c:50
 (gdb) set disassemble-next-line on
 (gdb) si
 0x0000000000400632      33      }
    0x0000000000400631 <sigusr1_handler+17>:     5d      pop    %rbp
 => 0x0000000000400632 <sigusr1_handler+18>:     c3      retq
 1: $eflags = [ PF ZF TF IF ]
 (gdb)
 <signal handler called>
 => 0x0000003b36a358f0 <__restore_rt+0>: 48 c7 c0 0f 00 00 00    mov    $0xf,%rax
 1: $eflags = [ PF ZF TF IF ]
 (gdb) si
 <signal handler called>
 => 0x0000003b36a358f7 <__restore_rt+7>: 0f 05   syscall
 1: $eflags = [ PF ZF TF IF ]
 (gdb)
 main () at si-handler.c:50
 50        global = 0; /* set break here */
 => 0x000000000040066b <main+9>: c7 05 cb 09 20 00 00 00 00 00   movl   $0x0,0x2009cb(%rip)        # 0x601040 <global>
 1: $eflags = [ PF ZF IF ]
 (gdb)
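A rough model of why rt_sigreturn clears TF: the kernel reloads EFLAGS from the sigcontext it saved when building the signal frame, and per the session above, $eflags is back to [ PF ZF IF ] (0x246) once control returns to main. The sketch below assumes that the set of bits restored from the frame includes TF; the mask is modelled on, not copied from, the kernel's restore_sigcontext(), and eflags_after_sigreturn() and the EFL_* names are hypothetical.

~~~~c
/* x86 EFLAGS bit values.  */
#define EFL_CF 0x00000001UL
#define EFL_PF 0x00000004UL
#define EFL_AF 0x00000010UL
#define EFL_ZF 0x00000040UL
#define EFL_SF 0x00000080UL
#define EFL_TF 0x00000100UL
#define EFL_DF 0x00000400UL
#define EFL_OF 0x00000800UL
#define EFL_RF 0x00010000UL
#define EFL_AC 0x00040000UL

/* Bits ASSUMED to be taken from the signal frame on rt_sigreturn.
   Check restore_sigcontext() in your kernel version for the real mask.
   TF is included, which is what clears the stray TF.  */
#define RESTORED_FROM_FRAME                                             \
  (EFL_CF | EFL_PF | EFL_AF | EFL_ZF | EFL_SF | EFL_TF | EFL_DF          \
   | EFL_OF | EFL_AC | EFL_RF)

/* Model of the EFLAGS update on rt_sigreturn: bits in
   RESTORED_FROM_FRAME come from SAVED (the value stored in the signal
   frame); all other bits, such as IF, keep their value from LIVE (the
   current register contents).  */
unsigned long
eflags_after_sigreturn (unsigned long live, unsigned long saved)
{
  return (live & ~RESTORED_FROM_FRAME) | (saved & RESTORED_FROM_FRAME);
}
~~~~

With the stuck value 0x346 live and 0x246 saved in the frame, eflags_after_sigreturn (0x346, 0x246) returns 0x246: TF is cleared and IF is kept.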

I don't get the bug if I instead enter the signal handler with
PTRACE_CONT -- e.g., set a breakpoint in the handler, queue a signal,
and "continue".
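The two ways into the handler differ only in the ptrace(2) request used to resume the stopped inferior with SIGUSR1 pending: PTRACE_SINGLESTEP for "si", PTRACE_CONT for "continue". In both cases the signal number goes in the request's data argument. A minimal sketch; resume_inferior() is a hypothetical helper, not GDB's resume code, and it assumes PID is already a ptrace-stopped tracee of the caller.

~~~~c
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Resume the ptrace-stopped tracee PID, delivering signal SIG (0 for
   none).  With STEP, the inferior executes a single step
   (PTRACE_SINGLESTEP, as "si" does); otherwise it runs until the next
   event (PTRACE_CONT, as "continue" does).  Returns 0 on success, or
   -1 with errno set, e.g. ESRCH if PID is not a stopped tracee of the
   caller.  */
long
resume_inferior (pid_t pid, bool step, int sig)
{
  /* ptrace(2) takes the signal to deliver in the DATA argument.  */
  return ptrace (step ? PTRACE_SINGLESTEP : PTRACE_CONT, pid,
                 NULL, (void *) (long) sig);
}
~~~~

resume_inferior (pid, true, SIGUSR1) corresponds to the PTRACE_SINGLESTEP+SIGUSR1 step into the handler; the later "si" commands correspond to resume_inferior (pid, true, 0).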

Below is the code I was using to test this.

~~~~
/* This testcase is part of GDB, the GNU debugger.

   Copyright 2014 Free Software Foundation, Inc.

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 3 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */

#include <signal.h>

volatile int global;

static void
signal_handler (int sig)
{
  global = 0;
  global = 0;
  global = 0;
  global = 0;
  global = 0;
}

void
setup (void)
{
  /* Set up the signal handler.  */
  signal (SIGUSR1, signal_handler);
}

void
begin (void)
{
}

void
end (void)
{
}

int
main (void)
{
  setup ();
  begin ();
  end ();
  return 0;
}
~~~~
Comment 1 Pedro Alves 2014-10-26 22:18:29 UTC
This is a kernel bug.  I've reported it to Oleg Nesterov, and he pointed me at:

	http://marc.info/?t=127550678000005

The first message has a ptrace-only test-case.

I'm extending gdb.base/sigstep.exp to cover this, xfailed.
Comment 2 cvs-commit@gcc.gnu.org 2014-10-28 15:52:43 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "gdb and binutils".

The branch, master has been updated
       via  abbdbd03db7eea82cadbb418da733991cba91b15 (commit)
      from  5a4b0ccc20ba30caef53b01bee2c0aaa5b855339 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=abbdbd03db7eea82cadbb418da733991cba91b15

commit abbdbd03db7eea82cadbb418da733991cba91b15
Author: Pedro Alves <palves@redhat.com>
Date:   Tue Oct 28 15:51:30 2014 +0000

    Test for PR gdb/17511, spurious SIGTRAP after stepping into+in signal handler
    
    I noticed that when I single-step into a signal handler with a
    pending/queued signal, the following single-steps while the program is
    in the signal handler leave $eflags.TF set.  That means subsequent
    continues will trap after one instruction, resulting in a spurious
    SIGTRAP being reported to the user.
    
    This is a kernel bug; I've reported it to kernel devs (turned out to
    be a known bug).  I'm seeing it on x86_64 Fedora 20 (Linux
    3.16.4-200.fc20.x86_64), and I was told it's still not fixed upstream.
    
    This commit extends gdb.base/sigstep.exp to cover this use case,
    xfailed.
    
    Here's what the bug looks like:
    
     (gdb) start
     Temporary breakpoint 1, main () at si-handler.c:48
     48        setup ();
     (gdb) next
     50        global = 0; /* set break here */
    
    Let's queue a signal, so we can step into the handler:
    
     (gdb) handle SIGUSR1
     Signal        Stop      Print   Pass to program Description
     SIGUSR1       Yes       Yes     Yes             User defined signal 1
     (gdb) queue-signal SIGUSR1
    
    TF is not set:
    
     (gdb) display $eflags
     1: $eflags = [ PF ZF IF ]
    
    Now step into the handler -- "si" does PTRACE_SINGLESTEP+SIGUSR1:
    
     (gdb) si
     sigusr1_handler (sig=0) at si-handler.c:31
     31      {
     1: $eflags = [ PF ZF IF ]
    
    No TF yet.  But another single-step...
    
     (gdb) si
     0x0000000000400621      31      {
     1: $eflags = [ PF ZF TF IF ]
    
    ... ends up with TF left set.  This results in PTRACE_CONTINUE
    trapping after each instruction is executed:
    
     (gdb) c
     Continuing.
    
     Program received signal SIGTRAP, Trace/breakpoint trap.
     0x0000000000400624 in sigusr1_handler (sig=0) at si-handler.c:31
     31      {
     1: $eflags = [ PF ZF TF IF ]
    
     (gdb) c
     Continuing.
    
     Program received signal SIGTRAP, Trace/breakpoint trap.
     sigusr1_handler (sig=10) at si-handler.c:32
     32        global = 0;
     1: $eflags = [ PF ZF TF IF ]
     (gdb)
    
    Note that even another PTRACE_SINGLESTEP does not fix it:
    
     (gdb) si
     33      }
     1: $eflags = [ PF ZF TF IF ]
     (gdb)
    
    Eventually, it gets "fixed" by the rt_sigreturn syscall, when
    returning out of the handler:
    
     (gdb) bt
     #0  sigusr1_handler (sig=10) at si-handler.c:33
     #1  <signal handler called>
     #2  main () at si-handler.c:50
     (gdb) set disassemble-next-line on
     (gdb) si
     0x0000000000400632      33      }
        0x0000000000400631 <sigusr1_handler+17>:     5d      pop    %rbp
     => 0x0000000000400632 <sigusr1_handler+18>:     c3      retq
     1: $eflags = [ PF ZF TF IF ]
     (gdb)
     <signal handler called>
     => 0x0000003b36a358f0 <__restore_rt+0>: 48 c7 c0 0f 00 00 00    mov    $0xf,%rax
     1: $eflags = [ PF ZF TF IF ]
     (gdb) si
     <signal handler called>
     => 0x0000003b36a358f7 <__restore_rt+7>: 0f 05   syscall
     1: $eflags = [ PF ZF TF IF ]
     (gdb)
     main () at si-handler.c:50
     50        global = 0; /* set break here */
     => 0x000000000040066b <main+9>: c7 05 cb 09 20 00 00 00 00 00   movl   $0x0,0x2009cb(%rip)        # 0x601040 <global>
     1: $eflags = [ PF ZF IF ]
     (gdb)
    
    The bug doesn't happen if we instead PTRACE_CONTINUE into the signal
    handler -- e.g., set a breakpoint in the handler, queue a signal, and
    "continue".
    
    gdb/testsuite/
    2014-10-28  Pedro Alves  <palves@redhat.com>
    
    	PR gdb/17511
    	* gdb.base/sigstep.c (handler): Add a few more writes to 'done'.
    	* gdb.base/sigstep.exp (other_handler_location): New global.
    	(advance): Support stepping into the signal handler, and running
    	commands while in the handler.
    	(in_handler_map): New global.
    	(top level): In the advance test, add combinations for getting
    	into the handler with stepping commands, and for running commands
    	in the handler.  Add comment describing the advancei tests.

-----------------------------------------------------------------------

Summary of changes:
 gdb/testsuite/ChangeLog            |   12 +++++
 gdb/testsuite/gdb.base/sigstep.c   |    5 ++
 gdb/testsuite/gdb.base/sigstep.exp |   93 +++++++++++++++++++++++++++++++-----
 3 files changed, 97 insertions(+), 13 deletions(-)
Comment 3 cvs-commit@gcc.gnu.org 2014-11-07 15:40:19 UTC
The branch, master has been updated
       via  9de00a4aa026297eae42bafd8ab413cfc1a53e3a (commit)
      from  b7a084bebe979a4743540349025561ce82208843 (commit)

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=9de00a4aa026297eae42bafd8ab413cfc1a53e3a

commit 9de00a4aa026297eae42bafd8ab413cfc1a53e3a
Author: Pedro Alves <palves@redhat.com>
Date:   Fri Nov 7 15:20:47 2014 +0000

    gdb.base/sigstep.exp: xfail gdb/17511 on i?86 Linux
    
    Running gdb.base/sigstep.exp with --target=i686-pc-linux-gnu on a
    64-bit kernel naturally trips on PR gdb/17511 as well, given this is a
    kernel bug.
    
    I haven't really tested a real 32-bit kernel/machine, but given the
    code in question in the kernel is shared between 32-bit and 64-bit,
    I'm quite sure the bug triggers in those cases as well.
    
    So, simply xfail i?86-*-linux* too.
    
    gdb/testsuite/
    2014-11-07  Pedro Alves  <palves@redhat.com>
    
    	PR gdb/17511
    	* gdb.base/sigstep.exp (in_handler_map) <si+advance>: xfail
    	i?86-*-linux*.

-----------------------------------------------------------------------

Summary of changes:
 gdb/testsuite/ChangeLog            |    6 ++++++
 gdb/testsuite/gdb.base/sigstep.exp |    1 +
 2 files changed, 7 insertions(+), 0 deletions(-)
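The i?86-*-linux* pattern in the xfail above is a shell-style glob over the target triplet: "?" matches exactly one character and "*" any run, so one pattern covers the i386 through i686 Linux triplets. The testsuite itself does this matching in Tcl; as a stand-in, this sketch uses POSIX fnmatch(3) to check which triplets the pattern accepts. triplet_matches() is a hypothetical helper.

~~~~c
#include <fnmatch.h>
#include <stdbool.h>

/* Return true if TRIPLET (e.g. "i686-pc-linux-gnu") matches the
   shell-style glob PATTERN, where '?' matches exactly one character
   and '*' matches any run of characters.  */
bool
triplet_matches (const char *pattern, const char *triplet)
{
  return fnmatch (pattern, triplet, 0) == 0;
}
~~~~

triplet_matches ("i?86-*-linux*", "i686-pc-linux-gnu") is true; "x86_64-pc-linux-gnu" does not match because it does not begin with "i".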
Comment 4 cvs-commit@gcc.gnu.org 2014-12-25 00:46:11 UTC Comment hidden (spam)
Comment 5 Pedro Alves 2015-02-28 16:01:31 UTC
*** Bug 18063 has been marked as a duplicate of this bug. ***
Comment 6 Pedro Alves 2015-02-28 16:05:14 UTC
Kernel fix here:
  https://lkml.org/lkml/2014/11/3/740

The fix made it to the mm tree this week:
 http://marc.info/?l=linux-mm-commits&m=142499061914990&w=2
Comment 7 Pedro Alves 2016-08-10 22:32:43 UTC
The kernel fix made it to Linux 4.1.  Closing as fixed, since I had added a test for the scenario.