This is the mail archive of the gdb-patches@sourceware.org mailing list for the GDB project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: sig != GDB_SIGNAL_0 failed assertion stepping program on GNU/Linux


On 08/05/2015 02:04 AM, Joel Brobecker wrote:

> Ah ha, I missed the fact that the exception breakpoint is thread-
> specific. Your fix seems to be working very well; thanks for suggestion,
> Pedro! Attached is a patch with a slightly altered analysis as the revision
> log. Our SuSE 10 machine is very slow, so I tested it on a more modern
> machine with a slightly different distro.
> 
> I'm wondering if we shouldn't be doing the same for:
> 
>         case bp_longjmp_resume:
>         case bp_exception_resume:
>           this_action = BPSTAT_WHAT_CLEAR_LONGJMP_RESUME;
>           retval.is_longjmp = bptype == bp_longjmp_resume;
>           break;

Yep.

I've now written a testcase that triggers this, and tweaked the
commit log a bit further.

WDYT?
>From 2f1c5a8d90cfb71068d425ebdacb5b3108f13014 Mon Sep 17 00:00:00 2001
From: Pedro Alves <palves@redhat.com>
Date: Wed, 5 Aug 2015 10:32:49 +0100
Subject: [PATCH] stepping is disturbed by setjmp/longjmp | try/catch in other
 threads

At https://sourceware.org/ml/gdb-patches/2015-08/msg00097.html, Joel
observed that trying to next/step a program on GNU/Linux sometimes
results in the following failed assertion:

	% gdb -q .obj/gprof/main
    (gdb) start
    (gdb) n
    (gdb) step
    [...]/infrun.c:2391: internal-error:
    resume: Assertion `sig != GDB_SIGNAL_0' failed.

What happened is that, during the "next" operation, GDB hit a
longjmp/exception/step-resume breakpoint but failed to see that this
breakpoint was set for a different thread than the one being stepped.

Joel's detailed analysis follows:

More precisely, at the end of the "start" command, we are stopped at
the start of function Main in main.adb; there are 4 threads in total,
and we are in the main thread (which is thread 1):

    (gdb) info thread
      Id   Target Id         Frame
      4    Thread 0xb7a56ba0 (LWP 28379) 0xffffe410 in __kernel_vsyscall ()
      3    Thread 0xb7c5aba0 (LWP 28378) 0xffffe410 in __kernel_vsyscall ()
      2    Thread 0xb7e5eba0 (LWP 28377) 0xffffe410 in __kernel_vsyscall ()
    * 1    Thread 0xb7ea18c0 (LWP 28370) main () at /[...]/main.adb:57

All the logs below reference Thread ID/LWP, but it'll be easier to
talk about the threads by GDB thread number.  For instance, thread 1
is LWP 28370 while thread 3 is LWP 28378.  So, the explanations below
translate the LWPs into thread numbers.

Back to what happens while we are trying to "next' our program:
    (gdb) n
    infrun: clear_proceed_status_thread (Thread 0xb7a56ba0 (LWP 28379))
    infrun: clear_proceed_status_thread (Thread 0xb7c5aba0 (LWP 28378))
    infrun: clear_proceed_status_thread (Thread 0xb7e5eba0 (LWP 28377))
    infrun: clear_proceed_status_thread (Thread 0xb7ea18c0 (LWP 28370))
    infrun: proceed (addr=0xffffffff, signal=GDB_SIGNAL_DEFAULT)
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x805451e
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28370.0 [Thread 0xb7ea18c0 (LWP 28370)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x8054523

We've resumed thread 1 (LWP 28370), and received in return a signal
that the same thread stopped slightly further.  It's still in the
range of instructions for the line of source we started the "next"
from, as evidenced by the following trace...

    infrun: stepping inside range [0x805451e-0x8054531]

... and thus, we decide to continue stepping the same thread:

    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523
    infrun: prepare_to_wait

That's when we get an event from a different thread (thread 3)...

    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x80782d0
    infrun: context switch
    infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)

... which we find to be at the address where we set a breakpoint on
"the unwinder debug hook" (namely "_Unwind_DebugHook").  But GDB fails
to notice that the breakpoint was inserted for thread 1 only, and so
decides to handle it as...

    infrun: BPSTAT_WHAT_SET_LONGJMP_RESUME

... and inserts a breakpoint at the corresponding resume address, as
evidenced by this the next log:

    infrun: exception resume at 80542a2

That breakpoint seems innocent right now, but will play a role fairly
quickly.  But for now, GDB has inserted the exception-resume
breakpoint, and needs to single-step thread 3 past the breakpoint it
just hit.  Thus, it temporarily disables the exception breakpoint, and
requests a step of that thread:

    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: skipping breakpoint: stepping past insn at: 0x80782d0
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [Thread 0xb7c5aba0 (LWP 28378)] at 0x80782d0
    infrun: prepare_to_wait

We then get a notification, still from thread 3, that it's now past
that breakpoint...

    infrun: prepare_to_wait
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x8078424

... so we can resume what we were doing before, which is single-stepping
thread 1 until we get to a new line of code:

    infrun: switching back to stepped thread
    infrun: Switching context from Thread 0xb7c5aba0 (LWP 28378) to Thread 0xb7ea18c0 (LWP 28370)
    infrun: expected thread still hasn't advanced
    infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=0, current thread [Thread 0xb7ea18c0 (LWP 28370)] at 0x8054523

The "resume" log above shows that we're resuming thread 1 from where
we left off (0x8054523).  We get one more stop at 0x8054529, which is
still inside our stepping range so we go again.  That's when we get
the following event, from thread 3:

    infrun: prepare_to_wait
    infrun: target_wait (-1.0.0, status) =
    infrun:   28370.28378.0 [Thread 0xb7c5aba0 (LWP 28378)],
    infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
    infrun: TARGET_WAITKIND_STOPPED
    infrun: stop_pc = 0x80542a2

Now the stop_pc address is interesting, because it's the address of
"exception resume" breakpoint...

    infrun: context switch
    infrun: Switching context from Thread 0xb7ea18c0 (LWP 28370) to Thread 0xb7c5aba0 (LWP 28378)
    infrun: BPSTAT_WHAT_CLEAR_LONGJMP_RESUME

... and since that location is at a different line of code, this is
where it decides the "next" operation should stop:

    infrun: stop_waiting
    [Switching to Thread 0xb7c5aba0 (LWP 28378)]
    0x080542a2 in inte_tache_rt.ttache_rt (
        <_task>=0x80968ec <inte_tache_rt_inst.tache2>)
        at /[...]/inte_tache_rt.adb:54
    54            end loop;

However, what GDB should have noticed earlier that the exception
breakpoint we hit was for a different thread, thus should have
single-stepped that thread out of the breakpoint _without_ inserting
the exception-return breakpoint, and then resumed the single-stepping
of the initial thread (thread 1) until that thread stepped out of its
stepping range.

This is what this patch does, and after applying it, GDB now correctly
stops on the next line of code.

The patch adds a C++ test that exercises this, both for setjmp/longjmp
and exception breakpoints.  With an unpatched GDB it shows:

 (gdb) next
 [Switching to Thread 22445.22455]
 thread_try_catch (arg=0x0) at /home/pedro/gdb/mygit/build/../src/gdb/testsuite/gdb.threads/next-other-thr-longjmp.c:59
 59            catch (...)
 (gdb) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 1
 next
 /home/pedro/gdb/mygit/build/../src/gdb/infrun.c:4865: internal-error: process_event_stop_test: Assertion `ecs->event_thread->control.exception_resume_breakpoint != NULL' fa
 iled.
 A problem internal to GDB has been detected,
 further debugging may prove unreliable.
 Quit this debugging session? (y or n) FAIL: gdb.threads/next-other-thr-longjmp.exp: next to line 2 (GDB internal error)
 Resyncing due to internal error.
 n

Tested on x86_64-linux, no regressions.

gdb/ChangeLog:
2015-08-05  Pedro Alves  <palves@redhat.com>
	    Joel Brobecker  <brobecker@adacore.com>

        * breakpoint.c (bpstat_what) <bp_longjmp, bp_longjmp_call_dummy>
	<bp_exception, bp_longjmp_resume, bp_exception_resume>: Handle the
	case where BS->STOP is not set.

gdb/testsuite/ChangeLog:
2015-08-05  Pedro Alves  <palves@redhat.com>

	* gdb.threads/next-other-thr-longjmp.c: New file.
	* gdb.threads/next-other-thr-longjmp.exp: New file.
---
 gdb/breakpoint.c                                   |  18 +++-
 .../gdb.threads/next-while-other-thread-longjmps.c | 116 +++++++++++++++++++++
 .../next-while-other-thread-longjmps.exp           |  40 +++++++
 3 files changed, 170 insertions(+), 4 deletions(-)
 create mode 100644 gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.c
 create mode 100644 gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.exp

diff --git a/gdb/breakpoint.c b/gdb/breakpoint.c
index 78a694e..125b22f 100644
--- a/gdb/breakpoint.c
+++ b/gdb/breakpoint.c
@@ -5752,13 +5752,23 @@ bpstat_what (bpstat bs_head)
 	case bp_longjmp:
 	case bp_longjmp_call_dummy:
 	case bp_exception:
-	  this_action = BPSTAT_WHAT_SET_LONGJMP_RESUME;
-	  retval.is_longjmp = bptype != bp_exception;
+	  if (bs->stop)
+	    {
+	      this_action = BPSTAT_WHAT_SET_LONGJMP_RESUME;
+	      retval.is_longjmp = bptype != bp_exception;
+	    }
+	  else
+	    this_action = BPSTAT_WHAT_SINGLE;
 	  break;
 	case bp_longjmp_resume:
 	case bp_exception_resume:
-	  this_action = BPSTAT_WHAT_CLEAR_LONGJMP_RESUME;
-	  retval.is_longjmp = bptype == bp_longjmp_resume;
+	  if (bs->stop)
+	    {
+	      this_action = BPSTAT_WHAT_CLEAR_LONGJMP_RESUME;
+	      retval.is_longjmp = bptype == bp_longjmp_resume;
+	    }
+	  else
+	    this_action = BPSTAT_WHAT_SINGLE;
 	  break;
 	case bp_step_resume:
 	  if (bs->stop)
diff --git a/gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.c b/gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.c
new file mode 100644
index 0000000..b786430
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.c
@@ -0,0 +1,116 @@
+/* This testcase is part of GDB, the GNU debugger.
+
+   Copyright 2015 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include <assert.h>
+#include <pthread.h>
+#include <unistd.h>
+#include <setjmp.h>
+
+/* Number of threads.  */
+#define NTHREADS 10
+
+/* When set, threads exit.  */
+volatile int break_out;
+
+/* Entry point for threads that setjmp/longjmp.  */
+
+static void *
+thread_longjmp (void *arg)
+{
+  jmp_buf env;
+
+  while (!break_out)
+    {
+      if (setjmp (env) == 0)
+	longjmp (env, 1);
+
+      usleep (1);
+    }
+  return NULL;
+}
+
+/* Entry point for threads that try/catch.  */
+
+static void *
+thread_try_catch (void *arg)
+{
+  volatile unsigned int counter = 0;
+
+  while (!break_out)
+    {
+      try
+	{
+	  throw 1;
+	}
+      catch (...)
+	{
+	  counter++;
+	}
+
+      usleep (1);
+    }
+  return NULL;
+}
+
+int
+main (void)
+{
+  pthread_t threads[NTHREADS];
+  int i;
+  int ret;
+
+  /* Don't run forever.  */
+  alarm (180);
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      /* Half the threads does setjmp/longjmp, the other half does
+	 try/catch.  */
+      if ((i % 2) == 0)
+	ret = pthread_create (&threads[i], NULL, thread_longjmp , NULL);
+      else
+	ret = pthread_create (&threads[i], NULL, thread_try_catch , NULL);
+      assert (ret == 0);
+    }
+
+#define LINE usleep (1)
+
+  /* The other thread's setjmp/longjmp/try/catch should not disturb
+     this thread's stepping over these lines.  */
+
+  LINE; /* set break here */
+  LINE; /* line 1 */
+  LINE; /* line 2 */
+  LINE; /* line 3 */
+  LINE; /* line 4 */
+  LINE; /* line 5 */
+  LINE; /* line 6 */
+  LINE; /* line 7 */
+  LINE; /* line 8 */
+  LINE; /* line 9 */
+  LINE; /* line 10 */
+
+  break_out = 1;
+
+  for (i = 0; i < NTHREADS; i++)
+    {
+      ret = pthread_join (threads[i], NULL);
+      assert (ret == 0);
+    }
+
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.exp b/gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.exp
new file mode 100644
index 0000000..72a0617
--- /dev/null
+++ b/gdb/testsuite/gdb.threads/next-while-other-thread-longjmps.exp
@@ -0,0 +1,40 @@
+# Copyright (C) 2015 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This test has the main thread step over a few lines, while a few
+# threads constantly do setjmp/long and others do try/catch.  The
+# "next" commands in the main thread should be able to complete
+# undisturbed.
+
+standard_testfile
+
+set linenum [gdb_get_line_number "set break here"]
+
+if {[prepare_for_testing "failed to prepare" \
+	 $testfile $srcfile {c++ debug pthreads}] == -1} {
+    return -1
+}
+
+if ![runto_main] then {
+    fail "Can't run to main"
+    return 0
+}
+
+gdb_breakpoint $linenum
+gdb_continue_to_breakpoint "start line"
+
+for {set i 1} {$i <= 10} {incr i} {
+    gdb_test "next" " line $i .*" "next to line $i"
+}
-- 
1.9.3


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]