7.9.1 works fine:

$ gdb /bin/ls
GNU gdb (Gentoo 7.9.1 vanilla) 7.9.1
[...]
This GDB was configured as "alpha-unknown-linux-gnu".
Reading symbols from /bin/ls...(no debugging symbols found)...done.
(gdb) run
Starting program: /bin/ls
4405_alpha-sysctl-uac.patch gcc-4.6.3.tbz2 secfix.patch
[...]
[Inferior 1 (process 27877) exited normally]
(gdb) quit

7.10:

gdb-7.10 $ gdb/gdb /bin/ls
GNU gdb (7.10-vanilla) 7.10
This GDB was configured as "alpha-unknown-linux-gnu".
Reading symbols from /bin/ls...(no debugging symbols found)...done.
(gdb) run
Starting program: /bin/ls
[gdb hangs here, eating close to 100% of one CPU. Running strace on it yields nothing immediately obvious.]

So I bisected this between the tagged gdb-7.xy-release commits, using the example above as a test case. I found this commit to be the first bad one:

faf09f0119da40d9b408021ad5665a906e00ee59 is the first bad commit
commit faf09f0119da40d9b408021ad5665a906e00ee59
Author: Pedro Alves <palves@redhat.com>
Date:   Wed Mar 4 20:41:16 2015 +0000

    Linux native: Use TRAP_BRKPT/TRAP_HWBPT

    This patch adjusts the native Linux target backend to tell the core
    whether a trap was caused by a breakpoint.

    It teaches the target to get that information out of the si_code of
    the SIGTRAP siginfo.

    Tested on x86-64 Fedora 20, s390 RHEL 7, and PPC64 Fedora 18. An
    earlier version was tested on ARM Fedora 21.

    gdb/ChangeLog:
    2015-03-04  Pedro Alves  <palves@redhat.com>

    * linux-nat.c (save_sigtrap): Check for breakpoints before
    checking watchpoints.
    (status_callback) [USE_SIGTRAP_SIGINFO]: Don't check whether a
    breakpoint is inserted if relying on SIGTRAP's siginfo.si_code.
    (check_stopped_by_breakpoint) [USE_SIGTRAP_SIGINFO]: Decide whether
    a breakpoint triggered based on the SIGTRAP's siginfo.si_code.
    (linux_nat_stopped_by_sw_breakpoint)
    (linux_nat_supports_stopped_by_sw_breakpoint)
    (linux_nat_stopped_by_hw_breakpoint)
    (linux_nat_supports_stopped_by_hw_breakpoint): New functions.
    (linux_nat_wait_1): Don't re-increment the PC if relying on
    SIGTRAP's siginfo->si_code.
    (linux_nat_add_target): Install new target methods.
    * linux-thread-db.c (check_event): Don't account for breakpoint PC
    offset if the target already adjusted the PC.
    * nat/linux-ptrace.h (USE_SIGTRAP_SIGINFO): New.
    (GDB_ARCH_TRAP_BRKPT): New.
    (TRAP_HWBKPT): Define if not already defined.

:040000 040000 5623811c697afc352ad61c2090773eacf8b4fbbc 206b4d6974c8a35cbada87c5c9555e8558a798e4 M gdb

I am unsure how to dig the actual problem out of this, but I can provide access to an Alpha machine if need be.
Pedro, do you expect you're going to look at this bug sometime soon?
I won't be able to look at this myself in the near future, sorry.

The comments in the commit itself should provide hints:

https://sourceware.org/ml/gdb-patches/2015-02/msg00731.html

as well as the series intro:

https://sourceware.org/ml/gdb-patches/2015-02/msg00726.html

See also these later fixes for MIPS:

https://sourceware.org/ml/gdb-patches/2016-02/msg00734.html
https://sourceware.org/ml/gdb-patches/2016-02/msg00762.html

Basically, the MIPS kernel didn't report si_code correctly. Alpha likely has a similar issue.

"set debug infrun 1" and "set debug lin-lwp 1" will likely show what kind of loop gdb is stuck in.
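To make the si_code dependency concrete, here is a minimal sketch of how a debugger might map the si_code of a SIGTRAP to a stop reason. It is not GDB's code: the names stop_reason and classify_sigtrap are made up for illustration, the fallback values 1, 2 and 4 are the generic Linux ones, and real architectures may need a different mapping.

```c
#include <assert.h>
#include <signal.h>

/* Fallbacks in case the C library headers do not expose these constants.
   The values are the generic Linux ones. */
#ifndef TRAP_BRKPT
# define TRAP_BRKPT 1
#endif
#ifndef TRAP_TRACE
# define TRAP_TRACE 2
#endif
#ifndef TRAP_HWBKPT
# define TRAP_HWBKPT 4
#endif

/* Hypothetical stop reasons, for illustration only. */
enum stop_reason {
    STOP_UNKNOWN,
    STOP_SW_BREAKPOINT,
    STOP_SINGLE_STEP,
    STOP_HW_BREAKPOINT
};

/* Map the si_code of a SIGTRAP siginfo to a stop reason.
   This trusts the kernel completely: if the kernel reports TRAP_BRKPT
   for a finished single-step, the result is STOP_SW_BREAKPOINT even
   though no breakpoint was hit. */
enum stop_reason classify_sigtrap(int si_code)
{
    switch (si_code) {
    case TRAP_BRKPT:
        return STOP_SW_BREAKPOINT;
    case TRAP_TRACE:
        return STOP_SINGLE_STEP;
    case TRAP_HWBKPT:
        return STOP_HW_BREAKPOINT;
    default:
        return STOP_UNKNOWN;
    }
}
```

The point of the sketch is only that the classification is as good as the kernel's si_code. A kernel that uses the same code for breakpoints and single-steps leaves the debugger unable to tell them apart.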
This is a somewhat representative slice of a spinning gdb run:

infrun: prepare_to_wait
linux_nat_wait: [process -1], [TARGET_WNOHANG]
RSRL: NOT resuming LWP process 2636, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 2636, ERRNO-OK
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 2636, has pending status
LLW: trap ptid is process 2636.
LLW: exit
infrun: target_wait (-1.0.0, status) =
infrun: 2636.2636.0 [process 2636],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x20000011d20
infrun: BPSTAT_WHAT_SINGLE
infrun: no stepping, continue
infrun: stop_all_threads
infrun: stop_all_threads, pass=0, iterations=0
infrun: process 2636 not executing
infrun: stop_all_threads, pass=1, iterations=1
infrun: process 2636 not executing
infrun: stop_all_threads done
infrun: skipping breakpoint: stepping past insn at: 0x20000011d20
infrun: skipping breakpoint: stepping past insn at: 0x20000011d20
infrun: skipping breakpoint: stepping past insn at: 0x20000011d20
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [process 2636] at 0x20000011d20
LLR: Preparing to step process 2636, 0, inferior_ptid process 2636
LLR: PTRACE_SINGLESTEP process 2636, 0 (resume event thread)
sigchld
infrun: prepare_to_wait
linux_nat_wait: [process -1], [TARGET_WNOHANG]
RSRL: NOT resuming LWP process 2636, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 2636, ERRNO-OK
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 2636, has pending status
LLW: trap ptid is process 2636.
LLW: exit
infrun: target_wait (-1.0.0, status) =
infrun: 2636.2636.0 [process 2636],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: clear_step_over_info
infrun: restart threads: [process 2636] is event thread
infrun: stop_pc = 0x20000003428
infrun: delayed software breakpoint trap, ignoring
infrun: no stepping, continue
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 2636] at 0x20000003428
LLR: Preparing to resume process 2636, 0, inferior_ptid process 2636
LLR: PTRACE_CONT process 2636, 0 (resume event thread)
sigchld
infrun: prepare_to_wait
linux_nat_wait: [process -1], [TARGET_WNOHANG]
RSRL: NOT resuming LWP process 2636, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 2636, ERRNO-OK
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 2636, has pending status
LLW: trap ptid is process 2636.
LLW: exit
infrun: target_wait (-1.0.0, status) =
infrun: 2636.2636.0 [process 2636],
infrun: status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x20000011d20
We see:

LLR: Preparing to step process 2636, 0, inferior_ptid process 2636
LLR: PTRACE_SINGLESTEP process 2636, 0 (resume event thread)
...
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
...
infrun: stop_pc = 0x20000003428
infrun: delayed software breakpoint trap, ignoring

And Alpha is a decr_pc_after_break arch:

$ grep decr_pc_after * | grep alpha
alpha-tdep.c: set_gdbarch_decr_pc_after_break (gdbarch, ALPHA_INSN_SIZE);

If GDB gets that "stopped by software breakpoint" stop reason wrong, it will decrement the thread's PC by ALPHA_INSN_SIZE when it shouldn't. The next time the thread is resumed, it then executes the wrong instruction, which in turn shows up as symptoms such as spurious loops and crashes.

See linux-nat.c:save_stop_reason for where the PC is decremented.
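As a rough illustration of why a wrong stop reason corrupts the PC on a decr_pc_after_break architecture, consider the sketch below. It is not the code in save_stop_reason: the function pc_after_stop, the stop_reason enum and the addresses in the usage note are invented for the example. The only facts taken from the real code are that the decrement is ALPHA_INSN_SIZE and that Alpha instructions are 4 bytes wide.

```c
#include <assert.h>
#include <stdint.h>

/* Alpha instructions are 4 bytes; gdb uses this as decr_pc_after_break. */
#define ALPHA_INSN_SIZE 4

/* Hypothetical stop reasons, for illustration only. */
enum stop_reason {
    STOP_SINGLE_STEP,
    STOP_SW_BREAKPOINT
};

/* Return the PC the debugger will resume from.
   After a software breakpoint trap the reported PC points past the
   breakpoint instruction, so it is moved back by decr_pc_after_break.
   After a single-step the reported PC is already correct and must be
   left alone. */
uint64_t pc_after_stop(uint64_t reported_pc, enum stop_reason reason,
                       uint64_t decr_pc_after_break)
{
    if (reason == STOP_SW_BREAKPOINT)
        return reported_pc - decr_pc_after_break;
    return reported_pc;
}
```

With a made-up reported PC of 0x1004, a correctly classified single-step resumes at 0x1004, while the same stop misreported as a software breakpoint resumes at 0x1000, so the instruction that was just stepped is executed again.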
Created attachment 9593 [details]
proposed patch

I tried with just the nat/linux-ptrace.h hunk, but that wasn't enough to fix the problem. We could fix the kernel side to distinguish, but that gets into version requirements. It might just be better not to rely on the kernel for singlestep support at all.

Note that the base alpha_software_single_step should have been using alpha_deal_with_atomic_sequence, so that we don't get into a different sort of infinite loop.
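For readers unfamiliar with the atomic-sequence problem: a software single-step places a temporary breakpoint on the next instruction, and a trap taken between a load-locked and its store-conditional typically makes the store-conditional fail, so the inferior retries the sequence forever. The following is a simplified sketch of the idea, not the actual alpha_deal_with_atomic_sequence. The function next_step_address and the limit MAX_ATOMIC_SEQ_LEN are hypothetical, branches inside the sequence are ignored, and the real function likely handles more cases. The opcode numbers are the Alpha encodings of ldl_l, ldq_l, stl_c and stq_c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALPHA_INSN_SIZE 4
#define MAX_ATOMIC_SEQ_LEN 16   /* hypothetical scan limit */

/* Alpha major opcodes for load-locked / store-conditional. */
enum {
    OP_LDL_L = 0x2a,
    OP_LDQ_L = 0x2b,
    OP_STL_C = 0x2e,
    OP_STQ_C = 0x2f
};

/* The major opcode lives in the top 6 bits of the instruction word. */
static unsigned opcode_of(uint32_t insn)
{
    return insn >> 26;
}

/* code[] holds `count` instructions starting at address `base`;
   `index` is the instruction about to be stepped.
   Returns the address where the temporary breakpoint should go:
   just after the store-conditional if we are at the start of an
   atomic sequence, otherwise the next instruction. */
uint64_t next_step_address(const uint32_t *code, size_t count,
                           uint64_t base, size_t index)
{
    unsigned op = opcode_of(code[index]);

    if (op == OP_LDL_L || op == OP_LDQ_L) {
        for (size_t i = index + 1;
             i < count && i <= index + MAX_ATOMIC_SEQ_LEN; i++) {
            unsigned o = opcode_of(code[i]);
            if (o == OP_STL_C || o == OP_STQ_C)
                return base + (uint64_t)(i + 1) * ALPHA_INSN_SIZE;
        }
    }
    /* Not an atomic sequence, or no store-conditional found in range. */
    return base + (uint64_t)(index + 1) * ALPHA_INSN_SIZE;
}
```

Stepping from the load-locked then lands after the whole sequence in one go, instead of trapping inside it.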
(In reply to Richard Henderson from comment #5)
> Created attachment 9593 [details]
> proposed patch
>
> I tried with just the nat/linux-ptrace.h hunk, but that wasn't enough
> to fix the problem. We could fix the kernel side to distinguish, but
> that gets into version requirements. It might just be better not to
> rely on the kernel for singlestep support at all.
>
> Note that the base alpha_software_single_step should have been using
> alpha_deal_with_atomic_sequence, so that we don't get into a different
> sort of infinite loop.

I applied this patch to 7.12 on our dev alpha, but it now hangs:

(gdb) set debug infrun 1
(gdb) set debug lin-lwp 1
(gdb) run
Starting program: /bin/ls
linux_nat_wait: [process 8646], []
LLW: enter
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 8646, not stopped
LNW: about to sigsuspend
[no more output]
I *think* this may be fixed in gdb-8. I just checked out gdb-8.0.1-release from git and I have been unable to reproduce the hang. I will keep using it for everyday stuff and report back if anything breaks.
(In reply to Tobias Klausmann from comment #7)
> I *think* this may be fixed in gdb-8. I just checked out gdb-8.0.1-release
> from git and I have been unable to reproduce the hang. I will keep using it
> for everyday stuff and report back if anything breaks.

I spoke too soon: I can now reliably make it hang on commit gdb-8.0.1-release aka 2dcf9205c32aa69c102640962ff03746d04c02cc using ls or a trivial hello world program.
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=68f81d60196eb201b209873cf53258f13b0046b9

commit 68f81d60196eb201b209873cf53258f13b0046b9
Author: Richard Henderson <rth@redhat.com>
Date:   Fri Dec 15 18:19:42 2017 +0000

    Fix PR19061, gdb hangs/spins-on-cpu when debugging any program on Alpha

    This fixes PR19061, where gdb hangs/spins-on-cpu when debugging any
    program on Alpha. (This patch is Uros' forward port of the patch from
    comment #5 of the PR [1].)

    Patch was tested on alphaev68-linux-gnu, also tested with gcc's
    testsuite, where it fixed all hangs in guality.exp and
    simulate-thread.exp testcases.

    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=19061#c5

    gdb/ChangeLog:
    2017-12-15  Richard Henderson  <rth@redhat.com>
                Uros Bizjak  <ubizjak@gmail.com>

    PR gdb/19061
    * alpha-tdep.c (alpha_software_single_step): Call
    alpha_deal_with_atomic_sequence here.
    (set_gdbarch_software_single_step): Set to
    alpha_software_single_step.
    * nat/linux-ptrace.h [__alpha__]: Define GDB_ARCH_IS_TRAP_BRKPT
    and GDB_ARCH_IS_TRAP_HWBKPT.
The gdb-8.0-branch branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd496067387a9c89a7e62bbba76e784634936932

commit bd496067387a9c89a7e62bbba76e784634936932
Author: Richard Henderson <rth@redhat.com>
Date:   Wed Jan 3 15:14:12 2018 +0000

    Fix PR19061, gdb hangs/spins-on-cpu when debugging any program on Alpha

    This fixes PR19061, where gdb hangs/spins-on-cpu when debugging any
    program on Alpha. (This patch is Uros' forward port of the patch from
    comment #5 of the PR [1].)

    Patch was tested on alphaev68-linux-gnu, also tested with gcc's
    testsuite, where it fixed all hangs in guality.exp and
    simulate-thread.exp testcases.

    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=19061#c5

    gdb/ChangeLog:
    2018-01-03  Richard Henderson  <rth@redhat.com>
                Uros Bizjak  <ubizjak@gmail.com>

    PR gdb/19061
    * alpha-tdep.c (alpha_deal_with_atomic_sequence): Change prototype.
    (alpha_software_single_step): Call
    alpha_deal_with_atomic_sequence here.
    (set_gdbarch_software_single_step): Set to
    alpha_software_single_step.
    * nat/linux-ptrace.h [__alpha__]: Define GDB_ARCH_IS_TRAP_BRKPT
    and GDB_ARCH_IS_TRAP_HWBKPT.
Closing.