Bug 19061 - gdb-7.10 hangs/spins-on-cpu when debugging any program on Alpha
Summary: gdb-7.10 hangs/spins-on-cpu when debugging any program on Alpha
Status: RESOLVED FIXED
Alias: None
Product: gdb
Classification: Unclassified
Component: gdb
Version: 7.10
Importance: P2 normal
Target Milestone: 8.1
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-10-03 15:13 UTC by Tobias Klausmann
Modified: 2018-01-03 16:05 UTC

See Also:
Host: alpha-linux-gnu
Target:
Build:
Last reconfirmed:


Attachments
proposed patch (874 bytes, patch)
2016-10-26 17:10 UTC, Richard Henderson

Description Tobias Klausmann 2015-10-03 15:13:39 UTC
7.9.1 works fine:

$ gdb /bin/ls
GNU gdb (Gentoo 7.9.1 vanilla) 7.9.1
[...]
This GDB was configured as "alpha-unknown-linux-gnu".
Reading symbols from /bin/ls...(no debugging symbols found)...done.
(gdb) run
Starting program: /bin/ls 
4405_alpha-sysctl-uac.patch      gcc-4.6.3.tbz2            secfix.patch
[...]
[Inferior 1 (process 27877) exited normally]
(gdb) quit

7.10:
gdb-7.10 $ gdb/gdb /bin/ls
GNU gdb (7.10-vanilla) 7.10
This GDB was configured as "alpha-unknown-linux-gnu".
Reading symbols from /bin/ls...(no debugging symbols found)...done.
(gdb) run
Starting program: /bin/ls 
[gdb hangs here, using close to 100% of one CPU. Running strace on it shows nothing immediately obvious.]

So I bisected this between the tagged gdb-7.xy-release commits, using the example above as a test case. I found this commit to be the first bad one:

faf09f0119da40d9b408021ad5665a906e00ee59 is the first bad commit
commit faf09f0119da40d9b408021ad5665a906e00ee59
Author: Pedro Alves <palves@redhat.com>
Date:   Wed Mar 4 20:41:16 2015 +0000

    Linux native: Use TRAP_BRKPT/TRAP_HWBPT
    
    This patch adjusts the native Linux target backend to tell the core
    whether a trap was caused by a breakpoint.
    
    It teaches the target to get that information out of the si_code of
    the SIGTRAP siginfo.
    
    Tested on x86-64 Fedora 20, s390 RHEL 7, and PPC64 Fedora 18.  An
    earlier version was tested on ARM Fedora 21.
    
    gdb/ChangeLog:
    2015-03-04  Pedro Alves  <palves@redhat.com>
    
        * linux-nat.c (save_sigtrap): Check for breakpoints before
        checking watchpoints.
        (status_callback) [USE_SIGTRAP_SIGINFO]: Don't check whether a
        breakpoint is inserted if relying on SIGTRAP's siginfo.si_code.
        (check_stopped_by_breakpoint) [USE_SIGTRAP_SIGINFO]: Decide whether
        a breakpoint triggered based on the SIGTRAP's siginfo.si_code.
        (linux_nat_stopped_by_sw_breakpoint)
        (linux_nat_supports_stopped_by_sw_breakpoint)
        (linux_nat_stopped_by_hw_breakpoint)
        (linux_nat_supports_stopped_by_hw_breakpoint): New functions.
        (linux_nat_wait_1): Don't re-increment the PC if relying on
        SIGTRAP's siginfo->si_code.
        (linux_nat_add_target): Install new target methods.
        * linux-thread-db.c (check_event): Don't account for breakpoint PC
        offset if the target already adjusted the PC.
        * nat/linux-ptrace.h (USE_SIGTRAP_SIGINFO): New.
        (GDB_ARCH_TRAP_BRKPT): New.
        (TRAP_HWBKPT): Define if not already defined.

:040000 040000 5623811c697afc352ad61c2090773eacf8b4fbbc 206b4d6974c8a35cbada87c5c9555e8558a798e4 M      gdb


I am unsure how to dig out the actual problem from this, but I can provide access to an Alpha machine if need be.
Comment 1 Matt Turner 2016-05-14 23:07:16 UTC
Pedro, do you expect you're going to look at this bug sometime soon?
Comment 2 Pedro Alves 2016-05-16 08:53:55 UTC
I won't be able to look at this myself in the near future, sorry.

The comments in the commit itself should provide hints:

 https://sourceware.org/ml/gdb-patches/2015-02/msg00731.html

as well as the series intro:

 https://sourceware.org/ml/gdb-patches/2015-02/msg00726.html


See also these later fixes for MIPS:

 https://sourceware.org/ml/gdb-patches/2016-02/msg00734.html

 https://sourceware.org/ml/gdb-patches/2016-02/msg00762.html

Basically, the MIPS kernel didn't report si_code correctly.

Alpha likely has a similar issue.

"set debug infrun 1" and "set debug lin-lwp 1" will likely show what kind of loop gdb is stuck in.
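
For reference, here is a minimal sketch of the kind of si_code check the bisected commit relies on. It is not GDB's code: the enum and function names are made up for illustration, and it assumes the kernel fills in si_code with the standard Linux values. If a kernel reports the wrong si_code for a SIGTRAP, a classifier like this returns the wrong stop reason.

```c
#include <assert.h>
#include <signal.h>

/* Fallbacks for C libraries that do not expose these si_code values.
   The numbers are the ones Linux uses.  */
#ifndef TRAP_BRKPT
# define TRAP_BRKPT 1
#endif
#ifndef TRAP_TRACE
# define TRAP_TRACE 2
#endif
#ifndef TRAP_HWBKPT
# define TRAP_HWBKPT 4
#endif

/* Hypothetical stop reasons; these names are not GDB's.  */
enum stop_reason
{
  STOP_UNKNOWN,
  STOP_SW_BREAKPOINT,
  STOP_HW_BREAKPOINT,
  STOP_SINGLE_STEP
};

/* Map the si_code of a SIGTRAP to a stop reason.  This trusts the
   kernel to report si_code faithfully, which is the assumption that
   is in doubt here.  */
enum stop_reason
classify_sigtrap (int si_code)
{
  switch (si_code)
    {
    case TRAP_BRKPT:
      return STOP_SW_BREAKPOINT;
    case TRAP_HWBKPT:
      return STOP_HW_BREAKPOINT;
    case TRAP_TRACE:
      return STOP_SINGLE_STEP;
    default:
      return STOP_UNKNOWN;
    }
}

/* Usage example: each si_code maps to a distinct stop reason.  */
void
classify_sigtrap_example (void)
{
  assert (classify_sigtrap (TRAP_BRKPT) == STOP_SW_BREAKPOINT);
  assert (classify_sigtrap (TRAP_TRACE) == STOP_SINGLE_STEP);
}
```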
Comment 3 Tobias Klausmann 2016-05-17 10:29:19 UTC
This is a somewhat representative slice of a spinning gdb run:

infrun: prepare_to_wait
linux_nat_wait: [process -1], [TARGET_WNOHANG]
RSRL: NOT resuming LWP process 2636, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 2636, ERRNO-OK
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 2636, has pending status
LLW: trap ptid is process 2636.
LLW: exit
infrun: target_wait (-1.0.0, status) =
infrun:   2636.2636.0 [process 2636],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x20000011d20
infrun: BPSTAT_WHAT_SINGLE
infrun: no stepping, continue
infrun: stop_all_threads
infrun: stop_all_threads, pass=0, iterations=0
infrun:   process 2636 not executing
infrun: stop_all_threads, pass=1, iterations=1
infrun:   process 2636 not executing
infrun: stop_all_threads done
infrun: skipping breakpoint: stepping past insn at: 0x20000011d20
infrun: skipping breakpoint: stepping past insn at: 0x20000011d20
infrun: skipping breakpoint: stepping past insn at: 0x20000011d20
infrun: resume (step=1, signal=GDB_SIGNAL_0), trap_expected=1, current thread [process 2636] at 0x20000011d20
LLR: Preparing to step process 2636, 0, inferior_ptid process 2636
LLR: PTRACE_SINGLESTEP process 2636, 0 (resume event thread)
sigchld
infrun: prepare_to_wait
linux_nat_wait: [process -1], [TARGET_WNOHANG]
RSRL: NOT resuming LWP process 2636, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 2636, ERRNO-OK
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 2636, has pending status
LLW: trap ptid is process 2636.
LLW: exit
infrun: target_wait (-1.0.0, status) =
infrun:   2636.2636.0 [process 2636],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: clear_step_over_info
infrun: restart threads: [process 2636] is event thread
infrun: stop_pc = 0x20000003428
infrun: delayed software breakpoint trap, ignoring
infrun: no stepping, continue
infrun: resume (step=0, signal=GDB_SIGNAL_0), trap_expected=0, current thread [process 2636] at 0x20000003428
LLR: Preparing to resume process 2636, 0, inferior_ptid process 2636
LLR: PTRACE_CONT process 2636, 0 (resume event thread)
sigchld
infrun: prepare_to_wait
linux_nat_wait: [process -1], [TARGET_WNOHANG]
RSRL: NOT resuming LWP process 2636, not stopped
LLW: enter
LNW: waitpid(-1, ...) returned 2636, ERRNO-OK
LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
CSBB: process 2636 stopped by software breakpoint
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 2636, has pending status
LLW: trap ptid is process 2636.
LLW: exit
infrun: target_wait (-1.0.0, status) =
infrun:   2636.2636.0 [process 2636],
infrun:   status->kind = stopped, signal = GDB_SIGNAL_TRAP
infrun: TARGET_WAITKIND_STOPPED
infrun: stop_pc = 0x20000011d20
Comment 4 Pedro Alves 2016-05-17 16:41:25 UTC
We see:

 LLR: Preparing to step process 2636, 0, inferior_ptid process 2636
 LLR: PTRACE_SINGLESTEP process 2636, 0 (resume event thread)
 ...
 LLW: waitpid 2636 received Trace/breakpoint trap (stopped)
 CSBB: process 2636 stopped by software breakpoint
 ...
 infrun: stop_pc = 0x20000003428
 infrun: delayed software breakpoint trap, ignoring

And Alpha is a decr_pc_after_break arch:

 $ grep decr_pc_after *  | grep alpha
 alpha-tdep.c:  set_gdbarch_decr_pc_after_break (gdbarch, ALPHA_INSN_SIZE);

If GDB gets that "stopped by software breakpoint" stop reason wrong, it decrements the thread's PC by ALPHA_INSN_SIZE when it shouldn't.  The next time the thread is resumed, it executes the wrong instruction, which shows up as odd behavior such as spurious loops and crashes.

See linux-nat.c:save_stop_reason for where the PC is decremented.
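
The effect can be modeled with a small sketch. This is not the code in save_stop_reason; the function name is hypothetical, and SKETCH_ALPHA_INSN_SIZE assumes ALPHA_INSN_SIZE is 4 (Alpha instructions are 4 bytes wide). The address is borrowed from the log purely as an illustrative value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed value of GDB's ALPHA_INSN_SIZE: Alpha instructions are
   4 bytes wide.  */
#define SKETCH_ALPHA_INSN_SIZE 4

/* Return the PC to use for a stopped thread.  On a decr_pc_after_break
   architecture, the PC reported after a software breakpoint trap is
   past the breakpoint instruction, so it is moved back.  Any other
   kind of stop leaves the PC alone.  */
uint64_t
adjust_pc_after_stop (uint64_t reported_pc, bool stopped_by_sw_breakpoint)
{
  if (stopped_by_sw_breakpoint)
    return reported_pc - SKETCH_ALPHA_INSN_SIZE;
  return reported_pc;
}

/* Usage example: a trap wrongly classified as a software breakpoint
   leaves the thread one instruction behind where it really stopped.  */
void
adjust_pc_example (void)
{
  uint64_t stopped_at = 0x20000003428ULL;  /* illustrative address */

  /* Correct classification: the PC is untouched.  */
  assert (adjust_pc_after_stop (stopped_at, false) == 0x20000003428ULL);

  /* Misclassification: the PC is moved back by one instruction.  */
  assert (adjust_pc_after_stop (stopped_at, true) == 0x20000003424ULL);
}
```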
Comment 5 Richard Henderson 2016-10-26 17:10:30 UTC
Created attachment 9593
proposed patch

I tried with just the nat/linux-ptrace.h hunk, but that wasn't enough
to fix the problem.  We could fix the kernel side to distinguish, but
that gets into version requirements.  It might just be better not to
rely on the kernel for singlestep support at all.

Note that the base alpha_software_single_step should have been using
alpha_deal_with_atomic_sequence, so that we don't get into a different
sort of infinite loop.
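
To illustrate the atomic-sequence point, here is a rough sketch of the idea, not the code in alpha-tdep.c. A trap between a load-locked and its store-conditional typically makes the store fail, so stepping one instruction at a time inside such a sequence can retry forever. The sketch therefore steps over the whole sequence at once. The function names and the scan limit are made up, and branches inside the sequence are ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Alpha major opcodes (top 6 bits of the instruction word).  */
enum
{
  OP_LDL_L = 0x2a,  /* load-locked longword */
  OP_LDQ_L = 0x2b,  /* load-locked quadword */
  OP_STL_C = 0x2e,  /* store-conditional longword */
  OP_STQ_C = 0x2f   /* store-conditional quadword */
};

/* Arbitrary cap on how far to scan; an assumption of this sketch.  */
#define MAX_ATOMIC_SEQUENCE_LEN 16

static unsigned
major_opcode (uint32_t insn)
{
  return insn >> 26;
}

/* Return the index of the instruction at which a stepping breakpoint
   should be placed when stepping from INSNS[I].  For an ordinary
   instruction that is I + 1.  For a load-locked instruction it is the
   index just past the matching store-conditional, so the whole atomic
   sequence runs without interruption.  */
size_t
next_step_index (const uint32_t *insns, size_t count, size_t i)
{
  assert (i < count);

  unsigned op = major_opcode (insns[i]);
  if (op != OP_LDL_L && op != OP_LDQ_L)
    return i + 1;

  for (size_t j = i + 1; j < count && j - i <= MAX_ATOMIC_SEQUENCE_LEN; j++)
    {
      op = major_opcode (insns[j]);
      if (op == OP_STL_C || op == OP_STQ_C)
        return j + 1;
    }

  /* No closing store-conditional found: fall back to a plain step.  */
  return i + 1;
}
```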
Comment 6 Tobias Klausmann 2016-11-02 13:33:36 UTC
(In reply to Richard Henderson from comment #5)
> Created attachment 9593
> proposed patch
> 
> I tried with just the nat/linux-ptrace.h hunk, but that wasn't enough
> to fix the problem.  We could fix the kernel side to distinguish, but
> that gets into version requirements.  It might just be better not to
> rely on the kernel for singlestep support at all.
> 
> Note that the base alpha_software_single_step should have been using
> alpha_deal_with_atomic_sequence, so that we don't get into a different
> sort of infinite loop.

I applied this patch to 7.12 on our dev alpha, but it now hangs:


(gdb) set debug infrun 1
(gdb) set debug lin-lwp 1
(gdb) run
Starting program: /bin/ls
linux_nat_wait: [process 8646], []
LLW: enter
LNW: waitpid(-1, ...) returned 0, ERRNO-OK
RSRL: NOT resuming LWP process 8646, not stopped
LNW: about to sigsuspend
[no more output]
Comment 7 Tobias Klausmann 2017-11-29 09:44:51 UTC
I *think* this may be fixed in gdb-8. I just checked out gdb-8.0.1-release from git and I have been unable to reproduce the hang. I will keep using it for everyday stuff and report back if anything breaks.
Comment 8 Tobias Klausmann 2017-12-14 12:01:33 UTC
(In reply to Tobias Klausmann from comment #7)
> I *think* this may be fixed in gdb-8. I just checked out gdb-8.0.1-release
> from git and I have been unable to reproduce the hang. I will keep using it
> for everyday stuff and report back if anything breaks.

I spoke too soon: I can now reliably make it hang on commit gdb-8.0.1-release aka 2dcf9205c32aa69c102640962ff03746d04c02cc using ls or a trivial hello world program.
Comment 9 Sourceware Commits 2017-12-15 18:21:53 UTC
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=68f81d60196eb201b209873cf53258f13b0046b9

commit 68f81d60196eb201b209873cf53258f13b0046b9
Author: Richard Henderson <rth@redhat.com>
Date:   Fri Dec 15 18:19:42 2017 +0000

    Fix PR19061, gdb hangs/spins-on-cpu when debugging any program on Alpha
    
    This fixes PR19061, where gdb hangs/spins-on-cpu when debugging any
    program on Alpha.
    
    (This patch is Uros' forward port of the patch from comment #5
    of the PR [1].)
    
    Patch was tested on alphaev68-linux-gnu, also tested with gcc's
    testsuite, where it fixed all hangs in guality.exp and
    simulate-thread.exp testcases.
    
    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=19061#c5
    
    gdb/ChangeLog:
    2017-12-15  Richard Henderson  <rth@redhat.com>
    	    Uros Bizjak  <ubizjak@gmail.com>
    
    	PR gdb/19061
    	* alpha-tdep.c (alpha_software_single_step): Call
    	alpha_deal_with_atomic_sequence here.
    	(set_gdbarch_software_single_step): Set to
    	alpha_software_single_step.
    	* nat/linux-ptrace.h [__alpha__]: Define GDB_ARCH_IS_TRAP_BRKPT
    	and GDB_ARCH_IS_TRAP_HWBKPT.
Comment 10 Sourceware Commits 2018-01-03 15:21:10 UTC
The gdb-8.0-branch branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=bd496067387a9c89a7e62bbba76e784634936932

commit bd496067387a9c89a7e62bbba76e784634936932
Author: Richard Henderson <rth@redhat.com>
Date:   Wed Jan 3 15:14:12 2018 +0000

    Fix PR19061, gdb hangs/spins-on-cpu when debugging any program on Alpha
    
    This fixes PR19061, where gdb hangs/spins-on-cpu when debugging any
    program on Alpha.
    
    (This patch is Uros' forward port of the patch from comment #5
    of the PR [1].)
    
    Patch was tested on alphaev68-linux-gnu, also tested with gcc's
    testsuite, where it fixed all hangs in guality.exp and
    simulate-thread.exp testcases.
    
    [1] https://sourceware.org/bugzilla/show_bug.cgi?id=19061#c5
    
    gdb/ChangeLog:
    2018-01-03  Richard Henderson  <rth@redhat.com>
    	    Uros Bizjak  <ubizjak@gmail.com>
    
    	PR gdb/19061
    	* alpha-tdep.c (alpha_deal_with_atomic_sequence): Change
    	prototype.
    	(alpha_software_single_step): Call alpha_deal_with_atomic_sequence
    	here.
    	(set_gdbarch_software_single_step): Set to
    	alpha_software_single_step.
    	* nat/linux-ptrace.h [__alpha__]: Define GDB_ARCH_IS_TRAP_BRKPT
    	and GDB_ARCH_IS_TRAP_HWBKPT.
Comment 11 Pedro Alves 2018-01-03 16:05:21 UTC
Closing.