Bug 13463

Summary: bad multi-inferior behavior with breakpoint resetting
Product: gdb
Component: breakpoints
Version: HEAD
Status: RESOLVED FIXED
Severity: normal
Priority: P2
Target Milestone: ---
Reporter: Tom Tromey <tromey>
Assignee: Not yet assigned to anyone <unassigned>
CC: pedro, simon.marchi, tromey

Description Tom Tromey 2011-12-01 18:49:43 UTC
I tried to use gdb to debug gdb in a multi-inferior case,
by running the test suite.  To reproduce:

* cd $build/gdb/testsuite
* gdb /usr/bin/make
* Set up for multi-inferior.  I use at least:
  set detach-on-fork off
  set schedule-multiple on
  set pagination off
  set target-async on
  set non-stop on
  set print inferior-events on
* add-inferior -exec ../gdb
* inferior 2
* b internal_error
* run check RUNTESTFLAGS=py-mi.exp

Now a couple of strange things happen.

First, I see some warnings like this:
process 15935 is executing new program: /usr/bin/iconv
Error in re-setting breakpoint 1: Cannot access memory at address 0x494f22
Error in re-setting breakpoint 1: Cannot access memory at address 0x494f22
Error in re-setting breakpoint 1: Cannot access memory at address 0x494f22
Error in re-setting breakpoint 1: Cannot access memory at address 0x494f22
[Inferior 33 (process 15935) exited normally]

Second, I get this sometimes:

Cannot find new threads: capability not available

(This error message would be improved by also printing which inferior it refers to.)
Comment 1 Tom Tromey 2011-12-01 18:52:12 UTC
Oops, you must switch back to inferior 1 before running the inferior.
I did this in my test; I just forgot to document that step here.
Comment 2 Pedro Alves 2012-10-24 16:51:14 UTC
I thought this had been fixed, but nope, still reproducible with current head.
Comment 3 Simon Marchi 2013-08-05 19:10:58 UTC
I think I have an even simpler test case that triggers this bug.

---8<---
#include <stdio.h>
#include <unistd.h>

void function_in_a() {
	printf("Hi, I'm a\n");
}

int main(int argc, char **argv) {
	for (;;) {
		function_in_a();
		sleep(1);
	}
	return 0;
}
--->8---

What I do:

$ gdb
(gdb) set non-stop on
(gdb) set target-async on
(gdb) file a    (the code above, compiled with no special flags)
(gdb) r &
(gdb) b function_in_a

I get the same kind of error: Cannot access memory at address 0x400504
From what I understand, setting this breakpoint requires reading the running inferior's memory to analyze the function prologue. GDB tries to read that memory while the inferior's ptrace state is "running", which isn't allowed.
Comment 4 Simon Marchi 2013-08-05 19:59:05 UTC
Some additional info:

backtrace when the error is thrown: http://pastebin.com/raw.php?i=6FyXvnNc
backtrace when the ptrace memory read fails (the source cause of the error): http://pastebin.com/raw.php?i=uxwcnjiN
Comment 5 Simon Marchi 2013-08-05 21:39:44 UTC
Even when the prologue analysis is avoided, it doesn't work: if I add the breakpoint using the function's address instead of its name, it fails during the actual breakpoint insertion instead. So it seems that even the simplest case of adding a breakpoint to a running inferior in async/non-stop mode is not handled.
Comment 6 Tom Tromey 2017-08-19 18:09:14 UTC
This bug seems to depend on non-stop.
When I apply the patches from bug 19471, and just
use "detach-on-fork off" and "schedule-multiple on", 
I can set breakpoints ok.
Comment 7 Tom Tromey 2017-08-19 18:09:49 UTC
*** Bug 17543 has been marked as a duplicate of this bug. ***
Comment 8 Sourceware Commits 2021-07-01 13:07:38 UTC
The master branch has been updated by Pedro Alves <palves@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;h=05c06f318fd9a112529dfc313e6512b399a645e4

commit 05c06f318fd9a112529dfc313e6512b399a645e4
Author: Pedro Alves <pedro@palves.net>
Date:   Fri Jun 11 17:56:32 2021 +0100

    Linux: Access memory even if threads are running
    
    Currently, on GNU/Linux, if you try to access memory and you have a
    running thread selected, GDB fails the memory accesses, like:
    
     (gdb) c&
     Continuing.
     (gdb) p global_var
     Cannot access memory at address 0x555555558010
    
    Or:
    
     (gdb) b main
     Breakpoint 2 at 0x55555555524d: file access-mem-running.c, line 59.
     Warning:
     Cannot insert breakpoint 2.
     Cannot access memory at address 0x55555555524d
    
    This patch removes this limitation.  It teaches the native Linux
    target to read/write memory even if the target is running.  And it
    does this without temporarily stopping threads.  We now get:
    
     (gdb) c&
     Continuing.
     (gdb) p global_var
     $1 = 123
     (gdb) b main
     Breakpoint 2 at 0x555555555259: file access-mem-running.c, line 62.
    
    (The scenarios above work correctly with current GDBserver, because
    GDBserver temporarily stops all threads in the process whenever GDB
    wants to access memory (see prepare_to_access_memory /
    done_accessing_memory).  Freezing the whole process makes sense when
    we need to be sure that we have a consistent view of memory and don't
    race with the inferior changing it at the same time as GDB is
    accessing it.  But I think that's a too-heavy hammer for the default
    behavior.  I think that ideally, whether to stop all threads or not
    should be policy decided by gdb core, probably best implemented by
    exposing something like gdbserver's prepare_to_access_memory /
    done_accessing_memory to gdb core.)
    
    Currently, if we're accessing (reading/writing) just a few bytes, then
    the Linux native backend does not try accessing memory via
    /proc/<pid>/mem and goes straight to ptrace
    PTRACE_PEEKTEXT/PTRACE_POKETEXT.  However, ptrace always fails when
    the ptracee is running.  So the first step is to prefer
    /proc/<pid>/mem even for small accesses.  Without further changes
    however, that may cause a performance regression, due to constantly
    opening and closing /proc/<pid>/mem for each memory access.  So the
    next step is to keep the /proc/<pid>/mem file open across memory
    accesses.  If we have this, then it doesn't make sense anymore to even
    have the ptrace fallback, so the patch disables it.
    
    I've made it such that GDB only ever has one /proc/<pid>/mem file open
    at any time.  As long as a memory access hits the same inferior
    process as the previous access, then we reuse the previously open
    file.  If however, we access memory of a different process, then we
    close the previous file and open a new one for the new process.
    
    If we wanted, we could keep one /proc/<pid>/mem file open per
    inferior, and never close them (unless the inferior exits or execs).
    However, having seen bfd patches recently about hitting too many open
    file descriptors, I kept the logic to have only one file open tops.
    Also, we need to handle memory accesses for processes for which we
    don't have an inferior object, for when we need to detach a
    fork-child, and we'd probably want to handle caching the open file for
    that scenario (no inferior for process) too, which would probably end
    up meaning caching for last non-inferior process, which is very much
    what I'm proposing anyhow.  So always having one file open likely ends
    up a smaller patch.
    
    The next step is handling the case of GDB reading/writing memory
    through a thread that is running and exits.  The access should not
    result in a user-visible failure if the inferior/process is still
    alive.
    
    Once we manage to open a /proc/<lwpid>/mem file, then that file is
    usable for memory accesses even if the corresponding lwp exits and is
    reaped.  I double checked that trying to open the same
    /proc/<lwpid>/mem path again fails because the lwp is really gone so
    there's no /proc/<lwpid>/ entry on the filesystem anymore, but the
    previously open file remains usable.  It's only when the whole process
    execs that we need to reopen a new file.
    
    When the kernel destroys the whole address space, i.e., when the
    process exits or execs, the reads/writes fail with 0 aka EOF, in which
    case there's nothing else to do than returning a memory access
    failure.  Note this means that when we get an exec event, we need to
    reopen the file, to access the process's new address space.
    
    If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
    we're opening it for exits before we open it and before we reap the
    LWP (i.e., the LWP is zombie), the open fails with EACCES.  The patch
    handles this by just looking for another thread until it finds one
    that we can successfully open a /proc/<pid>/mem file for.
    
    If we need to open (or reopen) the /proc/<pid>/mem file, and the LWP
    we're opening has exited and we already reaped it, which is the case
    if the selected thread is in THREAD_EXIT state, the open fails with
    ENOENT.  The patch handles this the same way as a zombie race
    (EACCES), instead of checking upfront whether we're accessing a
    known-exited thread, because that would result in more complicated
    code, because we also need to handle accessing lwps that are not
    listed in the core thread list, and it's the core thread list that
    records the THREAD_EXIT state.
    
    The patch includes two testcases:
    
    #1 - gdb.base/access-mem-running.exp
    
      This is the conceptually simplest - it is single-threaded, and has
      GDB read and write memory while the program is running.  It also
      tests setting a breakpoint while the program is running, and checks
      that the breakpoint is hit immediately.
    
    #2 - gdb.threads/access-mem-running-thread-exit.exp
    
      This one is more elaborate, as it continuously spawns short-lived
      threads in order to exercise accessing memory just while threads are
      exiting.  It also spawns two different processes and alternates
      accessing memory between the two processes to exercise the reopening
      the /proc file frequently.  This also ends up exercising GDB reading
      from an exited thread frequently.  I confirmed by putting abort()
      calls in the EACCES/ENOENT paths added by the patch that we do hit
      all of them frequently with the testcase.  It also exits the
      process's main thread (i.e., the main thread becomes zombie), to
      make sure accessing memory in such a corner-case scenario works now
      and in the future.
    
    The tests fail on GNU/Linux native before the code changes, and pass
    after.  They pass against current GDBserver, again because GDBserver
    supports memory access even if all threads are running, by
    transparently pausing the whole process.
    
    gdb/ChangeLog:
    yyyy-mm-dd  Pedro Alves  <pedro@palves.net>
    
            PR mi/15729
            PR gdb/13463
            * linux-nat.c (linux_nat_target::detach): Close the
            /proc/<pid>/mem file if it was open for this process.
            (linux_handle_extended_wait) <PTRACE_EVENT_EXEC>: Close the
            /proc/<pid>/mem file if it was open for this process.
            (linux_nat_target::mourn_inferior): Close the /proc/<pid>/mem file
            if it was open for this process.
            (linux_nat_target::xfer_partial): Adjust.  Do not fall back to
            inf_ptrace_target::xfer_partial for memory accesses.
            (last_proc_mem_file): New.
            (maybe_close_proc_mem_file): New.
            (linux_proc_xfer_memory_partial_pid): New, with bits factored out
            from linux_proc_xfer_partial.
            (linux_proc_xfer_partial): Delete.
            (linux_proc_xfer_memory_partial): New.
    
    gdb/testsuite/ChangeLog
    yyyy-mm-dd  Pedro Alves  <pedro@palves.net>
    
            PR mi/15729
            PR gdb/13463
            * gdb.base/access-mem-running.c: New.
            * gdb.base/access-mem-running.exp: New.
            * gdb.threads/access-mem-running-thread-exit.c: New.
            * gdb.threads/access-mem-running-thread-exit.exp: New.
    
    Change-Id: Ib3c082528872662a3fc0ca9b31c34d4876c874c9
Comment 9 Pedro Alves 2021-07-01 13:10:06 UTC
I believe this is fixed now.