[PATCH 6/8] gdb: fix handling of vfork by multi-threaded program (follow-fork-mode=parent, detach-on-fork=on)
Pedro Alves
pedro@palves.net
Thu Mar 31 18:21:15 GMT 2022
On 2022-01-17 16:27, Simon Marchi via Gdb-patches wrote:
> From: Simon Marchi <simon.marchi@efficios.com>
>
> There is a problem with how GDB handles a vfork happening in a
> multi-threaded program. This problem was reported to me by somebody not
> using vfork directly, but using system(3) in a multi-threaded program,
> which may be implemented using vfork.
>
> This patch only deals about the follow-fork-mode=parent,
> detach-on-fork=on case, because it would be too much to chew at once to
> fix the bugs in the other cases as well (I tried).
Thinking through this, one concern I had is that if the vfork child gets stuck for some reason
before execing or exiting (hey we're a debugger, bugs happen!), then pausing all threads makes
it so that you can end up stuck in gdb and can't get back the prompt (if the child ignores
SIGINT, otherwise it just dies), or if you continue in the background, you
can't interrupt the parent threads.
E.g., with your series, and:
diff --git c/gdb/testsuite/gdb.threads/vfork-multi-thread.c w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
index cd3dfcc1e65..6178de91dca 100644
--- c/gdb/testsuite/gdb.threads/vfork-multi-thread.c
+++ w/gdb/testsuite/gdb.threads/vfork-multi-thread.c
@@ -25,6 +25,8 @@ static void *vforker(void *arg)
/* A vfork child is not supposed to mess with the state of the program,
but it is helpful for the purpose of this test. */
release_main = 1;
+ while (1)
+ sleep (1);
_exit(7);
}
@@ -41,7 +43,7 @@ static void should_break_here(void) {}
int main()
{
-
+ signal (SIGINT, SIG_IGN);
pthread_t thread;
int ret = pthread_create(&thread, NULL, vforker, NULL);
assert(ret == 0);
(END)
We get:
(gdb) r
Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d8a700 (LWP 1713229)]
[Detaching after vfork from child process 1713230]
^C^C^C^C^C^C
(hung forever)
Or:
$ gdb -ex "set non-stop on" ...
(gdb) r&
Starting program: /home/pedro/gdb/binutils-gdb/build/gdb/testsuite/outputs/gdb.threads/vfork-multi-thread/vfork-multi-thread
(gdb) [Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7d8a700 (LWP 1345571)]
[Detaching after vfork from child process 1345572]
info threads
Id Target Id Frame
* 1 Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
2 Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
(gdb) interrupt
*nothing*
(gdb) interrupt
*nothing*
(gdb) info threads
Id Target Id Frame
* 1 Thread 0x7ffff7d8b740 (LWP 1345567) "vfork-multi-thr" (running)
2 Thread 0x7ffff7d8a700 (LWP 1345571) "vfork-multi-thr" (running)
(gdb)
And then you can't quit gdb:
(gdb) q
A debugging session is active.
Inferior 1 [process 1345567] will be killed.
Quit anyway? (y or n) y
^C^C^C^C
(hung forever)
I guess the ^C issue will be fixed by my series reworking ctrl-c, leaving
gdb in control of the terminal at all times..
Bah, I guess that this might be a meh/schrug, since if the parent is
single-threaded, then you can't interrupt it either. Though at least GDB
doesn't get you get into the stuck state.
I wonder whether a better solution would be something completely different.
Like, say, make it so that gdb always remains attached to the child until it
execs/exits, in which case we would no longer need to remove breakpoints from the
parent/child, and thus we could let all the parent's threads run as well.
We actually do that for follow-fork parent / detach-on-fork-on and
vfork -- see inferior::pending_detach.
Anyhow, onward. With your currently solution at least the common scenario
works, and we have a new testcase that helps whatever redesign in future.
Despite these concerns, I'm actually happy to get this into the tree. I think
it's still better than what we have today.
> +
> + /* The rest of the function assumes non-stop==off and
> + target-non-stop==off.
> +
> + If a thread is waiting for a vfork-done event, it means breakpoints are out
> + for this inferior (well, program space in fact). We don't want to resume
> + any thread other than the one waiting for vfork done, otherwise these other
> + threads could miss breakpoints. So if a thread in the resumption set is
> + waiting for a vfork-done event, resume only that thread.
> +
> + The resumption set width depends on whether schedule-multiple is on or off.
> +
> + Note that if the target_resume interface was more flexible, we could be
> + smarter here when schedule-multiple is on. For example, imagine 3
> + inferiors with 2 threads each (1.1, 1.2, 2.1, 2.2, 3.1 and 3.2). Threads
> + 2.1 and 3.2 are both waiting for a vfork-done event. Then we could ask the
> + target(s) to resume:
> +
> + - All threads of inferior 1
> + - Thread 2.1
> + - Thread 3.2
> +
> + Since we don't have that flexibility (we can only pass one ptid), just
> + resume the first thread waiting for a vfork-done event we find (e.g. thread
> + 2.1). */
We actually have the flexibility, but we don't use it. I mean, we could use
the commit_resumed mechanism in all-stop as well, which would let us prepare such
a resume with multiple target_resume calls and final commit. For a rainy day.
> + if (sched_multi)
> + {
> + for (inferior *inf : all_non_exited_inferiors ())
> + if (inf->thread_waiting_for_vfork_done != nullptr)
> + return inf->thread_waiting_for_vfork_done->ptid;
> + }
> + else if (current_inferior ()->thread_waiting_for_vfork_done != nullptr)
> + return current_inferior ()->thread_waiting_for_vfork_done->ptid;
> +
> + return user_visible_resume_ptid (user_step);
> }
>
> /* Wrapper for target_resume, that handles infrun-specific
> @@ -3258,6 +3360,19 @@ proceed (CORE_ADDR addr, enum gdb_signal siggnal)
> continue;
> diff --git a/gdb/testsuite/gdb.threads/vfork-multi-thread.c b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> new file mode 100644
> index 000000000000..cd3dfcc1e653
> --- /dev/null
> +++ b/gdb/testsuite/gdb.threads/vfork-multi-thread.c
> @@ -0,0 +1,74 @@
> +#include <assert.h>
Missing copyright header.
> +#include <pthread.h>
> +#include <stdio.h>
> +#include <unistd.h>
> +#include <sys/wait.h>
> +
> +// Start with:
> +//
> +// ./gdb -nx -q --data-directory=data-directory repro -ex "tb break_here_first" -ex r -ex "b should_break_here"
> +//
> +// Then do "continue".
> +//
> +// The main thread will likely cross should_break_here while we are handling
> +// the vfork and breakpoints are removed, therefore missing the breakpoint.
The whole file needs a pass to adjust it to GNU conventions/formatting.
> +
> +static volatile int release_vfork = 0;
> +static volatile int release_main = 0;
> +
> +static void *vforker(void *arg)
> +{
> + while (!release_vfork);
I'd add some usleep(1) here, to avoid hogging the cpu.
> +
> + pid_t pid = vfork ();
> + if (pid == 0) {
> + /* A vfork child is not supposed to mess with the state of the program,
> + but it is helpful for the purpose of this test. */
> + release_main = 1;
> + _exit(7);
> + }
> +
> + int stat;
> + int ret = waitpid (pid, &stat, 0);
> + assert (ret == pid);
> + assert (WIFEXITED (stat));
> + assert (WEXITSTATUS (stat) == 7);
> +
> + return NULL;
> +}
> +
> +static void should_break_here(void) {}
> +
> +int main()
> +{
> +
> + pthread_t thread;
> + int ret = pthread_create(&thread, NULL, vforker, NULL);
> + assert(ret == 0);
> +
> + /* We break here first, while the thread is stuck on `!release_fork`. */
> + release_vfork = 1;
> +
> + /* We set a breakpoint on should_break_here.
> +
> + We then set "release_fork" from the debugger and continue. The main
> + thread hangs on `!release_main` while the non-main thread vforks. During
> + the window of time where the two processes have a shared address space
> + (after vfork, before _exit), GDB removes the breakpoints from the address
> + space. During that window, only the vfork-ing thread (the non-main
> + thread) is frozen by the kernel. The main thread is free to execute. The
> + child process sets `release_main`, releasing the main thread. A buggy GDB
> + would let the main thread execute during that window, leading to the
> + breakpoint on should_break_here being missed. A fixed GDB does not resume
> + the threads of the vforking process other than the vforking thread. When
> + the vfork child exits, the fixed GDB resumes the main thread, after
> + breakpoints are reinserted, so the breakpoint is not missed. */
> +
> + while (!release_main);
Ditto, usleep.
Pedro Alves
More information about the Gdb-patches
mailing list