This is the mail archive of the
gdb-patches@sourceware.org
mailing list for the GDB project.
Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'.
- From: Simon Marchi <simon dot marchi at ericsson dot com>
- To: Philippe Waroquiers <philippe dot waroquiers at skynet dot be>, "gdb-patches at sourceware dot org" <gdb-patches at sourceware dot org>, Pedro Alves <palves at redhat dot com>
- Date: Fri, 9 Nov 2018 21:44:19 +0000
- Subject: Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'.
- References: <20181103201929.10345-1-philippe.waroquiers@skynet.be>
On 2018-11-03 4:19 p.m., Philippe Waroquiers wrote:
> Running the gdb testsuite under Valgrind started to fail after 100+ tests,
> due to out of memory caused by lingering processes.
>
> The lingering processes are caused by the combination
> of a limitation in Valgrind signal handling when using PTRACE_TRACEME
> and a (minor) bug in GDB.
>
> The Valgrind limitation is : when a process is ptraced and raises
> a signal, Valgrind will replace the raised signal by SIGSTOP as other
> signals are masked by Valgrind when executing a system call.
> Removing this limitation seems far to be trivial, valgrind signal
> handling is very complex.
>
> Due to this valgrind limitation, GDB linux_ptrace_test_ret_to_nx gets
> a SIGSTOP signal instead of the expected SIGTRAP or SIGSEGV.
> In such a case, linux_ptrace_test_ret_to_nx does an early return, but
> does not kill the child (running under valgrind), child stays in a STOP-ped
> state.
> These lingering processes then eat the available system memory,
> till launching a new process starts to fail.
>
> This patch fixes the GDB minor bug by killing the child in case
> linux_ptrace_test_ret_to_nx does an early return.
>
> nat/linux-ptrace.c has 3 different logics to kill a child process.
> So, this patch factorizes killing a child in the function kill_child.
>
> The 3 different logics are:
> * linux_ptrace_test_ret_to_nx is calling both kill (child, SIGKILL)
> and ptrace (PTRACE_KILL, child, ...), and then is calling once
> waitpid.
> * linux_check_ptrace_features is calling ptrace (PTRACE_KILL, child, ...)
> + my_waitpid in a loop, as long as the waitpid status was WIFSTOPPED.
> * linux_test_for_tracefork is calling once ptrace (PTRACE_KILL, child, ...)
> + my_waitpid.
>
> The linux ptrace documentation indicates that PTRACE_KILL is deprecated,
> and tells to not use it, as it might return success but not kill the tracee.
> The documentation indicates to send SIGKILL directly.
>
> I suspect that linux_ptrace_test_ret_to_nx calls both kill and ptrace just
> to be sure ...
> I suspect that linux_check_ptrace_features calls ptrace in a loop
> to bypass the PTRACE_KILL limitation.
> And it looks like linux_test_for_tracefork does not handle the PTRACE_KILL
> limitation.
> Also, 2 of the 3 logics are calling my_waitpid, which seems better,
> as this is protecting the waitpid syscall against EINTR.
>
> So, the logic in kill_child is just using kill (child, SIGKILL)
> + my_waitpid, and then does a few verifications to see everything worked
> accordingly to the plan.
>
> Tested on Debian/x86_64.
>
> 2018-11-03 Philippe Waroquiers <philippe.waroquiers@skynet.be>
>
> * nat/linux-ptrace.c (kill_child): New function.
> (linux_ptrace_test_ret_to_nx): Use kill_child instead of local code.
> Add a call to kill_child in case of early return after fork.
> (linux_check_ptrace_features): Use kill_child instead of local code.
> (linux_test_for_tracefork): Likewise.
This makes sense to me, but I'd like to invoke pedro(), see if he sees
anything wrong with it.
Simon