PING^2 Re: [RFA] Factorize killing the children in linux-ptrace.c, and fix a 'process leak'.

Philippe Waroquiers philippe.waroquiers@skynet.be
Mon Dec 10 20:46:00 GMT 2018


Any more feedback ?
Thanks
Philippe

On Mon, 2018-11-26 at 00:13 +0100, Philippe Waroquiers wrote:
> Ping ?
> The patch discussed here is already a first improvement
> (which makes running tests under valgrind more comfortable).
> Some more factorisation might be done with the 4 other
> places where children are killed, but these seems more risky.
> 
> So, maybe this patch could be pushed as a first low risk
> step towards the good direction ?
> 
> Thanks
> 
> Philippe
> 
> 
> On Sun, 2018-11-11 at 18:24 +0100, Philippe Waroquiers wrote:
> > On Sat, 2018-11-10 at 18:29 +0000, Pedro Alves wrote:
> > > There's some long and ugly history around PTRACE_KILL vs SIGKILL.
> > > See gdb/linux-nat.c kill_wait_one_lwp and kill_one_lwp, and the comments
> > > within those functions.  I don't know whether the kernels those were for
> > > are relevant any more.  Likely using git blame and finding the corresponding
> > > patch posts would shed some more light.  Whatever we do, it'd be nice to
> > > make gdb/linux-nat.c use the unified code as well.
> > 
> > I have done some digging in the source code and searched some
> > more info using git blame.
> > => I found 7 different linux logics related to kill(SIGKILL)
> > and ptrace(PTRACE_KILL).
> > I found only one identification of kernels giving problems.
> > ChangeLogs/commit msg were not really giving much
> > background information : these are telling what is being changed,
> > not why it is being changed.
> > linux-low.c has the most interesting comments explaining
> > the kill related problems.
> > 
> > Below is some updated text with the found info (sorry, you
> > will have to re-read some parts already in the RFA).
> > 
> > I have to admit that a 'linux-ptrace.c only factorization'
> > as suggested in the RFA (at least as a first step, till we have
> > validated with it that a simpler kill logic is ok on various platforms)
> > is somewhat less frightening me than reducing all 7 different
> > logics to a single one.
> > (there is a table at the end of this mail that summarises all
> > 7 logics).
> > 
> > I must say that all this is killing me :).
> > Maybe I would feel better if I would just have added my own
> > local new 8th kill logic where I need it :).
> > 
> > Philippe
> > 
> > 
> > nat/linux-ptrace.c has 3 different logics to kill a child process.
> > So, this patch factorizes killing a child in the function kill_child.
> > 
> > The 3 different logics are:
> > * linux_ptrace_test_ret_to_nx is calling both kill (child, SIGKILL)
> >   and ptrace (PTRACE_KILL, child, ...), and then is calling once
> >   waitpid.
> > * linux_check_ptrace_features is calling ptrace (PTRACE_KILL, child, ...)
> >   + my_waitpid in a loop, as long as the waitpid status was WIFSTOPPED.
> > * linux_test_for_tracefork is calling once ptrace (PTRACE_KILL, child, ...)
> >   + my_waitpid.
> > 
> > The linux ptrace documentation indicates that PTRACE_KILL is deprecated,
> > and tells to not use it, as it might return success but not kill the tracee.
> > The documentation indicates to send SIGKILL directly.
> > 
> > I suspect that linux_ptrace_test_ret_to_nx calls both kill and ptrace just
> > to be sure ...
> > git blame indicates that the call to kill(SIGKILL) was added in addition to the
> > pre-existing ptrace PTRACE_KILL. The commit log also indicates to ignore
> > PTRACE_KILL errors. But there is no explanation why adding the SIGKILL call
> > was needed, and why PTRACE_KILL was kept.
> > 
> > I suspect that linux_check_ptrace_features calls ptrace in a loop
> > to bypass the PTRACE_KILL limitation.
> > git blame indicates that this function was heavily based on
> > linux-nat.c:linux_test_for_tracefork (which was moved to linux-ptrace.c,
> > see next paragraph). I found no log/comments about the killing subtilities.
> > 
> > linux_test_for_tracefork seems to not handle the PTRACE_KILL
> > limitation/has not been updated like linux_check_ptrace_features
> > (which has a loop).
> > 
> > Also, 2 of the 3 logics are calling my_waitpid, which seems better,
> > as this is protecting the waitpid syscall against EINTR.
> > 
> > We also find a bunch of other kill logics in gdbserver/linux-low.c and
> > linux-nat.c and linux-fork.c.
> > 
> > gdbserver/linux-low.c has linux_kill_one_lwp, that has an extensive
> > comment explaining why it uses both PTRACE_KILLME and SIGKILL
> > Note however that the gdbserver code does the SIGKILL using
> > 'syscall (__NR_tkill', not using 'kill (... SIGKILL).
> > gdbserver/linux-low.c does a loop calling both PTRACE_KILLME and SIGKILL,
> > but has a comment telling that the loop is most likely unnecessary.
> > 
> > Also, gdbservers/linux-low.c handles the case of waiting for "clone" children,
> > indicating that __WALL depends on SIGCHLD, and killing a stopped process
> > does not generate this signal.
> > It also indicates it supports old kernels not providing clone(CLONE_THREAD),
> > by sending a SIGKILL for each thread (not only one for the process).
> > There is also a comment telling that only PTRACE_KILL was used for years,
> > and that paranoia tells to do both SIGKILL and PTRACE_KILL, in case
> > PTRACE_KILL might work better on old kernels (the comment says it is dubious
> > there is such an old kernel, but explains that this is the paranoia).
> > linux-low.c also has a special logic to kill the thread that has its
> > lwpid == pid, because killing the parent before the children can
> > cause zombies on kernels at least 2.6.0-test7 ... 2.6.8-rc4.
> > 
> > linux-nat.c first uses SIGKILL, and then PTRACE_KILL. A comment (from 2011)
> > is telling that PTRACE_KILL may resume the inferior (so the need for SIGKILL).
> > Another comment (also 2011) tells that some kernels ignore even SIGKILL
> > for processes under ptrace (so the reason for keeping the PTRACE_KILL).
> > Note that linux-nat.c also loops (like linux-low.c), but indicates
> > the loop is needed as the linux kernel sometimes fails to kill a thread
> > after PTRACE_KILL.
> > To the contrary of linux-low.c, linux-nat.c uses waitpid (__WALL) rather
> > than do 2 successive waitpid calls like linux-low.c.
> > linux-nat.c also stops all threads before killing them, with a comment
> > explaining that this is needed to have PTRACE_KILL working.
> > 
> > linux-fork.c has also its own duplicated kill logics.
> > One uses (only) SIGKILL (with a comment telling this always works,
> > contrary to PTRACE_KILL).
> > The other logic (related to killing checkpoints) only uses PTRACE_KILL.
> > This logic does not call waitpid, instead it does an inferior call
> > to have waitpid called by an inferior (no idea why, must be something
> > special related to checkpoint implementation).
> > 
> > Below is a table summarising what has been found.
> > loop&stopcond is '-' if the logic at 'where' does not loop, otherwise
> > it details when the loop continues. SIGKILL and PTRACE_KILL column contains
> > a YES if the logic uses them, otherwise '-' indicates
> > absence of call. A 'V' indicates the return status is verified,
> > otherwise errors are ignored. TK indicates SIGKILL is sent with __NR_tkill,
> > not with the usual kill(SIGKILL).
> > 
> > waitpid gives some details about how the waitpid is done:
> >   'MY' indicates my_waitpid is called (to protect against EINTR),
> >   otherwise, it is a direct call to waitpid.
> >   Note that the return status of waitpid is not systematically checked,
> >   or is not checked the same way or errors are not necessarily reported.
> >   Flags are 0, or the specifically used flags. 2 set of flags separated
> >   by a , means there are 2 calls to waitpid.
> >   
> > 
> > where                        loop&cond        SIGKILL PTRACE_KILL waitpid
> > --------------------         -------------    ------- ----------- -------
> > linux-ptrace.c
> >  linux_ptrace_test_ret_to_nx -                YES     YES         0
> >  linux_check_ptrace_features WIFSTOPPED       -       YES V       MY 0
> >  linux_test_for_tracefork    -                -       YES V       MY 0
> >  
> > linux-low.c                  WIFSTOPPED       YES TK  YES         MY 0,__WCLONE
> > 
> > linux-nat.c                  waitpid == pid   YES TK  YES         MY __WALL
> > 
> > linux-fork.c
> >  linux_fork_killall          WIFSTOPPED       YES     -           0
> >  delete_checkpoint_command   -                -       YES V       (infcall) 0
> > 



More information about the Gdb-patches mailing list