This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
Rich Felker <dalias@libc.org> writes:
> On Sun, Jul 21, 2019 at 12:03:10AM -0400, Rich Felker wrote:
>> On Tue, Jul 16, 2019 at 05:08:48PM -0700, Alistair Francis wrote:
>> > If the waitid syscall is avaliable let's use that as waitpid
>> > and wait4 aren't always avaliable (they aren't avaliable on RV32).
>> >
>> > Unfortunately waitid is substantially differnt to waitpid and wait4, so
>> > the conversion ends up being complex.
>> >
>> > Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
>> > ---
>> > ChangeLog | 3 ++
>> > sysdeps/unix/sysv/linux/wait.c | 39 ++++++++++++++++--
>> > sysdeps/unix/sysv/linux/waitpid.c | 46 ++++++++++++++++++++++
>> > sysdeps/unix/sysv/linux/waitpid_nocancel.c | 45 +++++++++++++++++++++
>> > 4 files changed, 130 insertions(+), 3 deletions(-)
>> > [...]
>> >
>> > weak_alias (__libc_wait, __wait)
>> > diff --git a/sysdeps/unix/sysv/linux/waitpid.c b/sysdeps/unix/sysv/linux/waitpid.c
>> > index f0897574c0..7d4e0bb77d 100644
>> > --- a/sysdeps/unix/sysv/linux/waitpid.c
>> > +++ b/sysdeps/unix/sysv/linux/waitpid.c
>> > @@ -20,12 +20,58 @@
>> > #include <sysdep-cancel.h>
>> > #include <stdlib.h>
>> > #include <sys/wait.h>
>> > +#include <unistd.h>
>> >
>> > __pid_t
>> > __waitpid (__pid_t pid, int *stat_loc, int options)
>> > {
>> > #ifdef __NR_waitpid
>> > return SYSCALL_CANCEL (waitpid, pid, stat_loc, options);
>> > +#elif defined(__NR_waitid)
>> > + __pid_t ret;
>> > + idtype_t idtype = P_PID;
>> > + siginfo_t infop;
>> > +
>> > + if (pid < -1) {
>> > + idtype = P_PGID;
>> > + pid *= -1;
>> > + } else if (pid == -1) {
>> > + idtype = P_ALL;
>> > + } else if (pid == 0) {
>> > + idtype = P_PGID;
>> > + pid = getpgrp();
>> > + }
>> > +
>> > + options |= WEXITED;
>> > +
>> > + ret = SYSCALL_CANCEL (waitid, idtype, pid, &infop, options, NULL);
>>
>> This emulation has a fundamental race condition. Between getpgrp and
>> waitid, a signal handler may perform setpgrp, setsid, and/or fork in
>> ways that cause the wrong pgid to be passed to the waitid syscall.
>> There is no way around this because you cannot block signals for the
>> interval, since signals must be able to interrupt the waitid syscall.
>>
>> Unless there's some trick I'm missing here, the kernel folks' removal
>> of the wait4 syscall is just a bug in the kernel that they need to
>> fix. It also makes it impossible to implement the wait4 function,
>> since there's no way to get rusage for the exited process.
>
> Reportedly (via Stefan O'Rear just now) there was a similar kernel bug
> introduced in 161550d74c07303ffa6187ba776f62df5a906a21 that makes
> wait4 fail to honor pgrp changes that happen while already in the
> syscall (e.g. performed on the caller by another thread or even
> another process).
I could not find the report from Stefan O'Rear.
Does that result in actual problems for programs or is this a
theoretical race noticed upon code review?
> But the race condition here in userspace is even
> more egregious I think, since it violates the contract in a case where
> there is a clear observable order between the pgrp change and the
> blocking wait -- for instance, the signal handler could change pgrp
> of itself and a child process, and then whether or not the signal
> handler had executed before the waitpid, the waitpid should catch the
> child's exit. But with the above race, it fails to.
Definitely a bigger issue. I believe posix requires waitpid is required
to be signal safe.
Eric