This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
- From: Arnd Bergmann <arnd at arndb dot de>
- To: Rich Felker <dalias at libc dot org>
- Cc: Alistair Francis <alistair dot francis at wdc dot com>, GNU C Library <libc-alpha at sourceware dot org>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Florian Weimer <fweimer at redhat dot com>, Palmer Dabbelt <palmer at sifive dot com>, macro at wdc dot com, Zong Li <zongbox at gmail dot com>, Alistair Francis <alistair23 at gmail dot com>, "Eric W. Biederman" <ebiederm at xmission dot com>, Andrew Morton <akpm at linux-foundation dot org>, Linus Torvalds <torvalds at linux-foundation dot org>, Al Viro <viro at zeniv dot linux dot org dot uk>, Christian Brauner <christian at brauner dot io>, "H. Peter Anvin" <hpa at zytor dot com>
- Date: Sun, 21 Jul 2019 09:57:03 +0200
- Subject: Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
- References: <cover.1563321715.git.alistair.francis@wdc.com> <2d3359a195633b85e01f83bf536330d72f7bc8aa.1563321715.git.alistair.francis@wdc.com> <20190721040310.GA3283@brightrain.aerifal.cx>
On Sun, Jul 21, 2019 at 6:03 AM Rich Felker <dalias@libc.org> wrote:
> On Tue, Jul 16, 2019 at 05:08:48PM -0700, Alistair Francis wrote:
> > #ifdef __NR_waitpid
> > return SYSCALL_CANCEL (waitpid, pid, stat_loc, options);
> > +#elif defined(__NR_waitid)
> > + __pid_t ret;
> > + idtype_t idtype = P_PID;
> > + siginfo_t infop;
> > +
> > + if (pid < -1) {
> > + idtype = P_PGID;
> > + pid *= -1;
> > + } else if (pid == -1) {
> > + idtype = P_ALL;
> > + } else if (pid == 0) {
> > + idtype = P_PGID;
> > + pid = getpgrp();
> > + }
> > +
> > + options |= WEXITED;
> > +
> > + ret = SYSCALL_CANCEL (waitid, idtype, pid, &infop, options, NULL);
>
> This emulation has a fundamental race condition. Between getpgrp and
> waitid, a signal handler may perform setpgrp, setsid, and/or fork in
> ways that cause the wrong pgid to be passed to the waitid syscall.
> There is no way around this because you cannot block signals for the
> interval, since signals must be able to interrupt the waitid syscall.
Interesting, I don't think anyone ever considered this race, as waitid()
was clearly always /intended/ as a superset of wait(), waitpid(), wait3()
and wait4(), as Roland McGrath described in the initial creation
of the waitid() syscall, see
https://lore.kernel.org/lkml/200408152303.i7FN3Vc3021030@magilla.sf.frob.com/
I originally planned to have a new waitid_time64() syscall to replace
them in the y2038 series, but that got deferred, so at the moment
the older 32-bit architectures don't have any changes for wait*(), and
riscv32 just removed wait4() but kept waitid().
It would be easy enough to restore wait4() for riscv32, or we could
decide to extend waitid() in a backward-compatible way to close
this race, e.g. by allowing P_PGID with pid=0 (this is currently
-EINVAL), by adding a new P_SAME_PGID macro to cover that
case, or by creating a new waitid replacement that does other things
as well (nanosecond ru_*time, pidfd, ...).
Adding a few more kernel developers that may have an opinion on this.
> Unless there's some trick I'm missing here, the kernel folks' removal
> of the wait4 syscall is just a bug in the kernel that they need to
> fix. It also makes it impossible to implement the wait4 function,
> since there's no way to get rusage for the exited process.
The rusage is passed back from the kernel waitid() as an extension
to the posix interface, in order to allow implementing wait4().
Arnd