This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
- From: ebiederm at xmission dot com (Eric W. Biederman)
- To: Arnd Bergmann <arnd at arndb dot de>
- Cc: Rich Felker <dalias at libc dot org>, Alistair Francis <alistair dot francis at wdc dot com>, GNU C Library <libc-alpha at sourceware dot org>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Florian Weimer <fweimer at redhat dot com>, Palmer Dabbelt <palmer at sifive dot com>, macro at wdc dot com, Zong Li <zongbox at gmail dot com>, Alistair Francis <alistair23 at gmail dot com>, Andrew Morton <akpm at linux-foundation dot org>, Linus Torvalds <torvalds at linux-foundation dot org>, Al Viro <viro at zeniv dot linux dot org dot uk>, Christian Brauner <christian at brauner dot io>, "H. Peter Anvin" <hpa at zytor dot com>
- Date: Sun, 21 Jul 2019 10:45:13 -0500
- Subject: Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
- References: <cover.1563321715.git.alistair.francis@wdc.com> <2d3359a195633b85e01f83bf536330d72f7bc8aa.1563321715.git.alistair.francis@wdc.com> <20190721040310.GA3283@brightrain.aerifal.cx> <CAK8P3a0bhS_d1SMyLQO_LDNvdgR+MKXmoESxWkrcz22mvjKcuw@mail.gmail.com> <87o91nbn37.fsf@xmission.com> <CAK8P3a3wgavtarKxSYJGL0ME9KRZ8UsUAZw+Y5J8WpG1GQ+=mw@mail.gmail.com>
Arnd Bergmann <arnd@arndb.de> writes:
> On Sun, Jul 21, 2019 at 2:15 PM Eric W. Biederman <ebiederm@xmission.com> wrote:
>> Arnd Bergmann <arnd@arndb.de> writes:
>> > On Sun, Jul 21, 2019 at 6:03 AM Rich Felker <dalias@libc.org> wrote:
>> >> On Tue, Jul 16, 2019 at 05:08:48PM -0700, Alistair Francis wrote:
>> >> > #ifdef __NR_waitpid
>> >> > return SYSCALL_CANCEL (waitpid, pid, stat_loc, options);
>> >> > +#elif defined(__NR_waitid)
>> >> > + __pid_t ret;
>> >> > + idtype_t idtype = P_PID;
>> >> > + siginfo_t infop;
>> >> > +
>> >> > + if (pid < -1) {
>> >> > + idtype = P_PGID;
>> >> > + pid *= -1;
>> >> > + } else if (pid == -1) {
>> >> > + idtype = P_ALL;
>> >> > + } else if (pid == 0) {
>> >> > + idtype = P_PGID;
>> >> > + pid = getpgrp();
>> >> > + }
>> >> > +
>> >> > + options |= WEXITED;
>> >> > +
>> >> > + ret = SYSCALL_CANCEL (waitid, idtype, pid, &infop, options, NULL);
>> >>
>> >> This emulation has a fundamental race condition. Between getpgrp and
>> >> waitid, a signal handler may perform setpgrp, setsid, and/or fork in
>> >> ways that cause the wrong pgid to be passed to the waitid syscall.
>> >> There is no way around this because you cannot block signals for the
>> >> interval, since signals must be able to interrupt the waitid syscall.
>> >
>> > Interesting, I don't think anyone ever considered this race, as waitid()
>> > was clearly always /intended/ as a superset of wait(), waitpid(), wait3()
>> > and wait4(), as Roland McGrath described in the initial creation
>> > of the waitid() syscall, see
>> > https://lore.kernel.org/lkml/200408152303.i7FN3Vc3021030@magilla.sf.frob.com/
>> >
>>
>> It is definitley a problem as setpid and setpgrp and setsid are all
>> required to be signal safe, and this waitpid emulation is clearly not.
>
> Ok.
>
>> Would adding wait4 to riscv allow us to get rid of the rusage parameter
>> of the kernel's waitid?
>
> No, the kernel waitid() behavior is widely documented, and there are
> likely to be applications that rely on it across architectures.
>
> The only downside I can think of for adding wait4() is that it also
> uses 32-bit time_t in its rusage, so if we create a time64 version
> of rusage, we now have to add three more system calls (waitid,
> getrusage, and wait4) instead of just two.
>
>> I don't oppose a new syscall to take advantage of pidfd but I don't
>> think there is any need to tie the two fixes together so I would rather
>> keep them separate. Just so we don't rush through fixing one to deal
>> with the other.
>
> I see at multiple independent issues with the current waitid():
>
> - The race that Rich pointed out
> - The lack of pidfd support
> - The usage of a 32-bit timeval structure that is incompatible with
> the libc definition on rv32 and other 32-bit architectures with
> a y2038-safe libc (there is no y2038 overflow in rusage itself, but
> it requires an ugly wrapper to convert the structure)
> - The lack of nanosecond resolution in rusage that someone asked for
> - When we last talked about it, there was some debate over replacing
> siginfo_t with a different structure.
>
> We don't have to address each one of those in a single new syscall
> (or even at all), but addressing them one at a time would risk adding
> even more system calls to do essentially just one thing that already
> has at least five versions at the libc level (wait, waitpid, wait3, wait4
> and waitid).
For the particular issue of 32bit riscv needing a working wait4 I see
one of two possibilities.
We either add P_PROCESS_PGID to the kernel's waitid or we add wait4.
I am leaning towards P_PROCESS_PGID as this is the tiny little bit
needed to make the kernel's waitid the one call that combines them all,
that it already tries to be. Plus P_PROCESS_PGID or the equivalent in
the kernel internal version is needed if we choose to have wait4 handle
the process group id changing while wait4 is running.
Eric