This is the mail archive of the
mailing list for the glibc project.
Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
- From: ebiederm at xmission dot com (Eric W. Biederman)
- To: Rich Felker <dalias at libc dot org>
- Cc: Linus Torvalds <torvalds at linux-foundation dot org>, Arnd Bergmann <arnd at arndb dot de>, Alistair Francis <alistair dot francis at wdc dot com>, GNU C Library <libc-alpha at sourceware dot org>, Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, Florian Weimer <fweimer at redhat dot com>, Palmer Dabbelt <palmer at sifive dot com>, macro at wdc dot com, Zong Li <zongbox at gmail dot com>, Alistair Francis <alistair23 at gmail dot com>, Andrew Morton <akpm at linux-foundation dot org>, Al Viro <viro at zeniv dot linux dot org dot uk>, Christian Brauner <christian at brauner dot io>, "H. Peter Anvin" <hpa at zytor dot com>
- Date: Mon, 22 Jul 2019 19:00:50 -0500
- Subject: Re: [RFC v3 03/23] sysdeps/wait: Use waitid if avaliable
- References: <firstname.lastname@example.org> <email@example.com> <20190721040310.GA3283@brightrain.aerifal.cx> <CAK8P3a0bhS_d1SMyLQO_LDNvdgR+MKXmoESxWkrcz22mvjKcuw@mail.gmail.com> <firstname.lastname@example.org> <CAK8P3a3wgavtarKxSYJGL0ME9KRZ8UsUAZw+Y5J8WpG1GQemail@example.com> <firstname.lastname@example.org> <CAHk-=wjtGaJFceL+YB6=mTxQCvyNvBTavqgGTm-d5FA9xLQ0Cw@mail.gmail.com> <email@example.com> <20190721232336.GA30851@brightrain.aerifal.cx>
Rich Felker <firstname.lastname@example.org> writes:
> On Sun, Jul 21, 2019 at 04:40:27PM -0500, Eric W. Biederman wrote:
>> Linus Torvalds <email@example.com> writes:
>> > On Sun, Jul 21, 2019 at 8:45 AM Eric W. Biederman <firstname.lastname@example.org> wrote:
>> >> We either add P_PROCESS_PGID to the kernel's waitid or we add wait4.
>> > Do we actually need a new flag?
>> > Why not just allow people to pass in pid = 0 with the existing P_PGID flag?
>> > Those are the traditional and documented waitpid() semantics. Why
>> > wouldn't we use those semantics for waitid() too?
>> > And since our current (broken) waitid() returns EINVAL for that case,
>> > it's even trivial to test for in user space (and fall back on the
>> > broken racy code in user space - which you have to do anyway).
>> Our current waitid implementation when serving as waitid is not broken.
>> What is broken is the waitpid emulation on top of waitid.
>> The way waitid is defined waitid P_PGID with a pid of 0 means
>> wait for the process group id of 0. Which in theory and in limited
>> practice exists. AKA that is the pid and process group of the idle
>> Further there is the outstanding question. Should P_PROCESS_PGID refer
>> to the curent processes process group id at the time of the waitid call,
>> or should P_PROCESS_PGID refer to the process group id when a child is
>> It has been suggested elswehere in this conversation that 161550d74c07
>> ("pid: sys_wait... fixes") may have introduced a regression. If so
>> the current wait4 behavior of capturing the process group at the time
>> of call is wrong and needs to be fixed.
>> Not capuring the pid at time time of call is a very different behavior
>> of P_PROCESS_PGID vs all of the other waitid cases which do have the pid
>> fixed at the time of the call which further argues a separate flag
>> because the behavior would be distinctly different.
> I believe that is a regression, and that the "capturing it" line of
> thinking is misleading. It's thinking in terms of implementation
> rather than interface contract. The interface contract covers which
> children it can reap and when it can block. If the call remains
> blocking, there must be a possible ordering of events, with no
> observable effects contradicting that order, by which there are no
> reapable children in the process's current pgid. If the call returns
> successfully, there must be a possible ordering of events, with no
> observable effects contradicting that order, by which the caller's
> pgid and the reaped process's pgid were equal.
A regression (as we talk about it in the linux kernel) requires there to
be an actual program that has problems because of the change in behavior
of the kernel. Just it being possible for userspace to observe a
behavior change is not enough for something to be classified as a
Furthermore thinking about this and looking at it more closely this can
only be an issue in the presence of pthreads. Except in a very narrow
special case setpgid is restricted to only changing the process group of
the calling process. Which means that in unix without pthreads setpgid
can not run while a process is in wait, waitpid, or wait4.
I was really hoping we would have a pointer to a program that shows
a problem, or a clarification that someone was just reading the code.
I will wait a bit more but given that the code has worked without bug
reports for 11 years I am going to assume that there are no programs
that have regressed because of my old change.
>> > Honestly, that seems like the simplest soilution, but also like the
>> > only sane model. The fact that P_PGID with a zero pid doesn't work
>> > seems to simply be a bug right now, and keeps waitid() from being a
>> > proper superset of waitpid().
>> In the context of waitid it does not look sane. Unlike the other wait
>> functions waitid does not have any magic ids and by having an idtype
>> field makes that kind of magic unnecessary. Further it is trivial
>> to add one line to the architecture independent enumeration and have 0
>> risk of confusion or of regressing user space.
> I'm in agreement that if an extension to the waitid syscall is added,
> it should be P_PROCESS_PGID, not defining some special case for
> pid==0. It's not entirely clear but arguable that the standard
> requires EINVAL for P_PGID + pid==0, and plausible that there are
> applications that depend on this. We could emulate the EINVAL case in
> userspace, but assigning weird semantics to special cases is just a
> mess of potential future headaches when it would be trivial to do it
> right. And doing it right would also make programming userspace side
Since this is a POSIX interface and shared between many unices I took
at look at a couple to see what they have done. If possible it is
important to be able to write programs that work the same across
The BSDs implemented wait6 as their common wait that does everything
system call. It's large contribution is that wait6 will give both
the rusage of the child that exited and it's children separately.
To be the one system call that is a superset of them all wait6 implements
both P_PGID and P_PID with an id of 0 as a request to wait for any
process in the current process group.
The wait6 family also extends the idtype enumeration a bit with
things such as P_UID and P_GID that wait for the effective uid and gid
I looked at the code on FreeBSD and they replace the id of 0 with
an the result of getpgid() internal to the system call, before any
waiting happens. Resulting in the same semantics as current Linux.
I took a quick look at what Illumos does and they have narrowed their
kernel wait interface to just the posix waitid. There is no rusage
nor is their any special case for waiting for the processes in the
current process group.
I looked at XNU and it still implements both wait4 and waitid as native
interfaces. The waitid is not extended while their wait4 converts 0 to
-getpgrp() before waiting.
So my recommendation now is to avoid gratuitous incompatibilities.
1) For extending waitid. Let's use P_PGID and and id of 0 to represent
the callers process group. That is compatible with the BSDs, and
portable programs should not have any problem with it.
I want to stay away from the BSD extention of P_PID with an id of 0
meaning wait for the calling process's process group. I see where
it comes from but that is confusing.
2) It appears the popular definition of current process group is the
current process group at the time of the system call. Currently that
is Linux, FreeBSD, Illumos, and XNU. So short of a real program that
cares, let's adopt that definition in linux. Along with patches I
will see about getting the linux manpage updated to clarify that