[PATCH v8 0/7] Add pidfd and cgroupv2 support for process creation
Adhemerval Zanella
adhemerval.zanella@linaro.org
Fri Aug 18 14:06:35 GMT 2023
The glibc 2.36 added wrappers for Linux syscall pidfd_open, pidfd_getfd,
and pidfd_send_signal, and exported the P_PIDFD to use along with
waitid. The pidfd is a race-free interface, however, the pidfd_open is
subject to TOCTOU if the file descriptor is not obtained directly from
the clone or clone3 syscall (there is still a small window between the
clone return and the pidfd_getfd where the process can be reaped and the
process ID reused).
A fully race-free interface with posix_spawn interface is being
discussed by GNOME [1] [2], and Qt already uses it in its QtProcess
implementation [3]. The Qt implementation has some pitfalls:
- It calls clone through the syscall symbol, which does not run the
pthread_atfork handlers even though it intends to use the clone
semantic for fork (by only using CLONE_PIDFD | SIGCHLD).
- It also does not reset any internal state, such as internal IO,
malloc, loader, etc. locks.
- It does not set the TCB tid field nor the robust list, used by the
pthread code.
- It does not optimize process creation by using CLONE_VM and
CLONE_VFORK.
Also, the recent Linux kernel (starting with 5.7) provides a way to
create a new process in a different cgroups version 2 than the default
one (through clone3 CLONE_INTO_CGROUP flag). Providing it through glibc
interfaces makes it usable without the risk of potential breakage by
issuing clone3 syscall directly (check BZ#26371 discussion).
This patch set adds new interfaces that take care of these potential
issues. The new posix_spawn / posix_spawnp extensions:
#define POSIX_SPAWN_SETCGROUP 0x100
int posix_spawnattr_getcgroup_np (const posix_spawnattr_t
restrict *attr, int *cgroup);
int posix_spawnattr_setcgroup_np (posix_spawnattr_t *restrict attr,
int cgroup);
Allow spawning a new process on a different cgroupv2.
The pidfd_spawn and pidfd_spawnp is similar to posix_spawn and
posix_spawnp, but return a process file descriptor instead of a PID.
int pidfd_spawn (int *restrict pidfd,
const char *restrict file,
const posix_spawn_file_actions_t *restrict facts,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict],
char *const envp[restrict]);
int pidfd_spawnp (int *restrict pidfd,
const char *restrict path,
const posix_spawn_file_actions_t *restrict facts,
const posix_spawnattr_t *restrict attrp,
char *const argv[restrict_arr],
char *const envp[restrict_arr]);
The implementation makes sure that kernel must support the complete
pidfd interface, meaning that waitid (P_PIDFD) should be supported. It
ensures that a non-racy workaround is required (such as reading procfs
fdinfo pid to use along with old wait interfaces). If the kernel does
not have the required support the interface returns ENOSYS.
A new symbol is used instead of a posix_spawn extension to avoid
possible issues with language bindings that might track the argument
lifetime.
Both symbols reuse the posix_spawn posix_spawn_file_actions_t and
posix_spawnattr_t, to either avoid rehashing the posix_spawn API or add
a new one. It also means that both interfaces support the same
attribute and file actions, and a new flag or file action on posix_spawn
is also added automatically for pidfd_spawn. It includes
POSIX_SPAWN_SETCGROUP.
Along with the spawn interface, a fork-like one is also provided:
typedef union
{
struct
{
__uint64_t fork_np_flags;
int fork_np_pidfd;
int fork_np_cgroup;
int fork_np_exit_signal;
#define fork_np_flags __data.fork_np_flags
#define fork_np_pidfd __data.fork_np_pidfd
#define fork_np_cgroup __data.fork_np_cgroup
#define fork_np_exit_signal __data.fork_np_exit_signal
} __data;
char __size [FORK_NP_ARGS_SIZE_VER0];
} fork_np_args_t;
#define FORK_NP_PIDFD (1ULL << 1)
#define FORK_NP_CGROUP (1ULL << 2)
#define FORK_NP_ASYNCSAFE (1ULL << 3)
#define FORK_NP_EXIT_SIGNAL (1ULL << 4)
pid_t fork_np (fork_np_args_t *args, size_t size)
The SIZE must represent a supported pidfd_fork_args_t type, otherwise,
the function returns EINVAL.
If ARGS has all members set to 0, no file descriptor is returned and
pidfd_fork acts as fork. If PIDFDFORK_PIDFD is set on the flags member,
a new file descriptor is returned on pidfd member and the kernel sets
O_CLOEXEC as default. The pidfd_fork follows the fork/_Fork convention
on returning a positive or negative value to the parent (with a negative
indicating an error) and zero to the child.
If PIDFDFORK_CGROUP is set, the value on the cgroup member is used as
the cgroupv2 to be placed in the new process (by using the
CLONE_INTO_CGROUP clone flag).
If PIDFDFORK_ASYNCSAFE is set, pidfd_fork acts as _Fork, thus avoiding
running pthread_atfork handlers.
If PIDFDFORK_EXIT_SIGNAL is set, the signal on exit_signal is sent as
process termination (SIGCHLD is the default). The 0 value is also valid,
meaning no signal will be sent.
The kernel already sets O_CLOEXEC as default and it follows the
fork/_Fork convention on returning a positive or negative value to the
parent (with negative indicating an error) and zero to the child.
Similar to fork, pidfd_fork also runs the pthread_atfork handlers It can
be changed by using the PIDFDFORK_ASYNCSAFE flag, which makes pidfd_fork
act a _Fork. It also sends SIGCHLD to the parent when the new process
terminates.
To have a way to interop between process IDs and process file
descriptors, the pidfd_getpid is also provided:
pid_t pidfd_getpid (int fd)
It reads the procfs fdinfo entry from the file descriptor to get the
process ID.
[1] https://gitlab.gnome.org/GNOME/glib/-/issues/1866
[2] https://sourceware.org/bugzilla/show_bug.cgi?id=30349
[3] https://codebrowser.dev/qt6/qtbase/src/3rdparty/forkfd/forkfd_linux.c.html
---
Changes from v7:
- Redefine __ASSUME_CLONE3 to 0 if the architecture does not support the
syscall.
- Fixed some failing errors to be reported by spawned processes.
- Fixed pre-commit CI for AArch64 failures.
- Rename pidfd_fork to fork_np and make the API extensible
- Document more possible pidfd_getpid errors.
Changes from v6:
- Rebased against master, adjusted symbol version and NEWS entry.
- Added arm/mips clone3 implementation.
Changes from v5:
- Added cgroupv2 support for posix_spawn, pidfd_spawn, and pidfd_fork.
Changes from v4:
- Changed pidfd_fork signature to return a pid_t instead of the PID file
descriptor.
- Changed pidfd_getpid to return EBADF for negative input, instead of
EINVAL.
- Added PIDFDFORK_NOSIGCHLD option.
- Fixed nested __BEGIN_DECLS on spawn.h
Changes from v3:
- Remove strtoul usage.
- Fixed patchwork tst-pidfd_getpid.c regression.
- Fixed manual and NEWS typos.
Changes from v2:
- Added pidfd_fork and pidfd_getpid manual entries
- Change pidfd_fork to act as fork as default, instead as _Fork.
- Changed PIDFD_FORK_RUNATFORK flag to PIDFDFORK_ASYNCSAFE.
- Added pidfd_getpid test for EREMOTE.
Changes from v1:
- Extended pidfd_getpid error codes to return EBADF if fdinfo does not
have Pid entry or if the value is invalid, EREMOTE is pid is in a
separate namespace, and ESRCH if is already terminated.
- Extended tst-pidfd_getpid.
- Rename PIDFD_FORK_RUNATFORK to PIDFDFORK_RUNATFORK to avoid clashes
with possible kernel extensions.
Adhemerval Zanella (7):
arm: Add the clone3 wrapper
mips: Add the clone3 wrapper
linux: Define __ASSUME_CLONE3 to 0 for alpha, ia64, nios2, sh, and
sparc
linux: Add posix_spawnattr_{get,set}cgroup_np (BZ 26731)
posix: Add pidfd_spawn and pidfd_spawnp (BZ 30349)
posix: Add fork_np (BZ 26371)
linux: Add pidfd_getpid
NEWS | 24 ++
bits/spawn_ext.h | 21 ++
include/clone_internal.h | 21 ++
manual/process.texi | 122 ++++++++-
posix/Makefile | 5 +-
posix/fork-internal.c | 127 ++++++++++
posix/fork-internal.h | 36 +++
posix/fork.c | 107 +-------
posix/spawn.h | 6 +-
posix/spawn_int.h | 3 +-
posix/spawnattr_setflags.c | 3 +-
posix/tst-posix_spawn-setsid.c | 169 +++++++++----
posix/tst-spawn-chdir.c | 15 +-
posix/tst-spawn.c | 24 +-
posix/tst-spawn.h | 36 +++
posix/tst-spawn2.c | 17 +-
posix/tst-spawn3.c | 100 ++++----
posix/tst-spawn4.c | 7 +-
posix/tst-spawn5.c | 14 +-
posix/tst-spawn6.c | 13 +-
posix/tst-spawn7.c | 13 +-
sysdeps/nptl/_Fork.c | 2 +-
sysdeps/unix/sysv/linux/Makefile | 29 +++
sysdeps/unix/sysv/linux/Versions | 8 +
sysdeps/unix/sysv/linux/aarch64/libc.abilist | 6 +
.../unix/sysv/linux/alpha/kernel-features.h | 4 +
sysdeps/unix/sysv/linux/alpha/libc.abilist | 6 +
sysdeps/unix/sysv/linux/arc/libc.abilist | 6 +
sysdeps/unix/sysv/linux/arch-fork.h | 16 +-
sysdeps/unix/sysv/linux/arm/be/libc.abilist | 6 +
sysdeps/unix/sysv/linux/arm/clone3.S | 80 ++++++
sysdeps/unix/sysv/linux/arm/le/libc.abilist | 6 +
sysdeps/unix/sysv/linux/arm/sysdep.h | 1 +
sysdeps/unix/sysv/linux/bits/spawn_ext.h | 71 ++++++
sysdeps/unix/sysv/linux/bits/unistd_ext.h | 51 ++++
sysdeps/unix/sysv/linux/clone-internal.c | 58 ++++-
sysdeps/unix/sysv/linux/clone-pidfd-support.c | 60 +++++
sysdeps/unix/sysv/linux/csky/libc.abilist | 6 +
sysdeps/unix/sysv/linux/fork_np.c | 97 +++++++
sysdeps/unix/sysv/linux/hppa/libc.abilist | 6 +
sysdeps/unix/sysv/linux/i386/libc.abilist | 6 +
.../unix/sysv/linux/ia64/kernel-features.h | 4 +
sysdeps/unix/sysv/linux/ia64/libc.abilist | 6 +
.../sysv/linux/loongarch/lp64/libc.abilist | 6 +
.../sysv/linux/m68k/coldfire/libc.abilist | 6 +
.../unix/sysv/linux/m68k/m680x0/libc.abilist | 6 +
.../sysv/linux/microblaze/be/libc.abilist | 6 +
.../sysv/linux/microblaze/le/libc.abilist | 6 +
sysdeps/unix/sysv/linux/mips/clone3.S | 139 +++++++++++
.../sysv/linux/mips/mips32/fpu/libc.abilist | 6 +
.../sysv/linux/mips/mips32/nofpu/libc.abilist | 6 +
.../sysv/linux/mips/mips64/n32/libc.abilist | 6 +
.../sysv/linux/mips/mips64/n64/libc.abilist | 6 +
sysdeps/unix/sysv/linux/mips/sysdep.h | 2 +
.../unix/sysv/linux/nios2/kernel-features.h | 24 ++
sysdeps/unix/sysv/linux/nios2/libc.abilist | 6 +
sysdeps/unix/sysv/linux/or1k/libc.abilist | 6 +
sysdeps/unix/sysv/linux/pidfd_getpid.c | 126 ++++++++++
sysdeps/unix/sysv/linux/pidfd_spawn.c | 30 +++
sysdeps/unix/sysv/linux/pidfd_spawnp.c | 30 +++
.../linux/powerpc/powerpc32/fpu/libc.abilist | 6 +
.../powerpc/powerpc32/nofpu/libc.abilist | 6 +
.../linux/powerpc/powerpc64/be/libc.abilist | 6 +
.../linux/powerpc/powerpc64/le/libc.abilist | 6 +
sysdeps/unix/sysv/linux/procutils.c | 97 +++++++
sysdeps/unix/sysv/linux/procutils.h | 43 ++++
.../unix/sysv/linux/riscv/rv32/libc.abilist | 6 +
.../unix/sysv/linux/riscv/rv64/libc.abilist | 6 +
.../unix/sysv/linux/s390/s390-32/libc.abilist | 6 +
.../unix/sysv/linux/s390/s390-64/libc.abilist | 6 +
sysdeps/unix/sysv/linux/sh/be/libc.abilist | 6 +
sysdeps/unix/sysv/linux/sh/kernel-features.h | 4 +
sysdeps/unix/sysv/linux/sh/le/libc.abilist | 6 +
.../unix/sysv/linux/sparc/kernel-features.h | 4 +
.../sysv/linux/sparc/sparc32/libc.abilist | 6 +
.../sysv/linux/sparc/sparc64/libc.abilist | 6 +
.../unix/sysv/linux/spawnattr_getcgroup_np.c | 28 +++
.../unix/sysv/linux/spawnattr_setcgroup_np.c | 27 ++
sysdeps/unix/sysv/linux/spawni.c | 42 +++-
sysdeps/unix/sysv/linux/sys/pidfd.h | 4 +
sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c | 170 +++++++++++++
sysdeps/unix/sysv/linux/tst-fork_np.c | 236 ++++++++++++++++++
sysdeps/unix/sysv/linux/tst-pidfd.c | 48 ++++
sysdeps/unix/sysv/linux/tst-pidfd_getpid.c | 126 ++++++++++
.../sysv/linux/tst-posix_spawn-setsid-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn-cgroup.c | 223 +++++++++++++++++
.../unix/sysv/linux/tst-spawn-chdir-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn-pidfd.h | 63 +++++
sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c | 20 ++
sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c | 20 ++
.../unix/sysv/linux/x86_64/64/libc.abilist | 6 +
.../unix/sysv/linux/x86_64/x32/libc.abilist | 6 +
97 files changed, 2947 insertions(+), 267 deletions(-)
create mode 100644 bits/spawn_ext.h
create mode 100644 posix/fork-internal.c
create mode 100644 posix/fork-internal.h
create mode 100644 posix/tst-spawn.h
create mode 100644 sysdeps/unix/sysv/linux/arm/clone3.S
create mode 100644 sysdeps/unix/sysv/linux/bits/spawn_ext.h
create mode 100644 sysdeps/unix/sysv/linux/clone-pidfd-support.c
create mode 100644 sysdeps/unix/sysv/linux/fork_np.c
create mode 100644 sysdeps/unix/sysv/linux/mips/clone3.S
create mode 100644 sysdeps/unix/sysv/linux/nios2/kernel-features.h
create mode 100644 sysdeps/unix/sysv/linux/pidfd_getpid.c
create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawn.c
create mode 100644 sysdeps/unix/sysv/linux/pidfd_spawnp.c
create mode 100644 sysdeps/unix/sysv/linux/procutils.c
create mode 100644 sysdeps/unix/sysv/linux/procutils.h
create mode 100644 sysdeps/unix/sysv/linux/spawnattr_getcgroup_np.c
create mode 100644 sysdeps/unix/sysv/linux/spawnattr_setcgroup_np.c
create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np-cgroup.c
create mode 100644 sysdeps/unix/sysv/linux/tst-fork_np.c
create mode 100644 sysdeps/unix/sysv/linux/tst-pidfd_getpid.c
create mode 100644 sysdeps/unix/sysv/linux/tst-posix_spawn-setsid-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-cgroup.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-chdir-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn-pidfd.h
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn2-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn3-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn4-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn5-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn6-pidfd.c
create mode 100644 sysdeps/unix/sysv/linux/tst-spawn7-pidfd.c
--
2.34.1
More information about the Libc-alpha
mailing list