This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
[RFC] pthread support for FUTEX_WAIT_MULTIPLE
- From: "Pierre-Loup A. Griffais" <pgriffais at valvesoftware dot com>
- To: <libc-alpha at sourceware dot org>
- Date: Tue, 30 Jul 2019 17:07:25 -0700
- Subject: [RFC] pthread support for FUTEX_WAIT_MULTIPLE
Hello,
I started putting together a patch to expose the new Linux futex
functionality that recently got proposed for upstream inclusion. [1]
I pushed the patch to a repository [2] and am also including it in this
email for early comments and feedback.
Since I'm so unfamiliar with the generally accepted practice, I've held
on finishing it until I could get some feedback on a variety of points
I'm unsure about.
The gist of it is that it adds a new function,
pthread_mutex_timedlock_any(), that can take a list of mutexes. It
returns when one of the mutex has been locked (and tells you which one),
or if the timeout is met. We would use this to reduce unnecessary
wakeups and spinning in our thread pool synchronization code, compared
to the current eventfd hack we rely on.
All mutexes passed to the list need to have matching types/kinds, as the
kernel interface can only accept one PRIVATE flag for the whole operation.
Here are a grab bag of my questions, please bear with me:
- I assume whichever name it might end up as should end with _np? Is
there any specific process to follow for this sort of non-standard
inclusion, other than just writing the code and documentation?
- What is the standard way for an application to discover whether it
can use an entrypoint dependent on a certain Linux kernel version? With
our proposed use, we'd be fine running the function once at startup to
pick which path to chose, eg. pthread_mutex_lock_any( NULL, 0, NULL,
NULL ). If it returns 0, we'd enable the new path, otherwise we'd fall
back to eventfd(). I have a TODO in the code where we could do that, but
it's probably not the right way to do things.
- I assume the way I'm exposing it as a 2.2.5-versioned symbol for
local testing is wrong; what is the right way to do this?
- I was planning on implementing the various different mutex types
with branches, and possibly try to have a couple of callsites to a
version that takes flags and rely on compiler inlining possibly doing
the right thing. I see other implementations of similar functions opt
for much more explicit inlining based on the mutex type. Would a version
with branches be acceptable initially? We could further work on
inlining/tuning if we found hotspots in the future.
- In my tree I have a placeholder test application that should be
replaced by a new mutex test. However, it would also be a good idea to
leverage other mutex tests to test this new thing, since it's a superset
of many other mutex calls. Could we make the normal mutex test suite run
a second time, with a macro wrapping the normal pthread_lock with this
implementation instead?
Thanks,
- Pierre-Loup
[1] https://lkml.org/lkml/2019/7/30/1399
[2]
https://github.com/Plagman/glibc/commit/3b01145fa25987f2f93e7eda7f3e7d0f2f77b290
From 3b01145fa25987f2f93e7eda7f3e7d0f2f77b290 Mon Sep 17 00:00:00 2001
From: "Pierre-Loup A. Griffais" <pgriffais@valvesoftware.com>
Date: Wed, 10 Jul 2019 19:47:52 -0700
Subject: [PATCH 1/2] Implement pthread_mutex_lock_any() and
pthread_mutex_timedlock_any().
On newer Linux kernels, futex supports a WAIT_MULTIPLE operation that can be
used to implement new locking functionality that allows a given application
thread to sleep while waiting for one of multiple given locks to become
available.
---
nptl/Makefile | 1 +
nptl/Versions | 2 +
nptl/pthreadP.h | 5 +
nptl/pthread_mutex_lock_any.c | 37 ++++
nptl/pthread_mutex_timedlock_any.c | 193 ++++++++++++++++++
sysdeps/nptl/pthread.h | 10 +
sysdeps/unix/sysv/linux/lowlevellock-futex.h | 15 ++
.../sysv/linux/x86_64/64/libpthread.abilist | 1 +
8 files changed, 264 insertions(+)
create mode 100644 nptl/pthread_mutex_lock_any.c
create mode 100644 nptl/pthread_mutex_timedlock_any.c
diff --git a/nptl/Makefile b/nptl/Makefile
index 0567e77a790..990315c292f 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -68,6 +68,7 @@ libpthread-routines = nptl-init nptlfreeres vars
events version pt-interp \
pthread_getattr_np \
pthread_mutex_init pthread_mutex_destroy \
pthread_mutex_lock pthread_mutex_trylock \
+ pthread_mutex_lock_any pthread_mutex_timedlock_any \
pthread_mutex_timedlock pthread_mutex_unlock \
pthread_mutex_cond_lock \
pthread_mutexattr_init pthread_mutexattr_destroy \
diff --git a/nptl/Versions b/nptl/Versions
index f2ea2b32a1b..1ec7ffffeb6 100644
--- a/nptl/Versions
+++ b/nptl/Versions
@@ -274,6 +274,8 @@ libpthread {
mtx_init; mtx_lock; mtx_timedlock; mtx_trylock; mtx_unlock;
mtx_destroy;
call_once; cnd_broadcast; cnd_destroy; cnd_init; cnd_signal;
cnd_timedwait; cnd_wait; tss_create; tss_delete; tss_get; tss_set;
+
+ pthread_mutex_lock_any; pthread_mutex_timedlock_any;
}
GLIBC_2.30 {
diff --git a/nptl/pthreadP.h b/nptl/pthreadP.h
index d80662ab07b..d6e89c0a53d 100644
--- a/nptl/pthreadP.h
+++ b/nptl/pthreadP.h
@@ -390,6 +390,11 @@ extern int __pthread_mutex_trylock (pthread_mutex_t
*_mutex);
extern int __pthread_mutex_lock (pthread_mutex_t *__mutex);
extern int __pthread_mutex_timedlock (pthread_mutex_t *__mutex,
const struct timespec *__abstime);
+extern int __pthread_mutex_lock_any (pthread_mutex_t *__mutex, int
mutexcount,
+ int *outlocked);
+extern int __pthread_mutex_timedlock_any (pthread_mutex_t *__mutex, int
count,
+ const struct timespec *__abstime,
+ int *outlocked);
extern int __pthread_mutex_cond_lock (pthread_mutex_t *__mutex)
attribute_hidden;
extern void __pthread_mutex_cond_lock_adjust (pthread_mutex_t *__mutex)
diff --git a/nptl/pthread_mutex_lock_any.c b/nptl/pthread_mutex_lock_any.c
new file mode 100644
index 00000000000..485c213a172
--- /dev/null
+++ b/nptl/pthread_mutex_lock_any.c
@@ -0,0 +1,37 @@
+/* Copyright (C) 2002-2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/param.h>
+#include <not-cancel.h>
+#include "pthreadP.h"
+#include <atomic.h>
+#include <lowlevellock.h>
+#include <stap-probe.h>
+
+int
+__pthread_mutex_lock_any (pthread_mutex_t *mutexlist, int mutexcount,
+ int *outlocked)
+{
+ return __pthread_mutex_timedlock_any(mutexlist, mutexcount, NULL,
outlocked);
+}
+
+weak_alias (__pthread_mutex_lock_any, pthread_mutex_lock_any)
diff --git a/nptl/pthread_mutex_timedlock_any.c
b/nptl/pthread_mutex_timedlock_any.c
new file mode 100644
index 00000000000..a95687ce8e1
--- /dev/null
+++ b/nptl/pthread_mutex_timedlock_any.c
@@ -0,0 +1,193 @@
+/* Copyright (C) 2002-2019 Free Software Foundation, Inc.
+ This file is part of the GNU C Library.
+ Contributed by Ulrich Drepper <drepper@redhat.com>, 2002.
+
+ The GNU C Library is free software; you can redistribute it and/or
+ modify it under the terms of the GNU Lesser General Public
+ License as published by the Free Software Foundation; either
+ version 2.1 of the License, or (at your option) any later version.
+
+ The GNU C Library is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ Lesser General Public License for more details.
+
+ You should have received a copy of the GNU Lesser General Public
+ License along with the GNU C Library; if not, see
+ <http://www.gnu.org/licenses/>. */
+
+#include <assert.h>
+#include <errno.h>
+#include <stdlib.h>
+#include <unistd.h>
+#include <sys/param.h>
+#include <not-cancel.h>
+#include "pthreadP.h"
+#include <atomic.h>
+#include <lowlevellock.h>
+#include <stap-probe.h>
+
+/* TODO: this probably comes from a kernel header when upstream? */
+struct futex_wait_block {
+ int *uaddr;
+ int val;
+ int bitset;
+} __attribute__((packed));
+
+int
+__pthread_mutex_timedlock_any (pthread_mutex_t *mutexlist, int mutexcount,
+ const struct timespec *abstime, int *outlocked)
+{
+ /* This requires futex support */
+#ifndef __NR_futex
+ return ENOTSUP;
+#endif
+
+ if (mutexlist == NULL)
+ {
+ /* User is asking us if kernel supports the feature. */
+
+ /* TODO: how does one check if supported?
+ * I was thinking of trying the ioctl once and then returning the
static
+ * cached value, is that OK?
+ */
+ return 0;
+ }
+
+ if (mutexlist != NULL && mutexcount <= 0)
+ return EINVAL;
+
+ if (outlocked == NULL)
+ return EINVAL;
+
+ int type = PTHREAD_MUTEX_TYPE (mutexlist);
+
+ for (int i = 1; i < mutexcount; i++)
+ {
+ /* Types have to match, since the PRIVATE flag is OP-global. */
+ if (PTHREAD_MUTEX_TYPE (&mutexlist[i]) != type)
+ return EINVAL;
+ }
+
+ int kind = type & PTHREAD_MUTEX_KIND_MASK_NP;
+
+ /* TODO: implement recursive, errorcheck and adaptive. */
+ if (kind != PTHREAD_MUTEX_NORMAL)
+ return EINVAL;
+
+ /* TODO: implement robust. */
+ if (type & PTHREAD_MUTEX_ROBUST_NORMAL_NP)
+ return EINVAL;
+
+ /* TODO: implement PI. */
+ if (type & PTHREAD_MUTEX_PRIO_INHERIT_NP)
+ return EINVAL;
+
+ /* TODO: implement PP. */
+ if (type & PTHREAD_MUTEX_PRIO_PROTECT_NP)
+ return EINVAL;
+
+ pid_t id = THREAD_GETMEM (THREAD_SELF, tid);
+ int result;
+
+ result = -1;
+
+ for (int i = 0; i < mutexcount; i++)
+ {
+ if (__lll_trylock (&mutexlist[i].__data.__lock) == 0)
+ {
+ result = i;
+ break;
+ }
+ }
+
+ while (result == -1)
+ {
+ for (int i = 0; i < mutexcount; i++)
+ {
+ int oldval = atomic_exchange_acq (&mutexlist[i].__data.__lock, 2);
+
+ if (oldval == 0)
+ {
+ result = i;
+ break;
+ }
+ }
+
+ if (result == -1)
+ {
+ /* Couldn't get one of the locks immediately, we have to sleep
now. */
+ struct timespec *timeout = NULL;
+ struct timespec rt;
+
+ if (abstime != NULL)
+ {
+ /* Reject invalid timeouts. */
+ if (abstime->tv_nsec < 0 || abstime->tv_nsec >= 1000000000)
+ return EINVAL;
+
+ struct timeval tv;
+
+ /* Get the current time. */
+ (void) __gettimeofday (&tv, NULL);
+
+ /* Compute relative timeout. */
+ rt.tv_sec = abstime->tv_sec - tv.tv_sec;
+ rt.tv_nsec = abstime->tv_nsec - tv.tv_usec * 1000;
+ if (rt.tv_nsec < 0)
+ {
+ rt.tv_nsec += 1000000000;
+ --rt.tv_sec;
+ }
+
+ if (rt.tv_sec < 0)
+ return ETIMEDOUT;
+
+ timeout = &rt;
+ }
+
+ struct futex_wait_block waitblock[mutexcount];
+
+ for (int i = 0; i < mutexcount; i++)
+ {
+ waitblock[i].uaddr = &mutexlist[i].__data.__lock;
+ waitblock[i].val = 2;
+ waitblock[i].bitset = ~0;
+ }
+
+ long int __ret;
+
+ /* Safe to use the flag for the first one, since all their types
match. */
+ int private_flag = PTHREAD_MUTEX_PSHARED (&mutexlist[0]);
+
+ __ret = lll_futex_timed_wait_multiple (waitblock, mutexcount,
timeout,
+ private_flag);
+
+ if (__ret < 0)
+ return -__ret; /* TODO is this correct? */
+
+ /* Have slept, try grabbing the one that woke us up? */
+ if (atomic_exchange_acq (&mutexlist[__ret].__data.__lock, 2) == 0)
+ {
+ /* We got it, done, loop will end below. */
+ result = __ret;
+ }
+ }
+ }
+
+ if (result != -1)
+ {
+ /* Record the ownership. */
+ mutexlist[result].__data.__owner = id;
+ ++mutexlist[result].__data.__nusers;
+
+ /* Let the user know which mutex is now locked. */
+ *outlocked = result;
+
+ result = 0;
+ }
+
+ return result;
+}
+
+weak_alias (__pthread_mutex_timedlock_any, pthread_mutex_timedlock_any)
diff --git a/sysdeps/nptl/pthread.h b/sysdeps/nptl/pthread.h
index a767d6f1808..d9e500f9877 100644
--- a/sysdeps/nptl/pthread.h
+++ b/sysdeps/nptl/pthread.h
@@ -763,7 +763,17 @@ extern int pthread_mutex_trylock (pthread_mutex_t
*__mutex)
extern int pthread_mutex_lock (pthread_mutex_t *__mutex)
__THROWNL __nonnull ((1));
+/* Lock any one of several mutexes. */
+extern int pthread_mutex_lock_any (pthread_mutex_t *__mutexlist,
+ int mutexcount, int *outlocked);
+
#ifdef __USE_XOPEN2K
+/* Lock any one of several mutexes, with timeout. */
+extern int pthread_mutex_timedlock_any (pthread_mutex_t *__mutexlist,
+ int mutexcount,
+ const struct timespec *__restrict
+ __abstime, int *outlocked);
+
/* Wait until lock becomes available, or specified time passes. */
extern int pthread_mutex_timedlock (pthread_mutex_t *__restrict __mutex,
const struct timespec *__restrict
diff --git a/sysdeps/unix/sysv/linux/lowlevellock-futex.h
b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
index cfa796be2bd..28140721648 100644
--- a/sysdeps/unix/sysv/linux/lowlevellock-futex.h
+++ b/sysdeps/unix/sysv/linux/lowlevellock-futex.h
@@ -38,6 +38,7 @@
#define FUTEX_WAKE_BITSET 10
#define FUTEX_WAIT_REQUEUE_PI 11
#define FUTEX_CMP_REQUEUE_PI 12
+#define FUTEX_WAIT_MULTIPLE 13
#define FUTEX_PRIVATE_FLAG 128
#define FUTEX_CLOCK_REALTIME 256
@@ -74,6 +75,15 @@
? -INTERNAL_SYSCALL_ERRNO (__ret, __err) : 0); \
})
+#define lll_futex_syscall_ret(nargs, futexp, op, ...) \
+ ({ \
+ INTERNAL_SYSCALL_DECL (__err); \
+ long int __ret = INTERNAL_SYSCALL (futex, __err, nargs, futexp, op, \
+ __VA_ARGS__); \
+ (__glibc_unlikely (INTERNAL_SYSCALL_ERROR_P (__ret, __err)) \
+ ? -INTERNAL_SYSCALL_ERRNO (__ret, __err) : __ret); \
+ })
+
#define lll_futex_wait(futexp, val, private) \
lll_futex_timed_wait (futexp, val, NULL, private)
@@ -82,6 +92,11 @@
__lll_private_flag (FUTEX_WAIT, private), \
val, timeout)
+#define lll_futex_timed_wait_multiple(futexp, val, timeout, private) \
+ lll_futex_syscall_ret (4, futexp, \
+ __lll_private_flag (FUTEX_WAIT_MULTIPLE, private), \
+ val, timeout)
+
/* Verify whether the supplied clockid is supported by
lll_futex_clock_wait_bitset. */
#define lll_futex_supported_clockid(clockid) \
diff --git a/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
b/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
index 297fec9686d..f9bf1a40509 100644
--- a/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
+++ b/sysdeps/unix/sysv/linux/x86_64/64/libpthread.abilist
@@ -126,6 +126,7 @@ GLIBC_2.2.5 pthread_kill_other_threads_np F
GLIBC_2.2.5 pthread_mutex_destroy F
GLIBC_2.2.5 pthread_mutex_init F
GLIBC_2.2.5 pthread_mutex_lock F
+GLIBC_2.2.5 pthread_mutex_lock_any F
GLIBC_2.2.5 pthread_mutex_timedlock F
GLIBC_2.2.5 pthread_mutex_trylock F
GLIBC_2.2.5 pthread_mutex_unlock F
--
2.17.1