[RFC] pthread support for FUTEX_WAIT_MULTIPLE
Pierre-Loup A. Griffais
pgriffais@valvesoftware.com
Wed Jul 31 22:00:00 GMT 2019
[replying to both inline below, let me know if just branching the thread
is generally preferable in the future]
On 7/31/19 1:14 AM, Florian Weimer wrote:
> * Pierre-Loup A. Griffais:
>
>> The gist of it is that it adds a new function,
>> pthread_mutex_timedlock_any(), that can take a list of mutexes. It
>> returns when one of the mutexes has been locked (and tells you which
>> one), or if the timeout is met. We would use this to reduce
>> unnecessary wakeups and spinning in our thread pool
>> synchronization code, compared to the current eventfd hack we rely
>> on.
>
> This explains why the mutexes have to be contiguous in memory. For
> other applications, this looks like an unnecessary restriction.
Agreed. Will switch the first argument to an array of pointers instead.
>
>> - I assume whichever name it might end up as should end with _np?
>> Is there any specific process to follow for this sort of
>> non-standard inclusion, other than just writing the code and
>> documentation?
>
> It seems unlikely that this is ever going to be standardized, so I
> think we'd need the _np suffix, yes.
Thanks for clarifying.
>
>> - What is the standard way for an application to discover whether
>> it can use an entrypoint dependent on a certain Linux kernel
>> version? With our proposed use, we'd be fine running the function
>> once at startup to pick which path to choose, e.g.
>> pthread_mutex_lock_any(NULL, 0, NULL, NULL). If it returns 0,
>> we'd enable the new path, otherwise we'd fall back to eventfd(). I
>> have a TODO in the code where we could do that, but it's probably
>> not the right way to do things.
>
> I think you would have to probe on first use inside
> pthread_mutex_lock_any, using a dummy call.
OK, that was my original plan, I can finish writing that up.
>
>> - I assume the way I'm exposing it as a 2.2.5-versioned symbol for
>> local testing is wrong; what is the right way to do this?
>
> This patch could be targeted at glibc 2.31, then you would have to
> use GLIBC_2.31.
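Got it. If I understand the convention, that would mean listing the new
symbol under that version in the relevant Versions file, along these lines
(a sketch assuming nptl/Versions and the _np name):

```
libpthread {
  GLIBC_2.31 {
    pthread_mutex_timedlock_any_np;
  }
}
```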
>
>> - In my tree I have a placeholder test application that should be
>> replaced by a new mutex test. However, it would also be a good idea
>> to leverage other mutex tests to test this new thing, since it's a
>> superset of many other mutex calls. Could we make the normal mutex
>> test suite run a second time, with a macro wrapping the normal
>> pthread_lock with this implementation instead?
>
> Due to the new ENOMEM error, the new function is not a strict
> superset.
True, thanks for pointing that out. Any attempted wrapping would have to
handle the new ENOMEM error explicitly.
>
> I'm wondering if the current design is really the right one,
> particularly for thread pools. The implementation performs multiple
> scans of the mutex lists, which look rather costly for large pools.
> That's probably unavoidable if the list is dynamic and potentially
> different for every call, but given the array-based interface, I
> don't think this is the intended use. Something that uses
> pre-registration of the participating futexes could avoid that. I
> also find it difficult to believe that this approach beats something
> that involves queues, where a worker thread that becomes available
> identifies itself directly to a submitter thread, or can directly
> consume the submitted work item.
That might be interesting if walking the lists turns out to be a
hotspot. I think the mutex count per operation would typically not be
thousands, and probably not hundreds either.
I would think there's still a queue somewhere to acquire jobs; this
would be used before and after. For instance, job threads want to sleep
until work has been queued, or another system event occurs that might
require them to wake up, like app shutdown or scene transition.
Similarly, after firing off N jobs, the job manager will want to sleep
until one of the jobs is complete to perform some accounting and publish
the results to other systems. For both of these use cases, using eventfd
to wait for multiple events seems to result in more CPU spinning than
the futex-based solution, both in userspace and the kernel.
>
> Thanks, Florian
>
On 7/31/19 3:01 AM, Szabolcs Nagy wrote:
> On 31/07/2019 01:07, Pierre-Loup A. Griffais wrote:
>> I started putting together a patch to expose the new Linux futex
>> functionality that recently got proposed for upstream inclusion.
>> [1]
> ...
>>
>> [1] https://lkml.org/lkml/2019/7/30/1399
>
> i don't see that patch on the linux-api list where userspace api
> related patches are discussed.
>
> syscalls that have time argument need extra care now that 32bit
> targets will get a new 64bit time_t abi.
Thanks, looks like there were compat concerns raised on the kernel side
as well; we can copy linux-api for the next patch iteration.
>
> the futex syscall is multiplexed and intricately related to the
> pthread implementation so there are many reasons why such patch
> should not be accepted into linux before agreement with userspace.
What does that process typically look like, other than raising it on
both ends like we did?
Thanks for all the feedback!
- Pierre-Loup
>