This is the mail archive of the mailing list for the glibc project.


Re: [RFC] pthread support for FUTEX_WAIT_MULTIPLE

On 8/1/19 2:39 AM, Florian Weimer wrote:
> * Pierre-Loup A. Griffais:
>
>> I would think there's still a queue somewhere to acquire jobs; this
>> would be used before and after. For instance, job threads want to
>> sleep until work has been queued, or until another system event occurs
>> that might require them to wake up, like app shutdown or scene
>> transition. Similarly, after firing off N jobs, the job manager will
>> want to sleep until one of the jobs is complete to perform some
>> accounting and publish the results to other systems. For both of these
>> use cases, using eventfd to wait for multiple events seems to result in
>> more CPU spinning than the futex-based solution, both in userspace and
>> in the kernel.
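
For reference, the eventfd approach being compared against can be sketched as follows. This is a minimal illustration under stated assumptions, not code from the thread: the names `signal_event` and `wait_for_work_or_shutdown` and the two-eventfd layout (one counter for queued work, one for shutdown) are hypothetical. It uses only the standard Linux `eventfd(2)` and `poll(2)` calls.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical wake-up reasons for a job thread. */
enum wake_reason { WAKE_ERROR = -1, WAKE_WORK = 0, WAKE_SHUTDOWN = 1 };

/* Add 1 to an eventfd counter, making it readable and waking pollers. */
static int signal_event(int efd)
{
    uint64_t one = 1;
    return write(efd, &one, sizeof one) == (ssize_t)sizeof one ? 0 : -1;
}

/* Sleep until the shutdown or work eventfd fires.  Assumes work_efd was
   created with EFD_SEMAPHORE | EFD_NONBLOCK, so each read takes exactly
   one job token and a lost race reports EAGAIN instead of blocking. */
static enum wake_reason wait_for_work_or_shutdown(int work_efd, int shutdown_efd)
{
    assert(work_efd >= 0 && shutdown_efd >= 0);
    struct pollfd fds[2] = {
        { .fd = shutdown_efd, .events = POLLIN },
        { .fd = work_efd,     .events = POLLIN },
    };
    for (;;) {
        if (poll(fds, 2, -1) < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return WAKE_ERROR;
        }
        if ((fds[0].revents | fds[1].revents) & (POLLERR | POLLNVAL))
            return WAKE_ERROR;
        /* Shutdown is checked first and never read, so it stays
           readable and every job thread polling it will see it. */
        if (fds[0].revents & POLLIN)
            return WAKE_SHUTDOWN;
        if (fds[1].revents & POLLIN) {
            uint64_t token;
            if (read(work_efd, &token, sizeof token) == (ssize_t)sizeof token)
                return WAKE_WORK;
            if (errno != EAGAIN)
                return WAKE_ERROR;
            /* Another thread consumed the token first: sleep again. */
        }
    }
}
```

Each wake-up here costs a `poll` call plus a `read`, and several threads may wake for a single token before all but one go back to sleep.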

> Why do you consider eventfd the only viable alternative?  If you want a
> futex-based solution today, you can use condition variables.  It won't
> give you the theoretical minimum of context switches, but neither does
> FUTEX_WAIT_MULTIPLE, as far as I can tell.

I think there are two main aspects to this. The first is purely technical, and I'll try to address it below:

Unless I'm missing something: in a scenario where N job threads (N probably close to the machine's CPU count) process work, report back to a job system, and then promptly get back to work, wouldn't implementing this with condition variables introduce contention between the job threads when they report, contention that wouldn't exist otherwise? Or, alternatively, unwanted spinning on the job manager side?
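
The condition-variable design under discussion, applied to job completion reporting, might look like the hypothetical sketch below. The names `completion_queue`, `cq_report`, and `cq_wait_any` and the fixed-size ring buffer are illustrative assumptions, not taken from any real engine. Every worker that finishes a job takes the same `lock`, which is where the reporting contention described above would come from.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PENDING 64  /* hypothetical bound on jobs in flight */

/* Hypothetical completion queue shared by all job threads and the manager. */
struct completion_queue {
    pthread_mutex_t lock;    /* taken by every worker that reports */
    pthread_cond_t  done;    /* the job manager sleeps on this */
    int    ids[MAX_PENDING]; /* ring buffer of finished job ids */
    size_t head;
    size_t count;
};

static void cq_init(struct completion_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->done, NULL);
    q->head = 0;
    q->count = 0;
}

/* Worker side: publish one finished job and wake the manager. */
static void cq_report(struct completion_queue *q, int job_id)
{
    pthread_mutex_lock(&q->lock);
    assert(q->count < MAX_PENDING);
    q->ids[(q->head + q->count) % MAX_PENDING] = job_id;
    q->count++;
    pthread_cond_signal(&q->done);
    pthread_mutex_unlock(&q->lock);
}

/* Manager side: sleep until at least one job has finished, then take
   the oldest completion.  The while loop handles spurious wake-ups. */
static int cq_wait_any(struct completion_queue *q)
{
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->done, &q->lock);
    int job_id = q->ids[q->head];
    q->head = (q->head + 1) % MAX_PENDING;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return job_id;
}
```

With a single waiting manager, `pthread_cond_signal` is enough; the cost that matters is that all N workers serialize on `lock` each time they report.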

So I do think it would potentially cause quite a few more context switches and more overhead, which is what we're trying to reduce. Spinning more to achieve the same latency is not desirable for gaming, since power usage also matters.

When you say the proposed futex approach is still not necessarily optimal, do you mean compared to a fully lockless design on the app side? Do you agree it would still be more efficient than using condition variables, which would also require some redesign work on the app side?

The other aspect is that we're dealing with many applications whose threading model is already defined by the time they target Linux. They can use WaitForMultipleObjects() on Windows and kqueue on macOS; on Linux, they typically fall back to a lower-performance "emulation" of the desired behavior using multiple mutexes, condition variables, or eventfd. I think this new primitive would let them quickly port to something as efficient as, or more efficient than, their starting point.
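
One common shape for the condition-variable "emulation" mentioned above is a single mutex and condition variable guarding an array of event flags. The sketch below is hypothetical (`event_set` and its functions are invented for illustration) and models only the wait-any case of the Windows `WaitForMultipleObjects()` call over auto-reset events, returning the lowest signaled index as that call does when `bWaitAll` is FALSE.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_EVENTS 64  /* mirrors MAXIMUM_WAIT_OBJECTS (64) on Windows */

/* Hypothetical set of auto-reset events guarded by one mutex/condvar pair. */
struct event_set {
    pthread_mutex_t lock;
    pthread_cond_t  changed;
    bool   signaled[MAX_EVENTS];
    size_t n;
};

static void event_set_init(struct event_set *s, size_t n)
{
    assert(n <= MAX_EVENTS);
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->changed, NULL);
    for (size_t i = 0; i < n; i++)
        s->signaled[i] = false;
    s->n = n;
}

/* Mark event i signaled.  Each signal makes one event available, so waking
   one waiter suffices as long as every waiter waits on the whole set (an
   assumption of this sketch; per-subset waiters would need a broadcast). */
static void event_set_signal(struct event_set *s, size_t i)
{
    pthread_mutex_lock(&s->lock);
    assert(i < s->n);
    s->signaled[i] = true;
    pthread_cond_signal(&s->changed);
    pthread_mutex_unlock(&s->lock);
}

/* Wait-any: block until some event is signaled, reset it (auto-reset
   semantics), and return the lowest signaled index. */
static size_t event_set_wait_any(struct event_set *s)
{
    pthread_mutex_lock(&s->lock);
    for (;;) {
        for (size_t i = 0; i < s->n; i++) {
            if (s->signaled[i]) {
                s->signaled[i] = false;
                pthread_mutex_unlock(&s->lock);
                return i;
            }
        }
        pthread_cond_wait(&s->changed, &s->lock);
    }
}
```

Every signal and every wait serialize on one lock, and a waiter rescans the whole array after each wake-up; this is the kind of overhead the thread hopes a kernel-level wait-on-multiple primitive would avoid.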

 - Pierre-Loup

