This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] NUMA spinlock [BZ #23962]
- From: Torvald Riegel <triegel at redhat dot com>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>, Rich Felker <dalias at libc dot org>
- Cc: Ma Ling <ling dot ma dot program at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, "Lu, Hongjiu" <hongjiu dot lu at intel dot com>, "ling.ma" <ling dot ml at antfin dot com>, Wei Xiao <wei3 dot xiao at intel dot com>
- Date: Mon, 14 Jan 2019 23:40:08 +0100
- Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
- References: <20181226025019.38752-1-ling.ma@MacBook-Pro-8.local> <20190103204338.GU23599@brightrain.aerifal.cx> <CAMe9rOoBhZmzuEoPGjhbxkYZ3mOaC-8tPUrSuhmcbnu8J19LpA@mail.gmail.com>
On Thu, 2019-01-03 at 12:54 -0800, H.J. Lu wrote:
> On Thu, Jan 3, 2019 at 12:43 PM Rich Felker <dalias@libc.org> wrote:
> >
> > On Wed, Dec 26, 2018 at 10:50:19AM +0800, Ma Ling wrote:
> > > From: "ling.ma" <ling.ml@antfin.com>
> > >
> > > On multi-socket systems, memory is shared across the entire system.
> > > Data access to the local socket is much faster than the remote socket
> > > and data access to the local core is faster than sibling cores on the
> > > same socket. For serialized workloads with a conventional spinlock,
> > > when there is high spinlock contention between threads, lock ping-pong
> > > among sockets becomes the bottleneck, and threads spend the majority
> > > of their time in spinlock overhead.
> > >
> > > On multi-socket systems, the keys to our NUMA spinlock performance
> > > are to minimize cross-socket traffic and to localize the serialized
> > > workload to one core for execution. The NUMA spinlock is built mainly
> > > on the following approaches, which reduce data movement and
> > > accelerate the critical section, ultimately giving us a significant
> > > performance improvement.
> >
> > I question whether this belongs in glibc. It seems highly application-
> > and kernel-specific. Is there a reason you wouldn't prefer to
> > implement and maintain it in a library for use in the kind of
> > application that needs it?
>
> This is a good question. On the other hand, the current spinlock
> in glibc hasn't been changed for many years. It doesn't scale for
> today's hardware. Having a scalable spinlock in glibc is desirable.
I agree the spinlocks need to improve, but let's do first things first: add
proper back-off. The biggest problem there is finding a way to select and
maintain the tuning values that is simple for the glibc developers; there
should be good benchmarks that can be used to automatically check that the
tuning values make sense, and adaptation at runtime would be even better
(if it can be shown to improve performance).
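[A minimal sketch of the kind of back-off meant here: a test-and-test-and-set spinlock with bounded exponential back-off. The BACKOFF_MIN/BACKOFF_MAX constants are placeholders, exactly the tuning values that would need benchmark-driven selection, not measured numbers.]

```c
#include <stdatomic.h>

/* Placeholder tuning values -- selecting and maintaining these is the
 * hard part discussed above. */
#define BACKOFF_MIN 4u
#define BACKOFF_MAX 1024u

void spin_lock_backoff(atomic_int *lock)
{
    unsigned delay = BACKOFF_MIN;
    for (;;) {
        /* Test-and-test-and-set: spin on a plain load first so the
         * cache line can stay in shared state instead of bouncing. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;
        if (atomic_exchange_explicit(lock, 1, memory_order_acquire) == 0)
            return;  /* acquired */
        /* Exponential back-off before retrying; real code would use a
         * PAUSE-style instruction in this delay loop. */
        for (unsigned i = 0; i < delay; i++)
            __asm__ __volatile__ ("" ::: "memory");
        if (delay < BACKOFF_MAX)
            delay *= 2;
    }
}

void spin_unlock_backoff(atomic_int *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```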
There are also several other synchronization algorithms in glibc where
proper (limited) spinning and back-off would help. Look for comments such
as "TODO Back-off." throughout the code, in particular the synchronization
code I have rewritten in the past. And this applies to the normal mutexes
too, obviously.
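[For the "proper (limited) spinning" mentioned above, a rough sketch: try the lock a bounded number of times before giving up the CPU. SPIN_LIMIT is again a placeholder tuning value, and a real mutex would block on a futex in the slow path rather than merely yielding.]

```c
#include <stdatomic.h>
#include <sched.h>

#define SPIN_LIMIT 100  /* placeholder tuning value */

void mutex_lock_limited_spin(atomic_int *m)
{
    /* Fast path: bounded spinning in the hope the owner releases soon. */
    for (int i = 0; i < SPIN_LIMIT; i++) {
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(m, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
    }
    /* Slow path: stop burning cycles.  Here we just yield in a loop;
     * glibc's mutexes would futex-wait instead. */
    for (;;) {
        int expected = 0;
        if (atomic_compare_exchange_weak_explicit(m, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
        sched_yield();
    }
}
```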