This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] NUMA spinlock [BZ #23962]



On 2018/12/26 at 10:50 AM, Ma Ling wrote:
> From: "ling.ma" <ling.ml@antfin.com>
> 
> On multi-socket systems, memory is shared across the entire system.
> Data access to the local socket is much faster than the remote socket
> and data access to the local core is faster than sibling cores on the
> same socket.  For serialized workloads with a conventional spinlock,
> when there is high spinlock contention between threads, lock ping-pong
> among sockets becomes the bottleneck and threads spend the majority of
> their time in spinlock overhead.
> 
> On multi-socket systems, the keys to our NUMA spinlock performance
> are to minimize cross-socket traffic as well as localize the serialized
> workload to one core for execution.  The basic principles of NUMA
> spinlock consist mainly of the following approaches, which reduce
> data movement and accelerate the critical section, ultimately giving
> us a significant performance improvement.
> 
> 1. MCS spinlock
> The MCS spinlock helps us reduce the useless lock movement in the
> spinning state.  This paper provides a good description of this
> kind of lock:

That's not accurate.
Both the generic spinlock and the x86 version already use the test and
test-and-set technique to reduce useless lock movement in the spinning
state.

See
glibc/nptl/pthread_spin_lock.c
glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S

What the MCS spinlock really helps with is accelerating lock release and lock
acquisition by eliminating a lot of cache line bouncing.

> NUMA spinlock can greatly speed up critical section on multi-socket
> systems.  It should improve spinlock performance on all multi-socket
> systems. 
> 

It is beyond question that the NUMA spinlock helps a lot under heavy lock
contention.  But we should also present data for the uncontended and lightly
contended cases.

The extra code complexity is expected to degrade lock performance a bit in the
lightly contended case; I would like to see the data for that.

Also, lock starvation would be possible if the running core is always busy with
heavy lock contention.  More explanation is expected.

