This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] NUMA spinlock [BZ #23962]
- From: kemi <kemi dot wang at intel dot com>
- To: Ma Ling <ling dot ma dot program at gmail dot com>, libc-alpha at sourceware dot org
- Cc: hongjiu dot lu at intel dot com, "ling.ma" <ling dot ml at antfin dot com>, Wei Xiao <wei3 dot xiao at intel dot com>
- Date: Tue, 15 Jan 2019 10:52:18 +0800
- Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
- References: <20181226025019.38752-1-ling.ma@MacBook-Pro-8.local>
On 2018/12/26 10:50 AM, Ma Ling wrote:
> From: "ling.ma" <ling.ml@antfin.com>
>
> On multi-socket systems, memory is shared across the entire system.
> Data access to the local socket is much faster than the remote socket
> and data access to the local core is faster than sibling cores on the
> same socket. For serialized workloads with conventional spinlock,
> when there is high spinlock contention between threads, lock ping-pong
> among sockets becomes the bottleneck and threads spend majority of
> their time in spinlock overhead.
>
> On multi-socket systems, the keys to our NUMA spinlock performance
> are to minimize cross-socket traffic as well as localize the serialized
> workload to one core for execution. The basic principles of NUMA
> spinlock mainly consist of the following approaches, which reduce
> data movement and accelerate the critical section, ultimately giving
> us a significant performance improvement.
>
> 1. MCS spinlock
> MCS spinlock helps us to reduce the useless lock movement in the
> spinning state. This paper provides a good description of this
> kind of lock:
That's not accurate.
Both the generic spinlock and the x86-specific version already use
test and test-and-set to reduce useless lock movement in the spinning
state.
See
glibc/nptl/pthread_spin_lock.c
glibc/sysdeps/x86_64/nptl/pthread_spin_lock.S
What MCS spinlock really helps with is accelerating lock release and lock
acquisition by eliminating most of the cache-line bouncing.
> NUMA spinlock can greatly speed up critical section on multi-socket
> systems. It should improve spinlock performance on all multi-socket
> systems.
>
It is beyond question that NUMA spinlock helps a lot under heavy lock
contention. But we should also present data for the uncontended and
lightly contended cases.
The extra code complexity is expected to degrade lock performance a bit
in the lightly contended case; I would like to see the data for that.
Also, lock starvation would be possible if the running core is always busy
with heavy lock contention. More explanation is expected.