This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] NUMA spinlock [BZ #23962]
- From: "马凌(彦军)" <ling dot ml at antfin dot com>
- To: <fweimer at redhat dot com>, "H.J. Lu" <hjl dot tools at gmail dot com>, Szabolcs Nagy <Szabolcs dot Nagy at arm dot com>
- Cc: libc-alpha <libc-alpha at sourceware dot org>, "Xiao, Wei3" <wei3 dot xiao at intel dot com>, nd <nd at arm dot com>, "ling.ma.program" <ling dot ma dot program at gmail dot com>
- Date: Thu, 10 Jan 2019 21:19:28 +0800
- Subject: Re: [PATCH] NUMA spinlock [BZ #23962]
Hi Florian,
Thanks for your comments!
We tested the NUMA spinlock on a 2-socket Kunpeng 920 platform with 128 physical Arm cores and 256 GB of RAM, as below.
1. spinlock
$./tst-variable-overhead
Number of processors: 128, Single thread time 11657100
Number of threads: 2, Total time 33449020, Overhead: 1.43
Number of threads: 4, Total time 135449160, Overhead: 2.90
Number of threads: 8, Total time 1146508900, Overhead: 12.29
Number of threads: 16, Total time 6725395660, Overhead: 36.06
Number of threads: 32, Total time 37197114800, Overhead: 99.72
Number of threads: 64, Total time 501098134360, Overhead: 671.66
Number of threads: 128, Total time 2588795930500, Overhead: 1734.99
Number of threads: 256, Total time 14987969840860, Overhead: 5022.41
Number of threads: 384, Total time 31444706737160, Overhead: 7024.67
Number of threads: 512, Total time 60079858502060, Overhead: 10066.27
2. numa spinlock
$./tst-numa-variable-overhead
Number of processors: 128, Single thread time 12647780
Number of threads: 2, Total time 36606840, Overhead: 1.45
Number of threads: 4, Total time 115740060, Overhead: 2.29
Number of threads: 8, Total time 604662840, Overhead: 5.98
Number of threads: 16, Total time 2285066760, Overhead: 11.29
Number of threads: 32, Total time 8533264240, Overhead: 21.08
Number of threads: 64, Total time 72671073600, Overhead: 89.78
Number of threads: 128, Total time 287805932560, Overhead: 177.78
Number of threads: 256, Total time 837367226760, Overhead: 258.62
Number of threads: 384, Total time 1954243727660, Overhead: 402.38
Number of threads: 512, Total time 3523015939200, Overhead: 544.04
The above data show that the NUMA spinlock improves performance by up to 17X with 512 threads on the Arm platform.
The NUMA spinlock should improve spinlock performance on all multi-socket systems.
Thanks
Ling
On 2019/1/4 at 3:59 AM, "H.J. Lu" <hjl.tools@gmail.com> wrote:
On Thu, Jan 3, 2019 at 6:52 AM Szabolcs Nagy <Szabolcs.Nagy@arm.com> wrote:
>
> On 03/01/2019 05:35, 马凌(彦军) wrote:
> > create mode 100644 manual/examples/numa-spinlock.c
> > create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock-private.h
> > create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock.c
> > create mode 100644 sysdeps/unix/sysv/linux/numa-spinlock.h
> > create mode 100644 sysdeps/unix/sysv/linux/numa_spinlock_alloc.c
> > create mode 100644 sysdeps/unix/sysv/linux/x86/tst-numa-variable-overhead.c
> > create mode 100644 sysdeps/unix/sysv/linux/x86/tst-variable-overhead-skeleton.c
> > create mode 100644 sysdeps/unix/sysv/linux/x86/tst-variable-overhead.c
>
> as far as i can tell the new code is generic
> (other than the presence of efficient getcpu),
> so i think the test should be generic too.
>
> > --- /dev/null
> > +++ b/sysdeps/unix/sysv/linux/x86/tst-variable-overhead-skeleton.c
> > @@ -0,0 +1,384 @@
> ...
> > +/* Check spinlock overhead with a large number of threads.  The
> > +   critical region is very small.  Critical region + spinlock
> > +   overhead aren't noticeable when the number of threads is small.
> > +   As the thread count increases, spinlock overhead becomes the
> > +   bottleneck.  It shows up in the wall time of thread execution.  */
>
> yeah, this is not easy to do in a generic way, i think
> even on x86 such measurement is problematic, you don't
> know what goes on a system (or vm) when the glibc test
> is running.
>
> but doing precise timing is not that important for
> checking the correctness of the locks, so i think a
> simplified version can be generic test code.
Here is the updated patch to make tests generic.
--
H.J.