This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] NUMA spinlock [BZ #23962]
On 2019/1/11 3:24 AM, Carlos O'Donell wrote:
> On 1/10/19 12:52 PM, Szabolcs Nagy wrote:
>> On 10/01/2019 16:41, Carlos O'Donell wrote:
>>> On 1/10/19 11:32 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>
>>>>> My opinion is that for the health and evolution of a NUMA-aware spinlock
>>>>> and MCS lock, that we should create a distinct project and library that
>>>>> should have those locks, and then work to put them into downstream
>>>>> distributions. This will support key users being able to use supported
>>>>> versions of those libraries, and give the needed feedback about the API
>>>>> and the performance. It may take 1-2 years to get that feedback and every
>>>>> piece of feedback will improve the final API/ABI we put into glibc or
>>>>> even into the next ISO C standard as part of the C thread interface.
>>>>
>>>> I think it's something that could land in tbb, for which many
>>>> distributions already have mechanisms to ship updated versions after a
>>>> release.
>>>
>>> Absolutely. That's a great idea.
>>>
>>
>> in principle the pthread_spin_lock api can use this algorithm
>> assuming we can keep the pthread_spinlock_t abi and keep the
>> POSIX semantics. (presumably users ran into issues with the
>> existing posix api.. or how did this come up in the first place?)
>
> Correct, but meeting the ABI contract of pthread_spinlock_t turns
> out to be hard; there isn't much space. I've spoken with Kemi Wang
> (Intel) about this specific issue, and he has some ideas to share,
> but I'll leave it for him to describe.
>
It may be possible, because we can make better use of the space in pthread_spinlock_t.
An MCS lock is a well-known method to reduce spinlock overhead by queuing spinners: the spinlock
cache line is only contended between the lock holder and one active spinner, while the other spinners
spin on a locally accessible flag until their predecessor passes the lock down.
Usually, a classical MCS implementation requires an extra pointer, *mcs_lock*, to track the tail of the queue.
When a new spinner is added to the queue, we first get the current tail of the queue and then move the
mcs_lock pointer to point to this new spinner (the new tail of the queue), as sketched below.
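
For reference, here is a minimal sketch of that classical MCS algorithm, written with C11
<stdatomic.h> so it is self-contained. The names mcs_node, mcs_lock_acquire and
mcs_lock_release are illustrative only; they are not an existing glibc API and not the code
in this patch.

#include <stdatomic.h>
#include <stddef.h>

struct mcs_node
{
  struct mcs_node *_Atomic next;  /* next spinner in the queue */
  atomic_int locked;              /* local spinning flag */
};

/* The lock word is a pointer to the tail of the queue; NULL means free.  */
typedef struct mcs_node *_Atomic mcs_lock_t;

static void
mcs_lock_acquire (mcs_lock_t *lock, struct mcs_node *self)
{
  self->next = NULL;
  self->locked = 1;
  /* Swing the tail to ourselves; the old tail is our predecessor.  */
  struct mcs_node *prev
    = atomic_exchange_explicit (lock, self, memory_order_acq_rel);
  if (prev == NULL)
    return;                        /* Queue was empty: lock acquired.  */
  /* Link behind the predecessor, then spin on our own flag only.  */
  atomic_store_explicit (&prev->next, self, memory_order_release);
  while (atomic_load_explicit (&self->locked, memory_order_acquire))
    ;
}

static void
mcs_lock_release (mcs_lock_t *lock, struct mcs_node *self)
{
  struct mcs_node *next
    = atomic_load_explicit (&self->next, memory_order_acquire);
  if (next == NULL)
    {
      /* No visible successor: try to mark the lock free.  */
      struct mcs_node *expected = self;
      if (atomic_compare_exchange_strong_explicit
            (lock, &expected, NULL,
             memory_order_release, memory_order_relaxed))
        return;
      /* A successor is mid-enqueue; wait for it to link itself.  */
      while ((next = atomic_load_explicit (&self->next,
                                           memory_order_acquire)) == NULL)
        ;
    }
  /* Hand the lock over; the successor stops spinning.  */
  atomic_store_explicit (&next->locked, 0, memory_order_release);
}
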
If we can squeeze some space in pthread_spinlock_t to store this tail info, and update it
whenever a new spinner is added to the queue, then the MCS algorithm can be reimplemented without breaking the ABI.
That is possible because the *lock* field itself does not have to occupy 32 bits (8 bits or even a single bit should be enough).
The pthread_spinlock_t structure may then look like this (similar to qspinlock in the kernel):
struct pthread_spinlock_t
{
  union
  {
    struct
    {
      u8 locked;   // lock byte
      u8 reserve;
      u16 cpuid;   // CPU id of the last spinner; the per-cpu
                   // infrastructure converts it into a pointer to the
                   // tail of the queue, e.g. per_cpu_var(qnode, cpuid)
    };
    int lock;
  };
};
PER-CPU struct qnode
{
  struct qnode *next;  // points to the next spinner
  int flag;            // local spinning flag
};
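
For illustration only, since nothing like per_cpu_var exists in glibc today (that is problem
a) below): if a per-CPU table of qnodes did exist, the cpuid-to-tail-pointer conversion could
look roughly like this, with the table sized once at startup using get_nprocs_conf() from
<sys/sysinfo.h> (a GNU extension). The helper name is an assumption, not part of the patch.

/* Hypothetical per-CPU table, one qnode slot per possible CPU.  */
static struct qnode *qnode_table;   /* get_nprocs_conf () entries */

static inline struct qnode *
per_cpu_qnode (int cpuid)           /* stands in for per_cpu_var (qnode, cpuid) */
{
  return &qnode_table[cpuid];
}
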
But there are two problems here.
a) Lack of per-cpu infrastructure support in glibc, so we can't do the cpuid -> per-cpu-variable conversion.
b) We can't disable preemption in userland.
When a new spinner is added to the queue, we need to update the cpuid in pthread_spinlock_t to the new one.
Pseudo-code:
    newid = get_current_cpuid();
    prev = atomic_exchange_acquire(&cpuid, newid); // update cpuid to the new one and return the previous one
    tail_node = per_cpu_var(qnode, prev);          // get the previous tail node of the queue
There is a problem when preemption happens in the time window between get_current_cpuid() and atomic_exchange_acquire():
when the thread is rescheduled, it may be running on another CPU with a different cpuid, as illustrated below.
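
To make the window concrete (assuming get_current_cpuid() maps to sched_getcpu() from
<sched.h>, and reusing the names from the sketches above):

    newid = sched_getcpu();               /* we are running on CPU newid here   */
                                          /* <-- if preemption migrates us to a
                                                 different CPU in this window,
                                                 newid is already stale          */
    prev = atomic_exchange_acquire(&cpuid, newid);
                                          /* the lock now records newid, but the
                                             thread actually running on CPU newid
                                             may reuse the same per-CPU qnode, so
                                             the next spinner can be linked to
                                             the wrong node                      */
    tail_node = per_cpu_var(qnode, prev);
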
===============================CUT HERE==================================
Another way is to store thread-specific info (e.g. the tid) in pthread_spinlock_t instead of the cpuid; then we can avoid
issue b), but it seems that we would break the semantics of TLS? (A rough sketch of what I mean is below.) Comments?
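
Purely as a sketch of that tid-style idea (none of this is in the patch; the registry, the
bound, and all names are assumptions): instead of reaching into another thread's TLS, the
16-bit field could store a small per-thread index that a process-wide registry maps back to
that thread's qnode.

#include <stdatomic.h>

#define MAX_WAITER_SLOTS 4096            /* illustrative bound; slot
                                            collisions and reuse are
                                            glossed over here */

static struct qnode *qnode_registry[MAX_WAITER_SLOTS];
static atomic_int next_slot;

static __thread int my_slot = -1;        /* allocated on first use */
static __thread struct qnode my_qnode;   /* this thread's queue node */

static int
register_self (void)
{
  if (my_slot < 0)
    {
      my_slot = atomic_fetch_add (&next_slot, 1) % MAX_WAITER_SLOTS;
      qnode_registry[my_slot] = &my_qnode;
    }
  return my_slot;
}

/* The 16-bit field in pthread_spinlock_t would then hold my_slot rather
   than a cpuid, and the predecessor's node is qnode_registry[prev].  */

Whether exposing a thread's qnode through such a registry still counts as breaking the TLS
semantics is exactly the open question above.
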