This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: [PATCH] NUMA spinlock [BZ #23962]



On 2019/1/11 3:24 AM, Carlos O'Donell wrote:
> On 1/10/19 12:52 PM, Szabolcs Nagy wrote:
>> On 10/01/2019 16:41, Carlos O'Donell wrote:
>>> On 1/10/19 11:32 AM, Florian Weimer wrote:
>>>> * Carlos O'Donell:
>>>>
>>>>> My opinion is that for the health and evolution of a NUMA-aware spinlock
>>>>> and MCS lock, that we should create a distinct project and library that
>>>>> should have those locks, and then work to put them into downstream
>>>>> distributions. This will support key users being able to use supported
>>>>> versions of those libraries, and give the needed feedback about the API
>>>>> and the performance. It may take 1-2 years to get that feedback and every
>>>>> piece of feedback will improve the final API/ABI we put into glibc or
>>>>> even into the next ISO C standard as part of the C thread interface.
>>>>
>>>> I think it's something that could land in tbb, for which many
>>>> distributions already have mechanisms to ship updated versions after a
>>>> release.
>>>
>>> Absolutely. That's a great idea.
>>>
>>
>> in principle the pthread_spin_lock api can use this algorithm
>> assuming we can keep the pthread_spinlock_t abi and keep the
>> POSIX semantics. (presumably users ran into issues with the
>> existing posix api.. or how did this come up in the first place?)
>  
> Correct, but meeting the ABI contract of the pthread_spinlock_t turns
> out to be hard, there isn't much space. I've spoken with Kemi Wang 
> (Intel) about this specific issue, and he has some ideas to share,
> but I'll leave it for him to describe.
> 

It may be possible, because we can make better use of the space inside pthread_spinlock_t.

The MCS lock is a well-known method to reduce spinlock overhead by queuing spinners: the spinlock
cache line is only contended between the lock holder and one active spinner, while the other spinners
spin on a locally accessible flag until their predecessor passes lock ownership down the queue.

Usually, a classical MCS implementation requires an extra pointer *mcs_lock* to track the tail of the queue.
When a new spinner joins the queue, we first fetch the current tail of the queue, then move the mcs_lock
pointer to point to this new spinner (the new tail of the queue).
If we can squeeze some space out of pthread_spinlock_t to store this tail info, and update it whenever
a new spinner joins the queue, then the MCS algorithm can be reimplemented without breaking the ABI.
That's possible because the *lock* value itself doesn't have to occupy all 32 bits (8 bits or even one bit should be enough).
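
For reference, here is a minimal sketch of the classical algorithm with an explicit tail pointer,
written with C11 atomics; the names (mcs_node, mcs_acquire, ...) are illustrative, not from any
proposed patch:

#include <stdatomic.h>
#include <stddef.h>

struct mcs_node
{
    struct mcs_node *_Atomic next; // next spinner in the queue
    atomic_int locked;             // local spinning flag
};

typedef struct mcs_node *_Atomic mcs_lock_t; // tracks the tail of the queue

static void
mcs_acquire (mcs_lock_t *lock, struct mcs_node *self)
{
    atomic_store_explicit (&self->next, NULL, memory_order_relaxed);
    atomic_store_explicit (&self->locked, 1, memory_order_relaxed);
    // Swap ourselves in as the new tail; the old tail is our predecessor.
    struct mcs_node *prev = atomic_exchange (lock, self);
    if (prev != NULL)
      {
        // Link behind the predecessor, then spin only on our own flag.
        atomic_store_explicit (&prev->next, self, memory_order_release);
        while (atomic_load_explicit (&self->locked, memory_order_acquire))
          ;
      }
}

static void
mcs_release (mcs_lock_t *lock, struct mcs_node *self)
{
    struct mcs_node *next = atomic_load_explicit (&self->next, memory_order_acquire);
    if (next == NULL)
      {
        // No visible successor: if we are still the tail, unlock outright.
        struct mcs_node *expected = self;
        if (atomic_compare_exchange_strong (lock, &expected, NULL))
          return;
        // A successor is mid-enqueue; wait until it links itself.
        while ((next = atomic_load_explicit (&self->next, memory_order_acquire)) == NULL)
          ;
      }
    // Pass ownership down the queue; the successor's local spin ends.
    atomic_store_explicit (&next->locked, 0, memory_order_release);
}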

Then the pthread_spinlock_t layout may look like this (similar to qspinlock in the kernel); the
16-bit cpuid identifies the last spinner, and a per-cpu infrastructure converts it into a pointer
to the tail of the queue, e.g. per_cpu_var(qnode, cpuid):

typedef union
{
    struct
    {
        uint8_t locked;    // lock byte
        uint8_t reserved;
        uint16_t cpuid;    // CPU id of the last spinner (the tail of the queue)
    };
    int lock;              // the full 32-bit word the current ABI exposes
} pthread_spinlock_t;
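
Since the existing ABI type is a 32-bit int, a compile-time check can guard the layout (C11
static_assert; this assumes the union above replaces the current typedef):

#include <assert.h>
static_assert (sizeof (pthread_spinlock_t) == sizeof (int),
               "new layout must not outgrow the existing 32-bit lock word");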

// Per-CPU queue node.
struct qnode
{
    struct qnode *next;  // points to the next spinner
    int flag;            // local spinning flag
};

But there are two problems here.
a) Glibc lacks per-cpu infrastructure support, so we can't do this cpuid -> per-cpu-variable translation.
b) Preemption can't be disabled in userland.
   When a new spinner joins the queue, we need to update the cpuid in pthread_spinlock_t to the new one.
   Pseudo-code:

	newid = get_current_cpuid ();
	// Update cpuid to the new one; get back the previous one.
	prev = atomic_exchange_acquire (&cpuid, newid);
	// Look up the tail node of the queue via the previous cpuid.
	tail_node = per_cpu_var (qnode, prev);

There is a problem if preemption happens in the time window between get_current_cpuid() and atomic_exchange_acquire():
when the thread is rescheduled, it may be running on another cpu with a different cpuid.
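
To make the window concrete, here is a sketch of the enqueue path using sched_getcpu() (glibc,
needs _GNU_SOURCE) as the cpuid source; per_cpu_var and the field layout are the hypothetical
pieces from above, so this is not compilable as-is:

#define _GNU_SOURCE
#include <sched.h>

static void
spin_enqueue (pthread_spinlock_t *sl, struct qnode *self)
{
    int newid = sched_getcpu ();
    // If the thread is preempted and migrated to another CPU right here,
    // newid still names the old CPU, whose per-cpu qnode may meanwhile be
    // claimed by whatever thread now runs there, corrupting the queue.
    int prev = atomic_exchange_acquire (&sl->cpuid, newid);
    struct qnode *tail = per_cpu_var (qnode, prev);
    // ... link self behind tail and spin on self->flag ...
}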

===============================CUT HERE==================================
Another way is to store thread-specific info (e.g. tid) in pthread_spinlock_t instead of a cpuid; then we
avoid issue b), but it seems that we'd break the semantics of TLS? Comments?
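
A minimal sketch of that variant, assuming a hypothetical process-wide registry that maps a small
per-thread id to that thread's qnode (the registry, ids, and helpers are all illustrative):

#include <stdint.h>

#define MAX_SPINNERS (1 << 16)                  // ids must fit the 16-bit field
static struct qnode *qnode_registry[MAX_SPINNERS];

static __thread struct qnode my_qnode;          // thread-local queue node
static __thread uint16_t my_slot;               // assigned once per thread

static struct qnode *
qnode_of (uint16_t slot)
{
    // Unlike a cpuid, the slot id stays valid across migration, so the
    // preemption window above disappears; the cost is that other threads
    // dereference a pointer into this thread's TLS, which is the semantic
    // concern raised here.
    return qnode_registry[slot];
}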



