sem_post: How does the implementation work?

Wed Mar 18 02:16:00 GMT 2015

Just following up on the original problem in case it helps anyone else.

The sem_post/sem_wait hang which inspired my original post turned out to be due
to a kernel bug which can occur when multiple process-shared futexes are located
in the same huge page. This was fixed a couple of years ago by Zhang Yi in
kernel commit 13d60f4b6ab5b702dc8d2ee20999f98a93728aec, but since we're on an
EL6 system we don't have the fix. Avoiding the use of huge page memory was
enough to resolve the issue for us.
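
A minimal sketch of that kind of workaround, assuming the semaphores live in
an anonymous shared mapping (the real application's allocation scheme is not
described here, and the function name below is hypothetical). The only point
is that MAP_HUGETLB is not passed, so the two process-shared semaphores sit
in ordinary pages:

```c
#define _GNU_SOURCE
#include <semaphore.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical demo: two process-shared semaphores in one ordinary
   (non-huge) shared mapping, used for a parent/child ping-pong.
   Returns 0 on success, -1 on any failure. */
int run_shared_sem_ping_pong(void)
{
    size_t len = 2 * sizeof(sem_t);

    /* No MAP_HUGETLB here: the mapping is backed by normal pages. */
    sem_t *sems = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (sems == MAP_FAILED)
        return -1;

    /* pshared = 1 makes the semaphores usable across fork(). */
    if (sem_init(&sems[0], 1, 0) != 0 || sem_init(&sems[1], 1, 0) != 0) {
        munmap(sems, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(sems, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: wait for the ping, answer with a pong. */
        if (sem_wait(&sems[0]) != 0 || sem_post(&sems[1]) != 0)
            _exit(1);
        _exit(0);
    }

    /* Parent: send the ping, wait for the pong, then reap the child. */
    int ok = sem_post(&sems[0]) == 0 && sem_wait(&sems[1]) == 0;
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        ok = 0;
    if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
        ok = 0;

    munmap(sems, len);
    return ok ? 0 : -1;
}
```

On older glibc this needs to be linked with -pthread. The sketch does not
retry sem_wait() on EINTR, which real code would probably want to do.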

Cheers,

John

On 24/01/15 11:06, John Steele Scott wrote:
> Adhemerval,
> 
> Thank you for pointing out that commit and related bug. It does look like that
> commit makes my question moot for today's glibc.
> 
> For older glibc, looking again I think the answer to my question is that it
> doesn't matter if the __new_sem_post() races with __new_sem_wait() on nwaiters.
> In that case, the futex syscall in __new_sem_wait() will return EWOULDBLOCK,
> since __new_sem_post() has already incremented the value. So there's no need for
> __new_sem_post() to explicitly wake a thread in that case.
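
To illustrate the EWOULDBLOCK behaviour described above, here is a minimal
sketch. It is not the glibc code: the function names are made up for the
example, and a plain atomic increment stands in for the poster. It relies
only on the documented FUTEX_WAIT rule that the kernel refuses to sleep when
the word no longer holds the expected value.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/futex.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* glibc exposes no futex() wrapper, so go through syscall() directly. */
static long futex_wait(uint32_t *addr, uint32_t expected)
{
    return syscall(SYS_futex, addr, FUTEX_WAIT_PRIVATE, expected,
                   NULL, NULL, 0);
}

/* Hypothetical demo: returns the errno a waiter sees when a poster
   increments the value between the waiter's check and its futex call. */
int waiter_loses_race(void)
{
    uint32_t value = 0;     /* semaphore count                      */
    uint32_t seen = value;  /* waiter saw 0 and decided to block    */

    /* The poster slips in here and bumps the count to 1. */
    __atomic_fetch_add(&value, 1, __ATOMIC_SEQ_CST);

    /* The kernel compares *addr (1) with expected (0) and returns
       immediately instead of putting the caller to sleep. */
    long rc = futex_wait(&value, seen);
    return rc == -1 ? errno : 0;
}
```

On Linux the error returned is EAGAIN, which has the same value as
EWOULDBLOCK, so the waiter goes back around its loop and picks up the token
without needing an explicit wake.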
> 
> Thanks,
> 
> John
> 
> On 23/01/15 21:09, Adhemerval Zanella wrote:
>> The GLIBC semaphore operations have been rewritten recently to fix BZ#12674 [1],
>> which does not involve a deadlock, but rather a race condition that leads
>> to invalid memory access.
>>
>> Now all architectures (with the exception of SPARC) use the C implementation
>> in nptl [2].  Back to your question: this new version synchronizes the token
>> operation using just one atomic operation (on isem->data for 64-bit or
>> isem->value for 32-bit).
>>
>> So I would suggest checking your program using an updated
>> GLIBC version (we are about to release 2.21, which contains this new implementation).
>>
>>
>> [1] https://sourceware.org/bugzilla/show_bug.cgi?id=12674
>> [2] https://sourceware.org/git/?p=glibc.git;a=commit;h=042e1521c794a945edc43b5bfa7e69ad70420524
>>
>>
>> On 23-01-2015 03:58, John Steele Scott wrote:
>>> An attempt to debug a deadlock has led me and a colleague to look at the implementation of sem_wait()/sem_post().
>>>
>>> Looking at sem_post.c and sem_wait.c in nptl/sysdeps/unix/sysv/linux/x86_64, what stops a race between updating nwaiters in __new_sem_wait(), and querying nwaiters in __new_sem_post()?
>>>
>>> It looks like it's possible for __new_sem_post() to read a zero value for nwaiters just before __new_sem_wait() increments nwaiters (just after __new_sem_wait() failed to decrement value).
>>>
>>> Can someone explain what stops this from happening?
>>>
>>> Also, (I assume) we're actually running the hand-coded x86_64 assembly implementation, which looks to work the same way as the C. What stops the race there?
>>>
>>> I have a bit of an understanding about how atomic operations work, but it's non-obvious how they are actually preventing a deadlock on the contended path here. It looks like we really need value and nwaiters to be considered as a single atomic value, rather than two separate atomics. How are changes to these two made atomically?
>>>
>>> Cheers,
>>>
>>> John
>>>
>>
>>
> 
> 
>