Summary: | robust timed lock may leave other waiters no chance to wake up | ||
---|---|---|---|
Product: | glibc | Reporter: | ma.jiang |
Component: | nptl | Assignee: | Not yet assigned to anyone <unassigned> |
Status: | UNCONFIRMED --- | ||
Severity: | normal | CC: | carlos, drepper.fsp |
Priority: | P2 | Flags: | fweimer:
security-
|
Version: | unspecified | ||
Target Milestone: | --- | ||
See Also: | https://sourceware.org/bugzilla/show_bug.cgi?id=14485 | ||
Host: | Target: | ||
Build: | Last reconfirmed: | ||
Attachments: | patch to fix the bug |
(In reply to ma.jiang from comment #0) > The core logic of this problem is that pthread_mutex_timedlock should set > the FUTEX_WAITER flag before check timeout. So, the fix to the bug is very > simple. We just need to move the code block which set the FUTEX_WAITER flag > forward. I have created a patch to fix(see attachment) , is it ok for trunk? When you say "latest glibc sources" do you mean that you tested on master? How recently? (In reply to Carlos O'Donell from comment #1) > When you say "latest glibc sources" do you mean that you tested on master? > How recently? The patch is for glibc-2.24. I checked the trunk code just now, the bug still exist. I have not write a testcase to reproduce the bug, it's quite hard(Although the bug logic is clear)... (In reply to ma.jiang from comment #2) > (In reply to Carlos O'Donell from comment #1) > > > When you say "latest glibc sources" do you mean that you tested on master? > > How recently? > The patch is for glibc-2.24. I checked the trunk code just now, the bug > still exist. I have not write a testcase to reproduce the bug, it's quite > hard(Although the bug logic is clear)... Thank you for the verification. I've been collecting a number of robust mutex fixes, and I think in the next development cycle we'll try to address the last remaining issues, including this one. (In reply to ma.jiang from comment #2) > (In reply to Carlos O'Donell from comment #1) > > > When you say "latest glibc sources" do you mean that you tested on master? > > How recently? > The patch is for glibc-2.24. I checked the trunk code just now, the bug > still exist. I have not write a testcase to reproduce the bug, it's quite > hard(Although the bug logic is clear)... I believe this is fixed in master(/trunk). Please have a look at what we do with assume_other_futex_waiters: If we ever potentially share FUTEX_WAITERS with another thread (C in your example), then after the futex operation we make sure to set this flag in any case. This happens either when acquiring the mutex (in which case we'll notice this during unlock and wake up C), or when not being able to acquire it, in which case there must be another owner that interfered and we delegate wake-up responsibilities to that other owner by setting FUTEX_WAITERS. (In reply to Torvald Riegel from comment #4) > (In reply to ma.jiang from comment #2) > > (In reply to Carlos O'Donell from comment #1) > > > > > When you say "latest glibc sources" do you mean that you tested on master? > > > How recently? > > The patch is for glibc-2.24. I checked the trunk code just now, the bug > > still exist. I have not write a testcase to reproduce the bug, it's quite > > hard(Although the bug logic is clear)... > > I believe this is fixed in master(/trunk). Please have a look at what we do > with assume_other_futex_waiters: If we ever potentially share FUTEX_WAITERS > with another thread (C in your example), then after the futex operation we > make sure to set this flag in any case. This happens either when acquiring > the mutex (in which case we'll notice this during unlock and wake up C), or > when not being able to acquire it, in which case there must be another owner > that interfered and we delegate wake-up responsibilities to that other owner > by setting FUTEX_WAITERS. Yes, assume_other_futex_waiters can have FUTEX_WAITERS(after B get waked up), But the there are still no FUTEX_WAITERS in mutex->__data.__lock. The core logic is that pthread_mutex_timedlock could return ETIMEDOUT just before adding FUTEX_WAITERS to mutex->__data.__lock. See attached code chunk. #if (!defined __ASSUME_FUTEX_CLOCK_REALTIME \ || !defined lll_futex_timed_wait_bitset) struct timeval tv; struct timespec rt; /* Get the current time. */ (void) __gettimeofday (&tv, NULL); /* Compute relative timeout. */ rt.tv_sec = abstime->tv_sec - tv.tv_sec; rt.tv_nsec = abstime->tv_nsec - tv.tv_usec * 1000; if (rt.tv_nsec < 0) { rt.tv_nsec += 1000000000; --rt.tv_sec; } /* Already timed out? */ if (rt.tv_sec < 0) return ETIMEDOUT; //->>> We have returned here, with FUTEX_WAITERS in mutex->__data.__lock. #endif /* We cannot acquire the mutex nor has its owner died. Thus, try to block using futexes. Set FUTEX_WAITERS if necessary so that other threads are aware that there are potentially threads blocked on the futex. Restart if oldval changed in the meantime. */ if ((oldval & FUTEX_WAITERS) == 0) { if (atomic_compare_and_exchange_bool_acq (&mutex->__data.__lock, oldval | FUTEX_WAITERS, oldval) != 0) { oldval = mutex->__data.__lock; continue; } oldval |= FUTEX_WAITERS; } sorry, made a stupid mistake... /* Already timed out? */ if (rt.tv_sec < 0) return ETIMEDOUT; //->>> We have returned here, W_I_T_H_O_U_T FUTEX_WAITERS in mutex->__data.__lock. |
Created attachment 9772 [details] patch to fix the bug Hi all, I found the bug in robust timed mutex still existed in the latest glibc source, when porting some local patches to the newest glibc release. Provided there were 3 threads A,B and C, all of which were trying to manipulate the same robust mutex. See the following illustration. A: B: C: pthread_mutex_lock(m); pthread_mutex_timedlock(m); pthread_mutex_lock(m); pthread_mutex_unlock(m); pthread_mutex_lock(m); pthread_mutex_unlock(m); First, A locked m. Then B and C try to lock m, and both get traped(wait in kernel). A unlock m, wake up B and clear the FUTEX_WAITER flag. After B wake up(and have not do anything yet), A preempt B and lock m again. At this time, A will not add FUTEX_WAITER flag to m as it does not know whether there are other waiters. B continue to run, it call gettimeofday and find the time is out. So, it just return ETIMEOUT without trying to set FUTEX_WAITER flag. A unlock m , and do not do the wake syscall as there is no FUTEX_WAITER. C get stuck in kernel forever. The core logic of this problem is that pthread_mutex_timedlock should set the FUTEX_WAITER flag before check timeout. So, the fix to the bug is very simple. We just need to move the code block which set the FUTEX_WAITER flag forward. I have created a patch to fix(see attachment) , is it ok for trunk?