When priority inheritance or priority protection is enabled, pthread_mutex_unlock() calls atomic_compare_and_exchange_bool_acq() rather than atomic_compare_and_exchange_bool_rel(). On MIPS (and possibly other architectures), this places the memory barrier after the atomic operation instead of before it. That leaves a window in which another core can call pthread_mutex_lock() and then read stale data, because writes made while the lock was held may not yet be visible. We have a testcase that reproduces the problem. Changing the call to atomic_compare_and_exchange_bool_rel() makes the problem go away.
I've checked in a patch.
I'm confused. The change you've submitted is the exact opposite of what I proposed. The new code now has release semantics on the lock and acquire semantics on the unlock, which is backwards from my understanding of what they should be. The comment in the patch is "All commits should have happened before the mutex lock is taken. Therefore use the _rel variant of the cmpxchg atomic op." We're trying to ensure that writes which happen while holding the lock on one CPU are visible to another CPU which then acquires the lock. This means that the barrier must come before the atomic operation which releases the lock, and after the atomic operation which acquires the lock. I think the correct comment should be "All commits should have happened before the mutex lock is freed, therefore use the _rel variant of the cmpxchg atomic op."
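To illustrate the ordering being argued for here, the following is a minimal sketch using C11 atomics rather than glibc's internal macros. It is not the glibc code: the type demo_lock_t, the variable the_lock, and the functions demo_lock_acquire(), demo_lock_release() and locked_increment() are hypothetical names made up for this example. It assumes only that memory_order_acquire and memory_order_release play roughly the roles of the _acq and _rel cmpxchg variants discussed in this thread.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy lock, NOT glibc's mutex: 0 = free, 1 = held. */
typedef struct { atomic_int state; } demo_lock_t;

static demo_lock_t the_lock;        /* zero-initialized, i.e. free */
static int protected_counter;       /* plain data guarded by the_lock */

static void demo_lock_acquire(demo_lock_t *l)
{
    int expected = 0;
    /* Acquire ordering on success: accesses to the protected data that
       follow in program order cannot be reordered before this cmpxchg.
       Conceptually, the barrier sits AFTER the atomic operation. */
    while (!atomic_compare_exchange_weak_explicit(&l->state, &expected, 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
        expected = 0;               /* cmpxchg overwrote it; reset and retry */
}

static void demo_lock_release(demo_lock_t *l)
{
    int expected = 1;
    /* Release ordering on success: all writes made while holding the lock
       are visible before the lock word is seen as free. Conceptually,
       the barrier sits BEFORE the atomic operation. */
    _Bool ok = atomic_compare_exchange_strong_explicit(&l->state, &expected, 0,
                                                       memory_order_release,
                                                       memory_order_relaxed);
    assert(ok);                     /* caller must hold the lock */
    (void)ok;
}

/* Update the protected data inside the critical section. */
static int locked_increment(void)
{
    demo_lock_acquire(&the_lock);
    int value = ++protected_counter;
    demo_lock_release(&the_lock);
    return value;
}
```

If the two memory orders were swapped, the increment in locked_increment() would no longer be ordered before the store that frees the lock, which is the kind of window described in the original report.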
I've changed this.
Looks good, thanks.