This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH][BZ 18743] PowerPC: Fix a race condition when eliding a lock
- From: "Tulio Magno Quites Machado Filho" <tuliom at linux dot vnet dot ibm dot com>
- To: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>
- Cc: libc-alpha at sourceware dot org
- Date: Fri, 31 Jul 2015 11:57:54 -0300
- Subject: Re: [PATCH][BZ 18743] PowerPC: Fix a race condition when eliding a lock
- Authentication-results: sourceware.org; auth=none
- References: <1438274936-26493-1-git-send-email-tuliom at linux dot vnet dot ibm dot com> <55BA703D dot 7010303 at linaro dot org> <874mkl3wtq dot fsf at totoro dot lan> <55BB76FA dot 5040703 at linaro dot org>
Adhemerval Zanella <firstname.lastname@example.org> writes:
> On 30-07-2015 23:06, Tulio Magno Quites Machado Filho wrote:
>> Using pthread_rwlock_rdlock as an example, is_lock_free would be a preprocessor
>> token with the following contents:
>> 128 if (ELIDE_LOCK (rwlock->__data.__rwelision,
>> 129 rwlock->__data.__lock == 0
>> 130 && rwlock->__data.__writer == 0
>> 131 && rwlock->__data.__nr_readers == 0))
>> And this is where the important memory access happens.
>> If one of these fields are changed by another thread after this point, but
>> before __builtin_tbegin, we're able to reproduce the first problem I mentioned
> My understanding is this fields will be changed only in the lock taken path. Let's
> say a thread t1 grab the lock (either by a transaction or by using a default lock)
> and modifies any 'rwlock->__data' fields after the ELIDE_LOCK check and before
> the transaction begin on a second thread t2. My understanding of the issue you
> are seeing is the transaction will initiate and since is_lock_free is 1 it won't
> abort and continue as the a lock taken in both threads.
I agree with you. But this won't reproduce the bug. You have to change the
order of events.
Let's say t1 is the first one to read rwlock->__data fields and they're all 0.
t1 will set is_lock_free to 0.
Then t2 modifies rwlock->__data before t1 starts the transaction (as in: get a
lock, instead of a transaction).
After that, t1 starts the transaction. Here comes the problem: the memory
barrier is created with rwlock->__data saying that someone took the lock,
but is_lock_free is set to 0. In this scenario, t1 will proceed with the
> However my understanding is the transaction won't be able to commit since it
> will have a conflict in the 'rwlock->__data.lock' or in any memory being access
> in the critical region.
Yes, but with the order of events I mentioned, when t1 calls
pthread_rwlock_unlock, it will run the following code:
38 if (ELIDE_UNLOCK (rwlock->__data.__writer == 0
39 && rwlock->__data.__nr_readers == 0))
40 return 0;
Where ELIDE_UNLOCK is:
89 static inline bool
90 __elide_unlock (int is_lock_free)
92 if (is_lock_free)
94 __builtin_tend (0);
95 return true;
97 return false;
100 # define ELIDE_UNLOCK(is_lock_free) \
101 __elide_unlock (is_lock_free)
Remember that t1 started the transaction even if rwlock->__data said someone
was holding the lock.
So now, in __elide_unlock, is_lock_free will be 1. __elide_unlock will
return false and t1 will unlock a lock acquired by t2.
> Are you saying that the transaction is not being aborted
> in the critical region?
No. I'm saying the memory access to rwlock->__data is critical and should
happen inside the transaction. Without this, we can't say the whole process
> If compiler is doing memory reordering around HTM builtin it is *clearly* a bug
> since transaction begin/end acts a full memory barrier (sync). You should also
> add a compiler barrier in pthread elision code as well.
Agreed. But I guess that's out of scope for this particular bug.
I could prepare a separate patch with these changes if you agree with that.