Glibc pthread_rwlock_timed*() Optimization Bug
Justin Chen
justinpopo6@gmail.com
Tue Sep 7 21:31:58 GMT 2021
Hello all,
We are seeing a faulty compiler optimization with
pthread_rwlock_timed*() when cross-compiling glibc for arm. This is
leading to deadlocks with some of our software.
We are using glibc-2.27, but I see the same issue with the current
master branch. The build below is from the master branch, using the
following configure invocation:
../configure --host=arm-unknown-linux-gnueabihf --prefix=/local/users/jc957059/source/glibc/test_lib/lib-built/
make && make install
The code in question is in nptl/pthread_rwlock_common.c, lines 490-501:
	  /* We still need to wait for explicit hand-over, but we must
	     not use futex_wait anymore because we would just time out
	     in this case and thus make the spin-waiting we need
	     unnecessarily expensive.  */
	  while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
		  | PTHREAD_RWLOCK_FUTEX_USED)
		 == (1 | PTHREAD_RWLOCK_FUTEX_USED))
	    {
	      /* TODO Back-off?  */
	    }
	  ready = true;
	  break;
The compiled ASM is the following:
5dc98: f043 0302 orr.w r3, r3, #2
atomic_thread_fence_acquire ();
/* We still need to wait for explicit hand-over, but we must
not use futex_wait anymore because we would just time out
in this case and thus make the spin-waiting we need
unnecessarily expensive. */
while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
5dc9c: 2b03 cmp r3, #3
5dc9e: d184 bne.n 5dbaa <__pthread_rwlock_timedrdlock64+0x6e>
5dca0: e7fe b.n 5dca0 <__pthread_rwlock_timedrdlock64+0x164>
We only read __wrphase_futex once and then hit an infinite loop (the branch at 5dca0 targets itself).
Adding volatile seems to do the trick:
diff --git a/sysdeps/nptl/bits/struct_rwlock.h b/sysdeps/nptl/bits/struct_rwlock.h
index 2f8b7ac..cd47bd2 100644
--- a/sysdeps/nptl/bits/struct_rwlock.h
+++ b/sysdeps/nptl/bits/struct_rwlock.h
@@ -30,7 +30,7 @@ struct __pthread_rwlock_arch_t
 {
   unsigned int __readers;
   unsigned int __writers;
-  unsigned int __wrphase_futex;
+  volatile unsigned int __wrphase_futex;
   unsigned int __writers_futex;
   unsigned int __pad3;
   unsigned int __pad4;
The compiled ASM with this change (and with a few declarations updated to the volatile-qualified type) is the following:
5d2ca: f043 0302 orr.w r3, r3, #2
atomic_thread_fence_acquire ();
/* We still need to wait for explicit hand-over, but we must
not use futex_wait anymore because we would just time out
in this case and thus make the spin-waiting we need
unnecessarily expensive. */
while ((atomic_load_relaxed (&rwlock->__data.__wrphase_futex)
5d2ce: 2b03 cmp r3, #3
5d2d0: d0fa beq.n 5d2c8 <__pthread_rwlock_clockrdlock64+0x168>
5d2d2: e783 b.n 5d1dc <__pthread_rwlock_clockrdlock64+0x7c>
We no longer have an infinite loop here.
It seems the compiler is optimizing the loop down to a single load
because it is never informed that the value of __wrphase_futex can be
changed by another thread, which I believe should be conveyed with the
volatile qualifier.
Does this analysis look correct?
Thank You,
Justin