nptl/tst-rwlock9 and nptl/tst-rwlock18 never finish, with an infinite stream of "reader thread %d tries again".
I don't think we have seen that in our testing. Would you please share more details about the environment (kernel and silicon)? Thanks.
It's using the standard Factory kernel, which is 5.2.8 right now. The hardware is probably a Cavium ThunderX1.
It looks like the tests only fail when run on a ThunderX1. Other build workers, even those with a ThunderX2, do not cause them to fail.
That appears to be a regression introduced between 2019-07-07 and 2019-07-14.
The test was recently changed to cover pthread_rwlock_clock*. Is this what is failing? Which kind of failure
(In reply to Adhemerval Zanella from comment #5)
> The test was recently changed to cover pthread_rwlock_clock*. Is this what
> is failing? Which kind of failure

I hit send prior to finishing the last sentence. Which return value does pthread_rwlock_* show in the loop?
It's ETIMEDOUT, of course; otherwise the loop would stop. It's the CLOCK_REALTIME test that fails.
And the CLOCK_MONOTONIC test fails in the same way.
When I comment out the "reader thread tries again" messages (and skip the timedlock test), all I see is this:

writer thread 0 tries again
writer thread 1 tries again
writer thread 0 succeeded
writer thread 2 tries again
writer thread 3 tries again
writer thread 4 tries again
writer thread 5 tries again
writer thread 6 tries again
writer thread 7 tries again
writer thread 8 tries again
writer thread 9 tries again
writer thread 10 tries again
writer thread 11 tries again
writer thread 12 tries again
writer thread 13 tries again
writer thread 14 tries again
writer thread 0 released
writer thread 0 tries again
writer thread 2 tries again
writer thread 3 tries again
writer thread 4 tries again
writer thread 5 tries again
writer thread 6 tries again
writer thread 7 tries again
writer thread 8 tries again
writer thread 9 tries again
writer thread 10 tries again
writer thread 11 tries again
writer thread 12 tries again
writer thread 13 tries again
writer thread 14 tries again
This issue does not occur when pthread_rwlock_clockwrlock is compiled with gcc 8. With gcc 8, atomic_fetch_add_relaxed in pthread_rwlock_clockwrlock (pthread_rwlock_common.c:636) uses ldaxr, whereas gcc 9 uses ldxr. Similarly, the expansion of atomic_compare_exchange_weak_acquire (pthread_rwlock_common.c:747) has stlxr vs stxr.
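For context, here is a minimal sketch (my own illustration using plain C11 atomics, not the glibc wrappers; the function names are made up) of the two operations whose expansions differ. Note that the weaker gcc 9 sequences are, on their own, still legal for the requested memory orders, so the instruction difference does not by itself prove a codegen bug:

/* Illustration only -- hypothetical helpers, not glibc code.
   For memory_order_relaxed, gcc 9 expands the load half of the LL/SC
   pair as ldxr (no acquire), while gcc 8 emitted the stronger ldaxr;
   both expansions are valid for a relaxed fetch-add.  */
#include <stdatomic.h>

unsigned int
fetch_add_relaxed (atomic_uint *p, unsigned int v)
{
  return atomic_fetch_add_explicit (p, v, memory_order_relaxed);
}

/* For an acquire compare-exchange, the acquire ordering is carried by
   the ldaxr load, so a plain stxr (gcc 9) rather than stlxr (gcc 8)
   on the store side is likewise a legal expansion.  */
_Bool
cas_acquire (atomic_uint *p, unsigned int *expected, unsigned int desired)
{
  return atomic_compare_exchange_weak_explicit
    (p, expected, desired, memory_order_acquire, memory_order_relaxed);
}

In other words, gcc 8 was simply emitting stronger-than-required barriers here.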
I see the failures also on Marvell OcteonTX2 with GCC 10 (20191008) but not with GCC 7.4 (Ubuntu's). I don't know yet whether it is a bug in the cores. But I suspect out-of-order stores might be the cause, which would mean this might be a bug in the code.
(In reply to Andreas Schwab from comment #10)
> This issue does not occur when pthread_rwlock_clockwrlock is compiled with
> gcc 8. With gcc 8, atomic_fetch_add_relaxed in pthread_rwlock_clockwrlock
> (pthread_rwlock_common.c:636) uses ldaxr, whereas gcc 9 uses ldxr.
> Similarly, the expansion of atomic_compare_exchange_weak_acquire
> (pthread_rwlock_common.c:747) has stlxr vs stxr.

I need to read through the source/comments to understand whether the memory model usages are correct; I really do suspect one of them is messed up. One thing both OcteonTX2 and ThunderX1 (and OcteonTX1) do is reorder stores more aggressively than most other aarch64 cores, so if the barriers are not 100% correct and there is a store that can happen much earlier than intended, you would run into an issue like this. E.g. ThunderX1 (and OcteonTX1) have a 3k write buffer, which means stores don't go out until needed.
This seems to fix the testcase:

diff --git a/nptl/pthread_rwlock_common.c b/nptl/pthread_rwlock_common.c
index 8db861fdcb..fccaccafd0 100644
--- a/nptl/pthread_rwlock_common.c
+++ b/nptl/pthread_rwlock_common.c
@@ -686,7 +686,7 @@ __pthread_rwlock_wrlock_full (pthread_rwlock_t *rwlock,
          are not.  */
       if (atomic_compare_exchange_weak_acquire
           (&rwlock->__data.__writers,
-           &w, (w - PTHREAD_RWLOCK_WRHANDOVER - 1)))
+           &w, ((w & ~PTHREAD_RWLOCK_WRHANDOVER) - 1)))
         {
           /* Reload so our view is consistent with the view of the
              previous owner of WRLOCKED.  See above.  */

------ CUT -----

But only by accident. The problem is shown here:

 1a8:   885ffe08    ldaxr   w8, [x16]
 1ac:   6b01011f    cmp     w8, w1
 1b0:   540000a1    b.ne    1c4 <pthread_rwlock_clockwrlock+0x1c4>  // b.any
 1b4:   f9000bf3    str     x19, [sp, #16]
 1b8:   88137e05    stxr    w19, w5, [x16]

The store to the stack there is the problem: it causes the stxr to always fail on ThunderX1 (and OcteonTX1) and OcteonTX2. So this is a GCC bug and not a hardware bug.
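For readers less familiar with LL/SC, a rough sketch (hypothetical function, not the glibc source) of the kind of retry loop affected. Without LSE atomics, a weak CAS expands to an ldaxr/stxr pair, and the architecture allows an implementation to clear the exclusive monitor on an unrelated plain store by the same core; if the compiler schedules a stack spill between the ldaxr and the stxr, as in the disassembly above, the stxr can then fail on every attempt on such cores:

/* Rough sketch of a weak-CAS retry loop like the one in
   __pthread_rwlock_wrlock_full (names here are invented).  Each
   atomic_compare_exchange_weak_explicit becomes an ldaxr/stxr pair;
   a compiler-inserted store between the two can clear the exclusive
   monitor, so the CAS reports failure forever and the loop spins.  */
#include <stdatomic.h>

void
clear_handover_and_release (atomic_uint *writers, unsigned int wrhandover)
{
  unsigned int w = atomic_load_explicit (writers, memory_order_relaxed);
  do
    {
      /* On failure, w is reloaded with the current value and the
         desired value is recomputed on the next iteration.  */
    }
  while (!atomic_compare_exchange_weak_explicit
           (writers, &w, (w & ~wrhandover) - 1,
            memory_order_acquire, memory_order_relaxed));
}

A weak CAS is used precisely because spurious failures are expected and handled by retrying; the miscompilation turns a spurious failure into a guaranteed one, which is why the source change above only helps by perturbing register allocation.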
I moved it to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92692 with some analysis of the wrong code and where it goes wrong.