Bug 24924 - nptl/tst-rwlock9 and nptl/tst-rwlock18 fail on aarch64
Status: RESOLVED MOVED
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.30
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-08-21 11:01 UTC by Andreas Schwab
Modified: 2019-11-27 11:49 UTC
CC List: 5 users

See Also:
Host: aarch64-*-*
Target:
Build:
Last reconfirmed:
Flags: fweimer: security-


Attachments

Description Andreas Schwab 2019-08-21 11:01:15 UTC
nptl/tst-rwlock9 and nptl/tst-rwlock18 never finish, with an infinite stream of "reader thread %d tries again".
Comment 1 Florian Weimer 2019-08-21 11:07:24 UTC
I don't think we have seen that in our testing.  Would you please share more details about the environment (kernel and silicon)?  Thanks.
Comment 2 Andreas Schwab 2019-08-21 11:56:37 UTC
It's using the standard Factory kernel, which is 5.2.8 right now.  The hardware is probably a Cavium ThunderX1.
Comment 3 Andreas Schwab 2019-08-21 12:00:33 UTC
It looks like the tests only fail when run on a ThunderX1.  They pass on the other build workers, including those with a ThunderX2.
Comment 4 Andreas Schwab 2019-08-21 12:19:05 UTC
That appears to be a regression introduced between 2019-07-07 and 2019-07-14.
Comment 5 Adhemerval Zanella 2019-08-21 17:33:34 UTC
The test was recently changed to cover pthread_rwlock_clock*. Is this what is failing? Which kind of failure
Comment 6 Adhemerval Zanella 2019-08-21 17:35:35 UTC
(In reply to Adhemerval Zanella from comment #5)
> The test was recently changed to cover pthread_rwlock_clock*. Is this what
> is failing? Which kind of failure

I hit send before finishing the last sentence.  Which return value does pthread_rwlock_* report in the loop?
Comment 7 Andreas Schwab 2019-08-22 10:16:19 UTC
It's ETIMEDOUT, of course; otherwise the loop would stop.  It's the CLOCK_REALTIME test that fails.
Comment 8 Andreas Schwab 2019-08-22 10:17:56 UTC
And the CLOCK_MONOTONIC test fails in the same way.
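The pattern the test exercises boils down to roughly the following (a simplified sketch of the writer loop, not the actual test source; the 10 ms timeout is made up for illustration):

#define _GNU_SOURCE
#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Simplified writer loop: try to take the lock with a short timeout on the
   given clock and retry on ETIMEDOUT.  In the failing runs every attempt
   returns ETIMEDOUT, so the loop never terminates.  */
static void
writer_loop (pthread_rwlock_t *lock, int nr)
{
  for (;;)
    {
      struct timespec ts;
      clock_gettime (CLOCK_MONOTONIC, &ts);
      ts.tv_nsec += 10 * 1000 * 1000;   /* roughly 10 ms from now */
      if (ts.tv_nsec >= 1000000000)
        {
          ts.tv_sec++;
          ts.tv_nsec -= 1000000000;
        }
      int e = pthread_rwlock_clockwrlock (lock, CLOCK_MONOTONIC, &ts);
      if (e == 0)
        break;                          /* got the lock */
      if (e == ETIMEDOUT)
        {
          printf ("writer thread %d tries again\n", nr);
          continue;
        }
      break;                            /* any other error is a test failure */
    }
  /* ... critical section, then pthread_rwlock_unlock ... */
}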
Comment 9 Andreas Schwab 2019-08-22 10:32:54 UTC
When I comment out the "reader thread tries again" messages (and skip the timedlock test), all I see is this:

writer thread 0 tries again
writer thread 1 tries again
writer thread 0 succeeded
writer thread 2 tries again
writer thread 3 tries again
writer thread 4 tries again
writer thread 5 tries again
writer thread 6 tries again
writer thread 7 tries again
writer thread 8 tries again
writer thread 9 tries again
writer thread 10 tries again
writer thread 11 tries again
writer thread 12 tries again
writer thread 13 tries again
writer thread 14 tries again
writer thread 0 released
writer thread 0 tries again
writer thread 2 tries again
writer thread 3 tries again
writer thread 4 tries again
writer thread 5 tries again
writer thread 6 tries again
writer thread 7 tries again
writer thread 8 tries again
writer thread 9 tries again
writer thread 10 tries again
writer thread 11 tries again
writer thread 12 tries again
writer thread 13 tries again
writer thread 14 tries again
Comment 10 Andreas Schwab 2019-11-25 17:01:20 UTC
This issue does not occur when pthread_rwlock_clockwrlock is compiled with gcc 8. With gcc 8, atomic_fetch_add_relaxed in pthread_rwlock_clockwrlock (pthread_rwlock_common.c:636) uses ldaxr, whereas gcc 9 uses ldxr.  Similarly, the expansion of atomic_compare_exchange_weak_acquire (pthread_rwlock_common.c:747) uses stlxr with gcc 8 but stxr with gcc 9.
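
The difference can be inspected standalone; the glibc macro is assumed to boil down to roughly this C11 operation:

#include <stdatomic.h>

/* Roughly the shape of the atomic_fetch_add_relaxed call at
   pthread_rwlock_common.c:636.  Compiling this at -O2 with gcc 8 vs gcc 9
   on aarch64 (without LSE) shows the ldaxr vs ldxr choice; for a relaxed
   operation either form is permitted, so this difference alone is not a
   bug.  */
unsigned int
fetch_add_relaxed (atomic_uint *p, unsigned int v)
{
  return atomic_fetch_add_explicit (p, v, memory_order_relaxed);
}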
Comment 11 Andrew Pinski 2019-11-27 04:19:31 UTC
I also see the failures on Marvell OcteonTX2 with GCC 10 (20191008), but not with GCC 7.4 (Ubuntu's).  I don't know yet whether it is a bug in the cores.  But I suspect out-of-order stores might be the cause, which would mean this is a bug in the code.
Comment 12 Andrew Pinski 2019-11-27 07:22:20 UTC
(In reply to Andreas Schwab from comment #10)
> This issue does not occur when pthread_rwlock_clockwrlock is compiled with
> gcc 8. With gcc 8, atomic_fetch_add_relaxed in pthread_rwlock_clockwrlock
> (pthread_rwlock_common.c:636) uses ldaxr, whereas gcc 9 uses ldxr.
> Similarly, the expansion of atomic_compare_exchange_weak_acquire
> (pthread_rwlock_common.c:747) uses stlxr with gcc 8 but stxr with gcc 9.


I need to read through the source/comments to understand whether the memory model usages are correct.  I really have a suspicion that one of them is messed up.
One thing that OcteonTX2 and ThunderX1 (and OcteonTX1) have in common is that they reorder stores more aggressively than most other aarch64 cores.  If the barriers are not 100% correct, a store from much earlier can still be pending when it needs to be visible, and you run into an issue like this.  E.g. ThunderX1 (and OcteonTX1) have a 3k write buffer, which means stores are not flushed out until they need to be.
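
As a generic illustration (plain C11, not glibc code): a buffered store is only a correctness problem if the acquire/release pairing that should order it is missing.

#include <stdatomic.h>

static int data;
static atomic_int flag;

/* Publisher: the release store may sit in a write buffer for a long time,
   but it must not become visible before the preceding store to data.  */
static void
publish (void)
{
  data = 42;
  atomic_store_explicit (&flag, 1, memory_order_release);
}

/* Consumer: if the acquire load sees flag == 1, it is guaranteed to see
   data == 42.  With relaxed accesses (or a missing barrier), a core with
   aggressive store buffering could expose flag without data.  */
static int
consume (void)
{
  if (atomic_load_explicit (&flag, memory_order_acquire) == 1)
    return data;
  return -1;
}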
Comment 13 Andrew Pinski 2019-11-27 10:37:02 UTC
This seems to fix the testcase:
diff --git a/nptl/pthread_rwlock_common.c b/nptl/pthread_rwlock_common.c
index 8db861fdcb..fccaccafd0 100644
--- a/nptl/pthread_rwlock_common.c
+++ b/nptl/pthread_rwlock_common.c
@@ -686,7 +686,7 @@ __pthread_rwlock_wrlock_full (pthread_rwlock_t *rwlock,
                     are not.  */
                  if (atomic_compare_exchange_weak_acquire
                      (&rwlock->__data.__writers,
-                      &w, (w - PTHREAD_RWLOCK_WRHANDOVER - 1)))
+                      &w, ((w & ~PTHREAD_RWLOCK_WRHANDOVER) - 1)))
                    {
                      /* Reload so our view is consistent with the view of
                         the previous owner of WRLOCKED.  See above.  */

------ CUT -----
But only by accident: when the PTHREAD_RWLOCK_WRHANDOVER bit is set in w, both expressions compute the same value, so the change only perturbs the code that GCC generates.

The problem is shown here:
 1a8:	885ffe08 	ldaxr	w8, [x16]
 1ac:	6b01011f 	cmp	w8, w1
 1b0:	540000a1 	b.ne	1c4 <pthread_rwlock_clockwrlock+0x1c4>  // b.any
 1b4:	f9000bf3 	str	x19, [sp, #16]
 1b8:	88137e05 	stxr	w19, w5, [x16]


The store to the stack there, between the ldaxr and the stxr, is what is broken: it causes the stxr to always fail on ThunderX1 (and OcteonTX1) and OcteonTX2.
So this is a GCC bug and not a hardware bug.
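
The failure mode can be summarized outside of glibc (a sketch using the __atomic builtin the glibc macro is assumed to expand to; the function name is made up):

static _Bool
cas_weak_acquire (unsigned int *p, unsigned int *expected, unsigned int desired)
{
  /* Without LSE, GCC lowers this weak acquire CAS to an ldaxr/stxr pair.
     The architecture permits a memory access between the load-exclusive
     and the store-exclusive to clear the exclusive monitor, so a
     compiler-inserted spill in that window (the "str x19, [sp, #16]"
     above) can make the stxr fail on every attempt on cores such as
     ThunderX1/OcteonTX, and the retry loop around the CAS in
     __pthread_rwlock_wrlock_full then never makes progress.  */
  return __atomic_compare_exchange_n (p, expected, desired, 1 /* weak */,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}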
Comment 14 Andrew Pinski 2019-11-27 11:11:39 UTC
I moved it to https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92692 with some analysis of the wrong code and where it goes wrong.