23844 – pthread_rwlock_trywrlock results in hang

Bug 23844 - pthread_rwlock_trywrlock results in hang

Summary: pthread_rwlock_trywrlock results in hang

Status:	RESOLVED FIXED

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	nptl (show other bugs)
Version:	2.26

Importance:	P2 normal
Target Milestone:	2.30
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2018-10-30 13:40 UTC by Michael Marxmeier
Modified:	2019-11-27 16:59 UTC (History)
CC List:	8 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:	2018-11-01 00:00:00

Flags:	fweimer: security-

Attachments
Test case (697 bytes, text/x-csrc) 2018-10-30 13:40 UTC, Michael Marxmeier	Details
simpler trywrlock test case (703 bytes, text/x-csrc) 2018-12-11 14:02 UTC, Rik Prohaska	Details
trywrlock patch (268 bytes, patch) 2018-12-11 14:23 UTC, Rik Prohaska	Details \| Diff
test case for tryrdlock hangs (703 bytes, text/x-csrc) 2018-12-11 15:57 UTC, Rik Prohaska	Details
patch for tryrdlock function (418 bytes, patch) 2018-12-12 15:49 UTC, Rik Prohaska	Details \| Diff
View All Add an attachment (proposed patch, testcase, etc.)

Note You need to log in before you can comment on or make changes to this bug.

Description Michael Marxmeier 2018-10-30 13:40:29 UTC

Created attachment 11373 [details]
Test case

After upgrading from glibc 2.23 to 2.26, we've been seeing what looks like
a hang inside pthread_rwlock calls in our application.

A pthread_rwlock_trywrlock as a "quick path" if a lock is available.  
Otherwise a pthread_rwlock_wrlock is used along with additional 
measurement information.

   if((ecode = pthread_rwlock_trywrlock(&rw_lock[idx]))) {
      if(ecode == EBUSY)
         ecode = pthread_rwlock_wrlock(&rw_lock[idx]);
      if(ecode) {
         perror("pthread_rwlock_wrlock");
         exit(1);
      }
   }

This should behave identically to pthread_rwlock_wrlock().
It does result in a hang with the lock not taken but all
threads blocked in pthread_rwlock calls.

The attached test case makes it easily reproducible.

The lock state looks like this:

 {__data = {__lock = 10, __nr_readers = 0, __readers_wakeup = 2, 
  __writer_wakeup = 3, __nr_readers_queued = 0, __nr_writers_queued = 0, 
  __writer = 0, __shared = 0, __rwelision = 0 '\000', 
  __pad1 = "\000\000\000\000\000\000", __pad2 = 0, __flags = 0}, 
  __size = "\n\000\000\000\000\000\000\000\002\000\000\000\003", '\000' <repeats 42 times>, __align = 10}

Threads are suspended in pthread_rwlock_wrlock or pthread_rwlock_rdlock

  2    Thread 0x7fb876c92700 (LWP 28072) "rwl_g" 0x00007fb87705e585 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
  3    Thread 0x7fb876491700 (LWP 28073) "rwl_g" 0x00007fb87705e12a in pthread_rwlock_rdlock () from /lib64/libpthread.so.0
  4    Thread 0x7fb875c90700 (LWP 28074) "rwl_g" 0x00007fb87705e585 in pthread_rwlock_wrlock () from /lib64/libpthread.so.0
* 5    Thread 0x7fb87548f700 (LWP 28075) "rwl_g" 0x00007fb87705e63f in pthread_rwlock_wrlock () from /lib64/libpthread.so.0

Comment 1 Carlos O'Donell 2018-11-01 01:54:52 UTC

Thank you for the bug report and reduced test case.

At first I thought this might be "rwlock: Fix explicit hand-over (bug 21298)", but we fixed that *in* 2.26, but I would like you to double check that you have that fix in your sources e.g. commit faf8c066df0d6bccb54bd74dd696eeb65e1b3bbc.

I looked at your test case and it seems entirely reasonable.

It's interesting that the effect of using trylock + EBUSY checking basically ensures that the locking and sleeping on the futex happen in a very narrow band of code.

Interestingly enough with strace it works, ptrace seems to perturb the test enough that it works. Likewise it works sometimes with gdb (also using ptrace), but maybe 1 out of 4 times it fails. So it certainly looks like a race.

With readers doing the same thing, trylock + EBUSY->lock, it succeeds. So perhaps it's only the write side that has a defect.

In the failure mode I see 3 threads in the write lock, and 1 thread in the read lock. I haven't seen anything different. In all 4 threads all the threads have entered the kernel via futex_wait and are stuck there waiting for some thread to wake them up, and that will never happen.

I don't immediately see what's wrong, but I'd have to audit __pthread_rwlock_wrlock_full and __pthread_rwlock_trywrlock before concluding what's wrong.

I've asked Torvald Riegel to have look, he's the primary author of the new rwlock implementation.

Comment 2 Michael Marxmeier 2018-11-02 13:23:59 UTC

Some additional notes

Unfortunately, the problem is not limited to this particular use of 
trywrlock. I can also reproduce the same issue for readers, it takes 
somewhat longer to reproduce but you end up in the same condition.
Also, a more concentional use of trywrlock (check availability of multiple
resources) results in the same hang.

As far as i can see this affects all glibc versions in recent distributions, 
eg. Fedora 28/29, SUSE 15 or Ubuntu 1810. glibc versions before 2.26 seem
not affected, eg. CentOS 7.5.

Comment 3 Rik Prohaska 2018-12-11 14:00:23 UTC

A simpler test case based on the original test case which uses only one lock (and only two threads also reproduces the thread hangs when using the trywrlock function. Since there are only two threads manipulating the same lock, it will be easier to identify the cause of the thread hangs.

Comment 4 Rik Prohaska 2018-12-11 14:02:14 UTC

Created attachment 11448 [details]
simpler trywrlock test case

Comment 5 Rik Prohaska 2018-12-11 14:15:52 UTC

The futex waits on the wrphase futex occur because the 'USED' flag is being discarded by the 'trywrlock' function.  Since the 'USED' flag is discarded, the 'wrunlock' function will not wake the futex when the rwlock is released.  This causes other threads waiting for the rwlock to hang.

The 'wrlock' function does the following on an uncontended lock that is already in the write phase.  (1) the WL bit is atomically set, (2) if the WL was not previously set then this thread holds the exclusive write lock (3) the writers futex is set to 1, and finally (4) the cur_writer is set.  Note that the wrphase futext is not changed in this 'wrlock' code path.

Unfortunately, the 'trywrlock' function ALSO sets the wrphase futex = 1.  This is racy with the contended lock code paths in the 'rdlock' and 'wrlock' functions, and can cause the 'USED' bit to be lost.

The patch just removes the code in 'trywrlock' that sets the wrphase futex to match the 'wrlock' fast path logic.  This patch passes the included test cases.

diff --git a/nptl/pthread_rwlock_trywrlock.c b/nptl/pthread_rwlock_trywrlock.c
index 5a73eba756..b423648ad6 100644
--- a/nptl/pthread_rwlock_trywrlock.c
+++ b/nptl/pthread_rwlock_trywrlock.c
@@ -47,7 +47,6 @@ __pthread_rwlock_trywrlock (pthread_rwlock_t *rwlock)
 	  r | PTHREAD_RWLOCK_WRPHASE | PTHREAD_RWLOCK_WRLOCKED))
 	{
 	  atomic_store_relaxed (&rwlock->__data.__writers_futex, 1);
-	  atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
 	  atomic_store_relaxed (&rwlock->__data.__cur_writer,
 	      THREAD_GETMEM (THREAD_SELF, tid));
 	  return 0;

I have also seen thread hangs when only using 'tryrdlock' (not 'trywrlock') in these test cases.  This indicates that there is also a problem in the 'tryrdlock' function.

Comment 6 Rik Prohaska 2018-12-11 14:23:38 UTC

Created attachment 11449 [details]
trywrlock patch

Comment 7 Torvald Riegel 2018-12-11 15:28:29 UTC

(In reply to richard prohaska from comment #5)
> The futex waits on the wrphase futex occur because the 'USED' flag is being
> discarded by the 'trywrlock' function.  Since the 'USED' flag is discarded,
> the 'wrunlock' function will not wake the futex when the rwlock is released.
> This causes other threads waiting for the rwlock to hang.

I agree with your analysis of what the root cause is, but the fix you propose isn't correct.

trywrlock needs to install a write-phase, but only in some cases.  Essentially, this is:
-	  atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
+	  /* If we started a write phase, we need to enable readers to
+	     wait.  If we did not, we must not change it because other threads
+	     may have set the PTHREAD_RWLOCK_FUTEX_USED in the meantime.  */
+	  if ((r & PTHREAD_RWLOCK_WRPHASE) == 0)
+	    atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);

I haven't looked at your tryrdlock case yet.

Comment 8 Rik Prohaska 2018-12-11 15:57:11 UTC

Created attachment 11450 [details]
test case for tryrdlock hangs

Comment 9 Rik Prohaska 2018-12-11 16:01:40 UTC

On my machine, bug23488.rd.c hangs after 1490 invocations.  Both threads waiting on wrphase futex.

Thread 3 (Thread 0x7f28d40ff700 (LWP 11018)):
#0  futex_abstimed_wait (private=0, abstime=0x0, expected=2, futex_word=0x55b0dfcdc068 <rw_lock+8>) at ../sysdeps/unix/sysv/linux/futex-internal.h:172
#1  __pthread_rwlock_wrlock_full (abstime=0x0, rwlock=0x55b0dfcdc060 <rw_lock>) at pthread_rwlock_common.c:803
#2  __GI___pthread_rwlock_wrlock (rwlock=0x55b0dfcdc060 <rw_lock>) at pthread_rwlock_wrlock.c:27
#3  0x000055b0dfcd9282 in runner (arg=0x0) at bug23844.rd.c:42
#4  0x00007f28d4b0df50 in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007f28d4a1edef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 2 (Thread 0x7f28d4900700 (LWP 11017)):
#0  futex_abstimed_wait (private=0, abstime=0x0, expected=3, futex_word=0x55b0dfcdc068 <rw_lock+8>) at ../sysdeps/unix/sysv/linux/futex-internal.h:172
#1  __pthread_rwlock_rdlock_full (abstime=0x0, rwlock=0x55b0dfcdc060 <rw_lock>) at pthread_rwlock_common.c:446
#2  __GI___pthread_rwlock_rdlock (rwlock=0x55b0dfcdc060 <rw_lock>) at pthread_rwlock_rdlock.c:27
#3  0x000055b0dfcd9300 in runner (arg=0x0) at bug23844.rd.c:52
#4  0x00007f28d4b0df50 in start_thread (arg=<optimized out>) at pthread_create.c:486
#5  0x00007f28d4a1edef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f28d4901740 (LWP 11016)):
#0  0x00007f28d4b0f37d in __GI___pthread_timedjoin_ex (threadid=139813341628160, thread_return=0x0, abstime=0x0, block=<optimized out>) at pthread_join_common.c:89
#1  0x000055b0dfcd952b in main (argc=1, argv=0x7ffe734366e8) at bug23844.rd.c:112

p *rwlock
$3 = {__data = {__readers = 10, __writers = 0, __wrphase_futex = 2, __writers_futex = 1, __pad3 = 0, __pad4 = 0, __cur_writer = 0

Comment 10 Rik Prohaska 2018-12-11 18:26:15 UTC

(In reply to Torvald Riegel from comment #7)
> (In reply to richard prohaska from comment #5)
> > The futex waits on the wrphase futex occur because the 'USED' flag is being
> > discarded by the 'trywrlock' function.  Since the 'USED' flag is discarded,
> > the 'wrunlock' function will not wake the futex when the rwlock is released.
> > This causes other threads waiting for the rwlock to hang.
> 
> I agree with your analysis of what the root cause is, but the fix you
> propose isn't correct.
> 
> trywrlock needs to install a write-phase, but only in some cases. 
> Essentially, this is:
> -	  atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
> +	  /* If we started a write phase, we need to enable readers to
> +	     wait.  If we did not, we must not change it because other threads
> +	     may have set the PTHREAD_RWLOCK_FUTEX_USED in the meantime.  */
> +	  if ((r & PTHREAD_RWLOCK_WRPHASE) == 0)
> +	    atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
> 
> I haven't looked at your tryrdlock case yet.

I agree with this patch since it matches the wrlock logic that changes the read phase to write phase.

Comment 11 Rik Prohaska 2018-12-11 21:44:37 UTC

Lets compare the logic in tryrdlock and rdlock WRT the transition from write phase to read phase.  In tryrdlock, the wrphase futex is just set to 0.  In rtdlock, the wrphase futext is exchanged with 0, and if the USED bit was set then the futex is awakened.  I wonder if this difference is responsible for the subsequent hangs.

Comment 12 Rik Prohaska 2018-12-12 15:49:09 UTC

Created attachment 11457 [details]
patch for tryrdlock function

added logic to tryrdlock that exists in rdlock to wake the wrphase futex when the  rwlock phase transistions from write phase to read phase.  this logic wakes up other threads stalled in the rdlock function. 

this patch can successfully run the tryrdlock test cases.

Comment 13 Torvald Riegel 2018-12-12 21:43:37 UTC

(In reply to richard prohaska from comment #12)
> Created attachment 11457 [details]
> patch for tryrdlock function
> 
> added logic to tryrdlock that exists in rdlock to wake the wrphase futex
> when the  rwlock phase transistions from write phase to read phase.  this
> logic wakes up other threads stalled in the rdlock function. 
> 
> this patch can successfully run the tryrdlock test cases.

This makes sense.  Thank you.  The trywrlock fix and this one (with changed comments) will get posted upstream.

Comment 14 Carlos O'Donell 2019-01-21 20:21:36 UTC

The following is my analysis of the tryrdlock side of the failure first, followed by the tryrwlock side of the failure second. The patches as written by Torvald look good to me and cover the failure modes I can see in the current trylock implementation. If you spot any failure in my analysis please tell me and I'll see if anything further needs correction. There is a lot to consider, particularly in the tryrwlock side of the failure the readers spin waiting for the writer to finish setting both the PTHREAD_RWLOCK_WRPHASE and the __wrphase_futex, both must be set before readers will sleep on the futex.
---
A read lock begins to execute.

In __pthread_rwlock_rdlock_full:

We can attempt a read lock, but find that the lock is
in a write phase (PTHREAD_RWLOCK_WRPHASE, or WP-bit
is set), and the lock is held by a primary writer
(PTHREAD_RWLOCK_WRLOCKED is set). In this case we must
wait for explicit hand over from the writer to us or
one of the other waiters. The read lock threads are
about to execute:

341   r = (atomic_fetch_add_acquire (&rwlock->__data.__readers,
342                                  (1 << PTHREAD_RWLOCK_READER_SHIFT))
343        + (1 << PTHREAD_RWLOCK_READER_SHIFT));

An unlock beings to execute.

Then in __pthread_rwlock_wrunlock:

544   /* We have done everything we needed to do to prefer writers, so now we
545      either hand over explicitly to readers if there are any, or we simply
546      stay in a write phase.  See pthread_rwlock_rdunlock for more details.  */
547   unsigned int r = atomic_load_relaxed (&rwlock->__data.__readers);
548   /* Release MO so that subsequent readers or writers synchronize with us.  */
549   while (!atomic_compare_exchange_weak_release
550          (&rwlock->__data.__readers, &r,
551           ((r ^ PTHREAD_RWLOCK_WRLOCKED)
552            ^ ((r >> PTHREAD_RWLOCK_READER_SHIFT) == 0 ? 0
553               : PTHREAD_RWLOCK_WRPHASE))))
554     {
555       /* TODO Back-off.  */
556     }

We clear PTHREAD_RWLOCK_WRLOCKED, and if there are
no readers so we leave the lock in PTHRAD_RWLOCK_WRPHASE.

Back in the read lock.

The read lock adjusts __readres as above.

383   while ((r & PTHREAD_RWLOCK_WRPHASE) != 0
384          && (r & PTHREAD_RWLOCK_WRLOCKED) == 0)
385     {
386       /* Try to enter a read phase: If the CAS below succeeds, we have
387          ownership; if it fails, we will simply retry and reassess the
388          situation.
389          Acquire MO so we synchronize with prior writers.  */
390       if (atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers, &r,
391                                                 r ^ PTHREAD_RWLOCK_WRPHASE))
392         {

And then attemps to start the read phase.

Assume there happens to be a tryrdlock at this point, noting
that PTHREAD_RWLOCK_WRLOCKED is clear, and PTHREAD_RWLOCK_WRPHASE
is 1. So the try lock attemps to start the read phase.

In __pthread_rwlock_tryrdlock:

 44       if ((r & PTHREAD_RWLOCK_WRPHASE) == 0)
 45         {
 46           /* If we are in a read phase, try to acquire unless there is a
 47              primary writer and we prefer writers and there will be no
 48              recursive read locks.  */
 49           if (((r & PTHREAD_RWLOCK_WRLOCKED) != 0)
 50               && (rwlock->__data.__flags
 51                   == PTHREAD_RWLOCK_PREFER_WRITER_NONRECURSIVE_NP))
 52             return EBUSY;
 53           rnew = r + (1 << PTHREAD_RWLOCK_READER_SHIFT);
 54         }
...
 89   while (!atomic_compare_exchange_weak_acquire (&rwlock->__data.__readers,
 90       &r, rnew));

And succeeds.

Back in the write unlock:

557   if ((r >> PTHREAD_RWLOCK_READER_SHIFT) != 0)
558     { 
559       /* We must hand over explicitly through __wrphase_futex.  Relaxed MO is
560          sufficient because it is just used to delay acquisition by a writer;
561          any other synchronizes-with relations that are necessary are
562          established through __readers.  */
563       if ((atomic_exchange_relaxed (&rwlock->__data.__wrphase_futex, 0)
564            & PTHREAD_RWLOCK_FUTEX_USED) != 0)
565         futex_wake (&rwlock->__data.__wrphase_futex, INT_MAX, private);
566     }

We note that PTHREAD_RWLOCK_FUTEX_USED is non-zero
and don't wake anyone. This is OK because we handed
over to the trylock. It will be the trylock's responsibility
to wake any waiters.

Back in the read lock:

The read lock fails to install PTHRAD_REWLOCK_WRPHASE as 0 because
the __readers value was adjusted by the trylock, and so it falls through
to waiting on the lock for explicit handover from either a new writer
or a new reader.

448           int err = futex_abstimed_wait (&rwlock->__data.__wrphase_futex,
449                                          1 | PTHREAD_RWLOCK_FUTEX_USED,
450                                          abstime, private);

We use PTHREAD_RWLOCK_FUTEX_USED to indicate the futex
is in use.

At this point we have readers waiting on the read lock
to unlock. The wrlock is done. The trylock is finishing
the installation of the read phase.

 92   if ((r & PTHREAD_RWLOCK_WRPHASE) != 0)
 93     { 
 94       /* Same as in __pthread_rwlock_rdlock_full:
 95          We started the read phase, so we are also responsible for
 96          updating the write-phase futex.  Relaxed MO is sufficient.
 97          Note that there can be no other reader that we have to wake
 98          because all other readers will see the read phase started by us
 99          (or they will try to start it themselves); if a writer started
100          the read phase, we cannot have started it.  Furthermore, we
101          cannot discard a PTHREAD_RWLOCK_FUTEX_USED flag because we will
102          overwrite the value set by the most recent writer (or the readers
103          before it in case of explicit hand-over) and we know that there
104          are no waiting readers.  */
105       atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 0);
106     }

The trylock does note that we were the one that
installed the read phase, but the comments are not
correct, the execution ordering above shows that
readers might indeed be waiting, and they are.

The atomic_store_relaxed throws away PTHREAD_RWLOCK_FUTEX_USED,
and the waiting reader is never worken becuase as noted
above it is conditional on the futex being used.

The solution is for the trylock thread to inspect
PTHREAD_RWLOCK_FUTEX_USED and wake the waiting readers.

---

A write lock begins to execute, takes the write lock,
and then releases the lock...

In pthread_rwlock_wrunlock():

544   /* We have done everything we needed to do to prefer writers, so now we
545      either hand over explicitly to readers if there are any, or we simply
546      stay in a write phase.  See pthread_rwlock_rdunlock for more details.  */
547   unsigned int r = atomic_load_relaxed (&rwlock->__data.__readers);
548   /* Release MO so that subsequent readers or writers synchronize with us.  */
549   while (!atomic_compare_exchange_weak_release
550          (&rwlock->__data.__readers, &r,
551           ((r ^ PTHREAD_RWLOCK_WRLOCKED)
552            ^ ((r >> PTHREAD_RWLOCK_READER_SHIFT) == 0 ? 0
553               : PTHREAD_RWLOCK_WRPHASE))))
554     { 
555       /* TODO Back-off.  */
556     }

... leaving it in the write phase with zero readers
(the case where we leave the write phase in place 
during a write unlock).

A write trylock begins to execute.

In __pthread_rwlock_trywrlock:

 40   while (((r & PTHREAD_RWLOCK_WRLOCKED) == 0)
 41       && (((r >> PTHREAD_RWLOCK_READER_SHIFT) == 0)
 42           || (prefer_writer && ((r & PTHREAD_RWLOCK_WRPHASE) != 0))))
 43     {

The lock is not locked.

There are no readers.

 44       /* Try to transition to states #7 or #8 (i.e., acquire the lock).  */
 45       if (atomic_compare_exchange_weak_acquire (
 46           &rwlock->__data.__readers, &r,
 47           r | PTHREAD_RWLOCK_WRPHASE | PTHREAD_RWLOCK_WRLOCKED))

We atomically install the write phase and we take the
exclusive write lock.

 48         { 
 49           atomic_store_relaxed (&rwlock->__data.__writers_futex, 1);

We get this far.

A reader lock begins to execute.

In pthread_rwlock_rdlock:

437   for (;;)
438     {
439       while (((wpf = atomic_load_relaxed (&rwlock->__data.__wrphase_futex))
440               | PTHREAD_RWLOCK_FUTEX_USED) == (1 | PTHREAD_RWLOCK_FUTEX_USED))
441         {
442           int private = __pthread_rwlock_get_private (rwlock);
443           if (((wpf & PTHREAD_RWLOCK_FUTEX_USED) == 0)
444               && (!atomic_compare_exchange_weak_relaxed
445                   (&rwlock->__data.__wrphase_futex,
446                    &wpf, wpf | PTHREAD_RWLOCK_FUTEX_USED)))
447             continue;
448           int err = futex_abstimed_wait (&rwlock->__data.__wrphase_futex,
449                                          1 | PTHREAD_RWLOCK_FUTEX_USED,
450                                          abstime, private);

We are in a write phase, so the while() on line 439 is true.

The value of wpf does not have PTHREAD_RWLOCK_FUTEX_USED set
since this is the first reader to lock.

The atomic operation sets wpf with PTHREAD_RELOCK_FUTEX_USED
on the expectation that this reader will be woken during
the handoff.

Back in pthread_rwlock_trywrlock:

 50           atomic_store_relaxed (&rwlock->__data.__wrphase_futex, 1);
 51           atomic_store_relaxed (&rwlock->__data.__cur_writer,
 52               THREAD_GETMEM (THREAD_SELF, tid));
 53           return 0;
 54         } 
 55       /* TODO Back-off.  */
 56       /* See above.  */
 57     } 

We write 1 to __wrphase_futex discarding PTHREAD_RWLOCK_FUTEX_USED,
and so in the unlock we will not awaken the waiting reader.

The solution to this is to realize that if we did not start the write
phase we need not write 1 or any other value to __wrphase_futex.
This ensures that any readers (which saw __wrphase_futex != 0) can 
set PTHREAD_RWLOCK_FUTEX_USED and this can be used at unlock to
wake them.

If we installed the write phase then all other readers are looping
here:

In __pthread_rwlock_rdlock_full:

437   for (;;)                               
438     {                                    
439       while (((wpf = atomic_load_relaxed (&rwlock->__data.__wrphase_futex))
440               | PTHREAD_RWLOCK_FUTEX_USED) == (1 | PTHREAD_RWLOCK_FUTEX_USED))
441         {     
...
508     }

waiting for the write phase to be installed or removed before they
can begin waiting on __wrphase_futex (part of the algorithm), or
taking a concurrent read lock, and thus we can safely write 1 to
__wrphase_futex.

If we did not install the write phase then the readers may already
be waiting on the futex, the original writer wrote 1 to __wrphase_futex
as part of starting the write phase, and we cannot also write 1
without loosing the PTHREAD_RWLOCK_FUTEX_USED bit.

---

Comment 15 Carlos O'Donell 2019-01-22 03:44:38 UTC

OK, I have two tweaked test cases one for tryrdlock (longer timeout and some notes), and one for tryrwlock (shorter timeout). As noted the tryrdlock failure is harder to trigger, but developers will see it given the number of people in the community, so I lowered the test time to 20s and I see it fail with 5-10% on a 4 core 3.5GHz VM.

I'll cleanup the ChangeLog and post tomorrow morning for review.

Comment 16 Carlos O'Donell 2019-01-22 18:29:50 UTC

v2 patch posted with 2 regression tests:
https://www.sourceware.org/ml/libc-alpha/2019-01/msg00569.html

Comment 17 Sourceware Commits 2019-02-01 01:49:08 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  5fc9ed4c4058bfbdf51ad6e7aac7d209b580e8c4 (commit)
      from  932329a5134665bfad601a8709da2602d943fc38 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=5fc9ed4c4058bfbdf51ad6e7aac7d209b580e8c4

commit 5fc9ed4c4058bfbdf51ad6e7aac7d209b580e8c4
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Mon Jan 21 22:50:12 2019 -0500

    nptl: Fix pthread_rwlock_try*lock stalls (Bug 23844)
    
    For a full analysis of both the pthread_rwlock_tryrdlock() stall
    and the pthread_rwlock_trywrlock() stall see:
    https://sourceware.org/bugzilla/show_bug.cgi?id=23844#c14
    
    In the pthread_rwlock_trydlock() function we fail to inspect for
    PTHREAD_RWLOCK_FUTEX_USED in __wrphase_futex and wake the waiting
    readers.
    
    In the pthread_rwlock_trywrlock() function we write 1 to
    __wrphase_futex and loose the setting of the PTHREAD_RWLOCK_FUTEX_USED
    bit, again failing to wake waiting readers during unlock.
    
    The fix in the case of pthread_rwlock_trydlock() is to check for
    PTHREAD_RWLOCK_FUTEX_USED and wake the readers.
    
    The fix in the case of pthread_rwlock_trywrlock() is to only write
    1 to __wrphase_futex if we installed the write phase, since all other
    readers would be spinning waiting for this step.
    
    We add two new tests, one exercises the stall for
    pthread_rwlock_trywrlock() which is easy to exercise, and one exercises
    the stall for pthread_rwlock_trydlock() which is harder to exercise.
    
    The pthread_rwlock_trywrlock() test fails consistently without the fix,
    and passes after. The pthread_rwlock_tryrdlock() test fails roughly
    5-10% of the time without the fix, and passes all the time after.
    
    Signed-off-by: Carlos O'Donell <carlos@redhat.com>
    Signed-off-by: Torvald Riegel <triegel@redhat.com>
    Signed-off-by: Rik Prohaska <prohaska7@gmail.com>
    Co-authored-by: Torvald Riegel <triegel@redhat.com>
    Co-authored-by: Rik Prohaska <prohaska7@gmail.com>

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                         |   17 ++
 nptl/Makefile                     |    3 +-
 nptl/pthread_rwlock_tryrdlock.c   |   25 ++-
 nptl/pthread_rwlock_trywrlock.c   |    9 +-
 nptl/tst-rwlock-tryrdlock-stall.c |  355 +++++++++++++++++++++++++++++++++++++
 nptl/tst-rwlock-trywrlock-stall.c |  108 +++++++++++
 support/Makefile                  |    1 +
 support/xpthread_rwlock_destroy.c |   26 +++
 support/xthread.h                 |    1 +
 9 files changed, 534 insertions(+), 11 deletions(-)
 create mode 100644 nptl/tst-rwlock-tryrdlock-stall.c
 create mode 100644 nptl/tst-rwlock-trywrlock-stall.c
 create mode 100644 support/xpthread_rwlock_destroy.c

Comment 18 Carlos O'Donell 2019-02-01 01:49:38 UTC

Fixed in 2.30.

Comment 19 Sourceware Commits 2019-02-01 03:06:05 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.29/master has been updated
       via  86013ef5cea322b8f4b9c22f230c22cce369e947 (commit)
      from  56c86f5dd516284558e106d04b92875d5b623b7a (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=86013ef5cea322b8f4b9c22f230c22cce369e947

commit 86013ef5cea322b8f4b9c22f230c22cce369e947
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Mon Jan 21 22:50:12 2019 -0500

    nptl: Fix pthread_rwlock_try*lock stalls (Bug 23844)
    
    For a full analysis of both the pthread_rwlock_tryrdlock() stall
    and the pthread_rwlock_trywrlock() stall see:
    https://sourceware.org/bugzilla/show_bug.cgi?id=23844#c14
    
    In the pthread_rwlock_trydlock() function we fail to inspect for
    PTHREAD_RWLOCK_FUTEX_USED in __wrphase_futex and wake the waiting
    readers.
    
    In the pthread_rwlock_trywrlock() function we write 1 to
    __wrphase_futex and loose the setting of the PTHREAD_RWLOCK_FUTEX_USED
    bit, again failing to wake waiting readers during unlock.
    
    The fix in the case of pthread_rwlock_trydlock() is to check for
    PTHREAD_RWLOCK_FUTEX_USED and wake the readers.
    
    The fix in the case of pthread_rwlock_trywrlock() is to only write
    1 to __wrphase_futex if we installed the write phase, since all other
    readers would be spinning waiting for this step.
    
    We add two new tests, one exercises the stall for
    pthread_rwlock_trywrlock() which is easy to exercise, and one exercises
    the stall for pthread_rwlock_trydlock() which is harder to exercise.
    
    The pthread_rwlock_trywrlock() test fails consistently without the fix,
    and passes after. The pthread_rwlock_tryrdlock() test fails roughly
    5-10% of the time without the fix, and passes all the time after.
    
    Signed-off-by: Carlos O'Donell <carlos@redhat.com>
    Signed-off-by: Torvald Riegel <triegel@redhat.com>
    Signed-off-by: Rik Prohaska <prohaska7@gmail.com>
    Co-authored-by: Torvald Riegel <triegel@redhat.com>
    Co-authored-by: Rik Prohaska <prohaska7@gmail.com>
    (cherry picked from commit 5fc9ed4c4058bfbdf51ad6e7aac7d209b580e8c4)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                         |   17 ++
 nptl/Makefile                     |    3 +-
 nptl/pthread_rwlock_tryrdlock.c   |   25 ++-
 nptl/pthread_rwlock_trywrlock.c   |    9 +-
 nptl/tst-rwlock-tryrdlock-stall.c |  355 +++++++++++++++++++++++++++++++++++++
 nptl/tst-rwlock-trywrlock-stall.c |  108 +++++++++++
 support/Makefile                  |    1 +
 support/xpthread_rwlock_destroy.c |   26 +++
 support/xthread.h                 |    1 +
 9 files changed, 534 insertions(+), 11 deletions(-)
 create mode 100644 nptl/tst-rwlock-tryrdlock-stall.c
 create mode 100644 nptl/tst-rwlock-trywrlock-stall.c
 create mode 100644 support/xpthread_rwlock_destroy.c

Comment 20 Sourceware Commits 2019-02-18 07:06:24 UTC

This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, release/2.28/master has been updated
       via  e8c13d5f7a6cd34bc2ac51a0c89fcbbfd2e5c043 (commit)
      from  60f80624257ef84eacfd9b400bda1b5a5e8e7816 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=e8c13d5f7a6cd34bc2ac51a0c89fcbbfd2e5c043

commit e8c13d5f7a6cd34bc2ac51a0c89fcbbfd2e5c043
Author: Carlos O'Donell <carlos@redhat.com>
Date:   Mon Jan 21 22:50:12 2019 -0500

    nptl: Fix pthread_rwlock_try*lock stalls (Bug 23844)
    
    For a full analysis of both the pthread_rwlock_tryrdlock() stall
    and the pthread_rwlock_trywrlock() stall see:
    https://sourceware.org/bugzilla/show_bug.cgi?id=23844#c14
    
    In the pthread_rwlock_trydlock() function we fail to inspect for
    PTHREAD_RWLOCK_FUTEX_USED in __wrphase_futex and wake the waiting
    readers.
    
    In the pthread_rwlock_trywrlock() function we write 1 to
    __wrphase_futex and loose the setting of the PTHREAD_RWLOCK_FUTEX_USED
    bit, again failing to wake waiting readers during unlock.
    
    The fix in the case of pthread_rwlock_trydlock() is to check for
    PTHREAD_RWLOCK_FUTEX_USED and wake the readers.
    
    The fix in the case of pthread_rwlock_trywrlock() is to only write
    1 to __wrphase_futex if we installed the write phase, since all other
    readers would be spinning waiting for this step.
    
    We add two new tests, one exercises the stall for
    pthread_rwlock_trywrlock() which is easy to exercise, and one exercises
    the stall for pthread_rwlock_trydlock() which is harder to exercise.
    
    The pthread_rwlock_trywrlock() test fails consistently without the fix,
    and passes after. The pthread_rwlock_tryrdlock() test fails roughly
    5-10% of the time without the fix, and passes all the time after.
    
    Signed-off-by: Carlos O'Donell <carlos@redhat.com>
    Signed-off-by: Torvald Riegel <triegel@redhat.com>
    Signed-off-by: Rik Prohaska <prohaska7@gmail.com>
    Co-authored-by: Torvald Riegel <triegel@redhat.com>
    Co-authored-by: Rik Prohaska <prohaska7@gmail.com>
    (cherry picked from commit 5fc9ed4c4058bfbdf51ad6e7aac7d209b580e8c4)

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                         |   17 ++
 NEWS                              |    1 +
 nptl/Makefile                     |    3 +-
 nptl/pthread_rwlock_tryrdlock.c   |   25 ++-
 nptl/pthread_rwlock_trywrlock.c   |    9 +-
 nptl/tst-rwlock-tryrdlock-stall.c |  355 +++++++++++++++++++++++++++++++++++++
 nptl/tst-rwlock-trywrlock-stall.c |  108 +++++++++++
 support/Makefile                  |    1 +
 support/xpthread_rwlock_destroy.c |   26 +++
 support/xthread.h                 |    1 +
 10 files changed, 535 insertions(+), 11 deletions(-)
 create mode 100644 nptl/tst-rwlock-tryrdlock-stall.c
 create mode 100644 nptl/tst-rwlock-trywrlock-stall.c
 create mode 100644 support/xpthread_rwlock_destroy.c

Comment 21 Orivej Desh 2019-09-03 11:34:03 UTC

We have diagnosed that this bug causes subversion server (svnserve) to hang under load [1] in Ubuntu 18.04, which is based on glibc 2.27. We are going to ask Ubuntu to package the bugfix. However, could you also release it in the release/2.27/master branch (if this is not against your maintenance policy — I see the latest commit there was on 2019-07-15) so that other distributions that ship glibc 2.27 would have a chance to release the fix before hitting the bug?

[1]
rwlock_rdlock: https://svn.apache.org/viewvc/subversion/tags/1.12.2/subversion/libsvn_subr/cache-membuffer.c?view=markup#l836
rwlock_trywrlock: https://svn.apache.org/viewvc/subversion/tags/1.12.2/subversion/libsvn_subr/cache-membuffer.c?view=markup#l866

Comment 22 Carlos O'Donell 2019-09-04 21:34:33 UTC

(In reply to Orivej Desh from comment #21)
> We have diagnosed that this bug causes subversion server (svnserve) to hang
> under load [1] in Ubuntu 18.04, which is based on glibc 2.27. We are going
> to ask Ubuntu to package the bugfix. However, could you also release it in
> the release/2.27/master branch (if this is not against your maintenance
> policy — I see the latest commit there was on 2019-07-15) so that other
> distributions that ship glibc 2.27 would have a chance to release the fix
> before hitting the bug?
> 
> [1]
> rwlock_rdlock:
> https://svn.apache.org/viewvc/subversion/tags/1.12.2/subversion/libsvn_subr/
> cache-membuffer.c?view=markup#l836
> rwlock_trywrlock:
> https://svn.apache.org/viewvc/subversion/tags/1.12.2/subversion/libsvn_subr/
> cache-membuffer.c?view=markup#l866

I would expect Adam Conrad (Ubuntu) and Aurelien Jarno (Debian) could work together to get this backported for you if you ask them nicely. Any fixes on master are OK for stable branch backports as long as they do not impact ABI/API and this is the case here, so the backport meets established consensus.

Comment 23 Zdenek Janda 2019-11-27 16:59:48 UTC

Hi,
we are seeing a bug with thread locking in openvswitch on Ubuntu 18.04, which i do suspect might be this bug. Its independent on Kernel or Openvswitch version, and seems specific to Ubuntu 18.04 while stable on Ubuntu 16.04, so it indeed points here to glibc. More information here: 
https://github.com/openvswitch/ovs-issues/issues/175
We have a good test case as the bug manifestates several times a day, so will now try to test the patch.