This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] nptl: Fix pthread_rwlock_try*lock stalls [BZ #23844]
- From: Florian Weimer <fweimer at redhat dot com>
- To: Carlos O'Donell <carlos at redhat dot com>
- Cc: Torvald Riegel <triegel at redhat dot com>, Rik Prohaska <prohaska7 at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>, Siddhesh Poyarekar <siddhesh at gotplt dot org>
- Date: Tue, 22 Jan 2019 13:10:54 +0100
- Subject: Re: [PATCH] nptl: Fix pthread_rwlock_try*lock stalls [BZ #23844]
- References: <c3d0b34e-b0b0-f4bc-4cc9-ea5484c38836@redhat.com>
* Carlos O'Donell:
> +/* For a full analysis see comment:
> + https://sourceware.org/bugzilla/show_bug.cgi?id=23844#c14
In the past, we had issues with bug trackers going away and such
references becoming stale.
> +
> + The stall is caused by pthread_rwlock_tryrdlock failing to check
> + that PTHREAD_RWLOCK_FUTEX_USED is set in the __wrphase_futex futex
> + and then waking the futex.
> +
> + The fix for bug 23844 ensures that waiters on __wrphase_futex are
> + correctly woken. Before the fix the test stalls as readers can
> + wait forever on __wrphase_futex. */
> +
> +#include <stdio.h>
> +#include <stdlib.h>
> +#include <unistd.h>
> +#include <pthread.h>
> +#include <support/xthread.h>
> +#include <errno.h>
> +
> +/* We need only one lock to reproduce the issue. We will need multiple
> + threads to get the exact case where we have a read, try, and unlock
> + all interleaving to produce the case where the readers are waiting
> + and the try fails to wake them. */
> +pthread_rwlock_t onelock;
> +
> +/* The number of threads is arbitrary but empirically chosen to have
> + enough threads that we see the condition where waiting readers are
> + not woken by a successful tryrdlock. */
> +#define NTHREADS 32
> +
> +_Atomic int do_exit;
> +
> +void *
> +run_loop (void *arg)
> +{
> + int i = 0, ret, rw;
> + while (!do_exit)
> + {
> + /* Arbitrarily choose if we are the writer or reader. Choose a
> + high enough ratio of readers to writers to make it likely
> + that readers block (and eventually are susceptable to
> + stalling). */
> + rw = i % 8;
> + /* If we are a writer, take the write lock, and then unlock.
> + If we are a reader, try the lock, then lock, then unlock. */
> + if (rw)
rw != 0 (or (i % 8) != 0). Note sure if the variable actually increases
clarity here.
> + /* Run for some amount of time. Empirically speaking exercising
> + the stall via pthread_rwlock_tryrdlock is much harder, and on
> + a 3.5GHz 4 core x86_64 VM system it takes somewhere around
> + 20-200s to stall, approaching 100% stall past 200s. We can't
> + wait that long for a regression test so we just test for 20s,
> + and expect the stall to happen with a 5-10% chance (enough for
> + developers to see). */
> + sleep (20);
Use nanosleep or usleep to avoid signals.
> + /* Then exit. */
> + printf ("INFO: Exiting...");
> + fflush (stdout);
Why no trailing \n? It will just result in garbled output if there is a
problem.
> + do_exit = 1;
> + /* If any readers stalled then we will timeout waiting for them. */
> + for (i = 0; i < NTHREADS; i++)
> + xpthread_join (tids[i]);
> + printf ("done.\n");
> + xpthread_rwlock_destroy (&onelock);
> + printf ("PASS: No pthread_rwlock_tryrdlock stalls detected.\n");
> + return 0;
> +}
> +
> +#define TIMEOUT (30)
Parentheses are unnecessary.
Most of these comments also apply to ther other test.
I have not reviewed the code.
Thanks,
Florian