This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] [BZ #21778] Fix oversight in robust mutex lock acquisition.
- From: Carlos O'Donell <carlos at redhat dot com>
- To: Siddhesh Poyarekar <siddhesh at gotplt dot org>, Torvald Riegel <triegel at redhat dot com>, GLIBC Devel <libc-alpha at sourceware dot org>
- Cc: Carlos O'Donell <codonell at redhat dot com>
- Date: Fri, 28 Jul 2017 18:47:11 -0400
- Subject: Re: [PATCH] [BZ #21778] Fix oversight in robust mutex lock acquisition.
- Authentication-results: sourceware.org; auth=none
- References: <1501018343.30433.50.camel@redhat.com> <a11be21c-a90c-7e7a-0ee2-950cd9ca8aad@gotplt.org>
On 07/28/2017 08:41 AM, Siddhesh Poyarekar wrote:
> On Wednesday 26 July 2017 03:02 AM, Torvald Riegel wrote:
>> 65810f0ef05e8c9e333f17a44e77808b163ca298 fixed a robust mutex bug but
>> introduced BZ 21778: if the CAS used to try to acquire a lock fails, the
>> expected value is not updated, which breaks other cases in the lock
>> acquisition loop. The fix is to simply update the expected value with
>> the value returned by the CAS, which ensures that behavior is as if the
>> first case with the CAS never happened (if the CAS fails).
>>
>> This is a regression introduced in the last release, so it would be good
>> to get this included in this release. I'll likely be AFK on Thursday,
>> so please just commit this once it has been approved. Tested on
>> x86_64-linux.
>>
>>
>> [BZ 21778]
>> * nptl/pthread_mutex_timedlock.c (__pthread_mutex_timedlock): Update
>> oldval if the CAS fails.
>> * nptl/pthread_mutex_lock.c (__pthread_mutex_lock_full): Likewise.
>> * nptl/tst-mutex7.c (ROBUST, DELAY_NSEC, ROUNDS, N): New.
>> (tf, do_test): Use them.
>> * nptl/tst-mutex7robust.c: New file.
>> * nptl/Makefile (tests): Add new test.
>>
>
> Looks good to me, but I think Carlos is also reviewing this, so please
> wait for his confirmation before you commit.
So far I've finished confirming this fix on:
- x86_64
- i686
- ppc64
With the following targets still lagging:
- s390x
- aarch64
- ppc64le
- armv7hl
I'd like to see those 4 last targets complete their test cycle before
I commit this change.
I've got a v2 which basically just adds some more comments to the testsuite.
The bug is 100% reproducible and easily with the small critical section and
the lock/unlock sequence of 100 threads, we hit the CAS very quickly, fail
to set oldval and everything deadlocks. It's always easier to validate when
the failure is in a high contention path, because creating contention is easy :-)
Would you be opposed if I commit this slightly *after* the midnight UTC hard
freeze? That way I can confirm the other four arches are OK? I expect that with
the coverage I have here it won't break anyone else (and the patch is clearly
correct, but I would like to really double check).
Cheers,
Carlos.