This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: FAIL nptl/tst-robustpi4 [BZ 23183]
- From: Stefan Liebler <stli at linux dot ibm dot com>
- To: libc-alpha at sourceware dot org
- Cc: Florian Weimer <fweimer at redhat dot com>, "Carlos O'Donell" <carlos at redhat dot com>, Torvald Riegel <triegel at redhat dot com>
- Date: Fri, 29 Jun 2018 08:54:48 +0200
- Subject: Re: FAIL nptl/tst-robustpi4 [BZ 23183]
- References: <b71bec8d-ef71-1903-c7a8-fafa2fa744f5@linux.vnet.ibm.com> <ec3d5947-a1a6-a4ee-e8cd-0a47e6408cd6@redhat.com> <1485447752.16721.17.camel@redhat.com>
On 01/26/2017 05:22 PM, Torvald Riegel wrote:
> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>> It seems as a race between futex- and exit-syscall causes ESRCH
>>> result from futex-syscall.
>>> I'll have a closer look at this.
>>> I see those fails with Linux 4.8 / 4.9 running in a z/VM guest as
>>> well as with 4.6 on a LPAR (but less often).
>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>> across a wide number of kernels, but never tst-robustpi4.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>> The robustpi support is certainly not very robust as Torvald's
>> recent fixes show, and there still remains at least one design
>> flaw that can't be fixed.
>> e.g.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
> The underlying problem for that bug does not affect PI+robust, just
> robust, I think. Unless I forgot about something, PI+robust should
> always use the kernel to unlock.
Hi,
in the meantime, Florian Weimer was also able to reproduce this issue and
opened Bug 23183 - tst-robustpi4 test failure
(https://sourceware.org/bugzilla/show_bug.cgi?id=23183).
I've also dug a bit deeper (see the details in bugzilla) and was able
to reproduce it on Intel as well.
If the thread holding the locked mutex is executing the exit syscall
while the main thread is executing the futex syscall,
the futex syscall can return ESRCH, which triggers the assertion.
In this situation, the futex syscall has already added the FUTEX_WAITERS
bit to the lock value and is then calling attach_to_pi_owner().
Meanwhile, the exit syscall sets the lock value to FUTEX_WAITERS |
FUTEX_OWNER_DIED and proceeds.
attach_to_pi_owner() then, for example, tries to get the owner task and/or
tests whether the owner is currently exiting. In those cases, ESRCH is
returned!
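To make the interleaving easier to follow, here is a small userspace model of
the lock word. It is only a sketch and not kernel code: the SK_ constants copy
the values of FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from
<linux/futex.h>, and all sk_* functions are hypothetical stand-ins. In
particular, sk_attach_to_pi_owner() reduces the real attach_to_pi_owner() to
the single assumption "owner not found or exiting => ESRCH".

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Values mirror FUTEX_WAITERS, FUTEX_OWNER_DIED and FUTEX_TID_MASK from
   <linux/futex.h>; the SK_ names and all functions below are hypothetical.  */
#define SK_FUTEX_WAITERS    0x80000000u
#define SK_FUTEX_OWNER_DIED 0x40000000u
#define SK_FUTEX_TID_MASK   0x3fffffffu

/* Waiter, step 1: mark the lock word as contended before the owner
   lookup happens.  */
static uint32_t
sk_waiter_set_waiters (uint32_t lockval)
{
  return lockval | SK_FUTEX_WAITERS;
}

/* Exiting owner: drop the TID, keep FUTEX_WAITERS, set FUTEX_OWNER_DIED.  */
static uint32_t
sk_exit_mark_owner_died (uint32_t lockval)
{
  return (lockval & SK_FUTEX_WAITERS) | SK_FUTEX_OWNER_DIED;
}

/* Waiter, step 2: simplified stand-in for attach_to_pi_owner().
   owner_gone models "owner task not found or currently exiting".  */
static int
sk_attach_to_pi_owner (uint32_t owner_tid, int owner_gone)
{
  (void) owner_tid;   /* The model does not need a real task lookup.  */
  return owner_gone ? -ESRCH : 0;
}

/* Replay the interleaving described above and return what the futex
   syscall would report to glibc in this model.  */
static int
sk_replay_race (uint32_t owner_tid, uint32_t *lockword)
{
  /* Waiter adds FUTEX_WAITERS while the owner TID is still present.  */
  *lockword = sk_waiter_set_waiters (owner_tid);
  uint32_t seen_tid = *lockword & SK_FUTEX_TID_MASK;

  /* Owner exits in between and rewrites the lock word.  */
  *lockword = sk_exit_mark_owner_died (*lockword);
  assert (*lockword & SK_FUTEX_WAITERS);

  /* Waiter continues with the stale TID; the owner is already gone.  */
  return sk_attach_to_pi_owner (seen_tid, 1);
}
```

Calling sk_replay_race() with any owner TID leaves the lock word at
FUTEX_WAITERS | FUTEX_OWNER_DIED and yields -ESRCH, which is the combination
glibc does not expect for a robust mutex.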
Back in glibc, this assertion is triggered:
    /* ESRCH can happen only for non-robust PI mutexes where
       the owner of the lock died.  */
    assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
The assertion/comment does not agree with the current behaviour of the
kernel. Any ideas?
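To spell out the mismatch, the condition of that assertion can be evaluated on
its own. This is just a sketch: esrch_assert_holds() and sk_demo_passing_cases()
are hypothetical helpers, and the errno value is passed in directly instead of
being extracted with INTERNAL_SYSCALL_ERRNO.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical helper: same condition as the glibc assertion, with the
   futex errno and the robust flag passed in as plain arguments.  */
static bool
esrch_assert_holds (int futex_errno, bool robust)
{
  return futex_errno != ESRCH || !robust;
}

/* Cases the assertion accepts: ESRCH on a non-robust PI mutex, and any
   other errno on a robust one.  */
static void
sk_demo_passing_cases (void)
{
  assert (esrch_assert_holds (ESRCH, false));
  assert (esrch_assert_holds (EAGAIN, true));
}
```

The only failing combination is esrch_assert_holds (ESRCH, true), i.e. a robust
PI mutex for which the futex syscall returned ESRCH, which is the case seen in
the race above.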
Bye
Stefan