This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: FAIL nptl/tst-robustpi4 [BZ 23183]


On 01/26/2017 05:22 PM, Torvald Riegel wrote:
> On Thu, 2017-01-26 at 11:12 -0500, Carlos O'Donell wrote:
>> On 01/26/2017 10:29 AM, Stefan Liebler wrote:
>>> It seems that a race between the futex and exit syscalls causes an
>>> ESRCH result from the futex syscall.
>>>
>>> I'll have a closer look at this.
>>>
>>> I see those failures with Linux 4.8 / 4.9 running in a z/VM guest as
>>> well as with 4.6 on an LPAR (but less often).
>>
>> I've seen tst-robustpi7 and tst-robustpi8 failures on all hardware
>> across a wide number of kernels, but never tst-robustpi4.
>>
>> https://sourceware.org/bugzilla/show_bug.cgi?id=19004
>>
>> The robustpi support is certainly not very robust, as Torvald's
>> recent fixes show, and there still remains at least one design
>> flaw that can't be fixed.
>>
>> e.g.
>> https://sourceware.org/bugzilla/show_bug.cgi?id=14485
>
> The underlying problem for that bug does not affect PI+robust, just
> robust, I think.  Unless I forgot about something, PI+robust should
> always use the kernel to unlock.

Hi,

In the meantime, Florian Weimer was also able to reproduce this issue and opened Bug 23183 - tst-robustpi4 test failure (https://sourceware.org/bugzilla/show_bug.cgi?id=23183).

I've also dug a bit deeper (see the details in Bugzilla) and was able to reproduce it on Intel, too.

If the thread holding the locked mutex executes the exit syscall
while the main thread is executing the futex syscall,
the futex syscall can return ESRCH, which triggers the assertion.

In this situation, the futex syscall has already added the FUTEX_WAITERS bit to the lock value and then calls attach_to_pi_owner().

The exit syscall now sets the lock value to FUTEX_WAITERS | FUTEX_OWNER_DIED and proceeds.

attach_to_pi_owner() then tries, for example, to look up the owner task and/or to check whether the owner is currently exiting. In those cases, ESRCH is returned!

Back in glibc, this assertion is triggered:
/* ESRCH can happen only for non-robust PI mutexes where
   the owner of the lock died.  */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);

The assertion and its comment do not match the current behaviour of the kernel. Any ideas?

Bye
Stefan