Bug 23183 - tst-robustpi4 test failure
Summary: tst-robustpi4 test failure
Status: RESOLVED MOVED
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.27
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2018-05-15 11:30 UTC by Florian Weimer
Modified: 2019-01-23 16:04 UTC
CC List: 3 users

Flags: fweimer: security-


Attachments
Reduced testcase which runs without the glibc-testsuite (1.65 KB, text/x-csrc)
2018-06-15 12:55 UTC, Stefan Liebler

Description Florian Weimer 2018-05-15 11:30:19 UTC
tst-robustpi4: ../nptl/pthread_mutex_lock.c:425: __pthread_mutex_lock_full: Assertion `INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust' failed.

glibc is lightly patched, based on commit 0cd4a5e87f6885a2f15fe8e7eb7378d010cdb606 from the release/2.27/master branch.

I believe the build uses effective flags of -march=zEC12 -mtune=z13 -fasynchronous-unwind-tables -fstack-clash-protection, plus --enable-stack-protector=strong --enable-tunables --enable-systemtap configure flags for glibc.

Seen with kernel 4.10.0-13.el7.s390x.  This could be a kernel bug.  I will try to request a kernel update to something that is actually supported.
Comment 1 Stefan Liebler 2018-06-15 12:55:15 UTC
Created attachment 11071 [details]
Reduced testcase which runs without the glibc-testsuite

I was able to reproduce this failure. I used today's upstream glibc sources, built without special CFLAGS.
The kernels used were 4.14, 4.16 and 4.17.

Here are some observations:
tst-robustpi4.c includes tst-robust1.c with:
#define ENABLE_PI 1 (thus the mutex is a PTHREAD_MUTEX_ROBUST_NP mutex with the PTHREAD_PRIO_INHERIT protocol)
#define NOT_CONSISTENT 1 (thus pthread_mutex_init is called in each round)

In each round, the test creates the tf-thread, which locks the mutex and is then canceled by the main-thread.
The main-thread then locks the mutex via pthread_mutex_lock and expects EOWNERDEAD.
This call sometimes triggers the assertion.
If the round number is odd, the tf-thread is joined before this call to pthread_mutex_lock.

According to the generated coredumps, the assertion occurs only if the round number is even
=> Thus there was no pthread_join before the main-thread locked m1!

But the coredumps contain only the main-thread, not the tf-thread.
=> At least at the time of coredump generation, the tf-thread had already exited.

The following assertion has triggered:
/* ESRCH can happen only for non-robust PI mutexes where
   the owner of the lock died.  */
assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
As "robust = 16",  the syscall returned ESRCH.
Is the comment "for non-robust PI mutexes" correct?

(gdb) p/x m1
$4 = {__data = {__lock = 0xc0000000 = FUTEX_OWNER_DIED | FUTEX_WAITERS, __count = 0x1, __owner = 0xebc3, __nusers = 0x1, __kind = 0xb0,
    __spins = 0x0, __elision = 0x0, __list = {__prev = 0x3ffadcff9f0, __next = 0x3ffadcff9f0}},


I've extracted the relevant calls and adjusted the test so that pthread_join is never called before the main-thread locks m1, and I increased the number of rounds.
On s390x, I've seen the assertion more often on a z/VM guest than when running directly on an LPAR!
On x86_64, I've used a KVM guest with Fedora 28. If I only run the test, I don't see the assertion.
But if I e.g. build glibc in parallel while running the testcase, the assertion is also triggered on x86_64!
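
For illustration, here is a minimal sketch of the locking/cancellation pattern described above. This is not the attached testcase; the real test differs in its internals, and the barrier-based handshake here is my own assumption to keep the sketch self-contained:

#include <errno.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static pthread_mutex_t m1;
static pthread_barrier_t b;

static void *
tf (void *arg)
{
  if (pthread_mutex_lock (&m1) != 0)
    abort ();
  /* Tell the main thread that we hold the mutex, then block in a
     cancellation point so that we die while holding it.  */
  pthread_barrier_wait (&b);
  for (;;)
    pause ();
  return NULL;
}

int
main (void)
{
  pthread_mutexattr_t ma;
  pthread_mutexattr_init (&ma);
  pthread_mutexattr_setrobust (&ma, PTHREAD_MUTEX_ROBUST);
  pthread_mutexattr_setprotocol (&ma, PTHREAD_PRIO_INHERIT);
  pthread_barrier_init (&b, NULL, 2);

  for (long round = 0; round < 100000000; round++)
    {
      pthread_t th;
      /* NOT_CONSISTENT: the mutex is re-initialized in each round.  */
      pthread_mutex_init (&m1, &ma);
      pthread_create (&th, NULL, tf, NULL);
      pthread_barrier_wait (&b);
      pthread_cancel (th);

      /* No pthread_join here: the lock races with the dying thread.  */
      int e = pthread_mutex_lock (&m1);
      if (e != EOWNERDEAD)
	{
	  fprintf (stderr, "round %ld: pthread_mutex_lock returned %d\n",
		   round, e);
	  abort ();
	}
      pthread_mutex_unlock (&m1);

      pthread_join (th, NULL);
      pthread_mutex_destroy (&m1);
    }
  return 0;
}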


I've used uprobes and set up the following uprobe events with a bash script of mine (the add_sym_tp function adds a new tracepoint in a specific shared library - in this case libpthread.so):
# Mark new round when tf calls pthread_setcancelstate:
# Note: this is also called by assert!
#000000000000f908 <pthread_setcancelstate> clfi	%r2,1
#000000000000f90e <pthread_setcancelstate+0x6> jh	000000000000f992 <pthread_setcancelstate+0x8a>
#000000000000f912 <pthread_setcancelstate+0xa> stmg	%r10,%r15,80(%r15)
#000000000000f918 <pthread_setcancelstate+0x10> ear	%r4,%a0
#000000000000f91c <pthread_setcancelstate+0x14> sllg	%r4,%r4,32
#000000000000f922 <pthread_setcancelstate+0x1a> ear	%r4,%a1
#-> 000000000000f926 <pthread_setcancelstate+0x1e> l	%r2,264(%r4)
#r4=volatile struct pthread *self = THREAD_SELF;
add_sym_tp libpthread pthread_setcancelstate+0x1e "NEWROUNDTFtid=+208(%r4):x32"

# pthread_cancel() called by main-thread and I see the thread-id
#000000000000f7e8 <pthread_cancel> l	%r1,208(%r2)
#-> 000000000000f7ec <pthread_cancel+0x4> cijnh	%r1,0,000000000000f8b2 <pthread_cancel+0xca>
#  if (INVALID_TD_P (pd)) => r1=tid
add_sym_tp libpthread pthread_cancel+0x4 "TFtid=%r1:x32"

# Call of the exit-thread syscall (see nptl/pthread_create.c,
# at the end of START_THREAD_DEFN(): { ... __exit_thread (); })!
#-> 0000000000007c12 <start_thread+0x172> svc	1
# r2 is the first argument of the syscall.
# I just need something to mark this event with EXITTHREAD!
add_sym_tp libpthread start_thread+0x172 "EXITTHREAD=%r2:u32"

# __pthread_mutex_lock_full() after the compare-and-swap
# and before/after the futex(FUTEX_LOCK_PI) syscall which returns ESRCH!
#int newval = id;
## ifdef NO_INCR
#newval |= FUTEX_WAITERS;
## endif
#oldval = atomic_compare_and_exchange_val_acq (&mutex->__data.__lock,
#newval, 0);
add_sym_tp libpthread __pthread_mutex_lock_full+0x8e "CSOLDVAL=%r2:x32 NEWVAL=%r10:x32"
#0000000000009eda <__pthread_mutex_lock_full+0x8a> cs	%r2,%r10,0(%r11)
#-> 0000000000009ede <__pthread_mutex_lock_full+0x8e> cije	%r2,0,000000000000a002 <__pthread_mutex_lock_full+0x1b2>
#0000000000009ee4 <__pthread_mutex_lock_full+0x94> cije	%r9,0,000000000000a144 <__pthread_mutex_lock_full+0x2f4>
#0000000000009eea <__pthread_mutex_lock_full+0x9a> lghi	%r3,6
#0000000000009eee <__pthread_mutex_lock_full+0x9e> lgr	%r2,%r11
#0000000000009ef2 <__pthread_mutex_lock_full+0xa2> lghi	%r4,1
#0000000000009ef6 <__pthread_mutex_lock_full+0xa6> lghi	%r5,0
#-> 0000000000009efa <__pthread_mutex_lock_full+0xaa> svc	238
#-> 0000000000009efc <__pthread_mutex_lock_full+0xac> lhi	%r3,-4096
#0000000000009f00 <__pthread_mutex_lock_full+0xb0> clrjnh	%r2,%r3,0000000000009f12 <__pthread_mutex_lock_full+0xc2>
#if (oldval != 0)
#{
# /* The mutex is locked.  The kernel will now take care of
#    everything.  */
# int private = (robust
#    ? PTHREAD_ROBUST_MUTEX_PSHARED (mutex)
#    : PTHREAD_MUTEX_PSHARED (mutex));
#    INTERNAL_SYSCALL_DECL (__err);
#int e = INTERNAL_SYSCALL (futex, __err, 4, &mutex->__data.__lock,
#			      __lll_private_flag (FUTEX_LOCK_PI,
#						  private), 1, 0);
# .... assert (INTERNAL_SYSCALL_ERRNO (e, __err) != ESRCH || !robust);
add_sym_tp libpthread __pthread_mutex_lock_full+0xaa "SVCFUTEXUADDR=%r2:x64 OP=%r3:u32 MEMVAL=+0(%r11):x32 VAL=%r4:x32 TIMEOUT=%r5:u32"
add_sym_tp libpthread __pthread_mutex_lock_full+0xac "SVCFUTEXRET=%r2:x32 MEMVAL=+0(%r11):x32"



Here is an excerpt of the trace of the attached test program,
which never joins the thread before the main-thread locks the mutex.
Note that the last round shown is the one with the assertion:
# tracer: nop
#
#                              _-----=> irqs-off
#                             / _----=> need-resched
#                            | / _---=> hardirq/softirq
#                            || / _--=> preempt-depth
#                            ||| /     delay
#           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
#              | |       |   ||||       |         |
           <...>-37926 [000] d... 358524.031410: libpthread_pthread_setcancelstate_0x1e: (0x3ff9908f926) NEWROUNDTFtid=0x9426
           <...>-37926 [000] d... 358524.031411: libpthread___pthread_mutex_lock_full_0x8e: (0x3ff99089ede) CSOLDVAL=0x0 NEWVAL=0x9426
           <...>-37715 [004] d... 358524.031418: libpthread_pthread_cancel_0x4: (0x3ff9908f7ec) TFtid=0x9426
           <...>-37715 [004] d... 358524.031422: libpthread___pthread_mutex_lock_full_0x8e: (0x3ff99089ede) CSOLDVAL=0x9426 NEWVAL=0x9353
           <...>-37715 [004] d... 358524.031423: libpthread___pthread_mutex_lock_full_0xaa: (0x3ff99089efa) SVCFUTEXUADDR=0x10030d0 OP=6 MEMVAL=0x9426 VAL=0x1 TIMEOUT=0
           <...>-37926 [000] d... 358524.031426: libpthread_start_thread_0x172: (0x3ff99087c12) EXITTHREAD=0
           <...>-37715 [004] d... 358524.031433: libpthread___pthread_mutex_lock_full_0xac: (0x3ff99089efc) SVCFUTEXRET=0x0 MEMVAL=0xc0009353

# ... all successful rounds look like the ones before / after this comment.
# Note: the lock has the value: FUTEX_OWNER_DIED | FUTEX_WAITERS | <TID of main-thread>

           <...>-37927 [000] d... 358524.031443: libpthread_pthread_setcancelstate_0x1e: (0x3ff9908f926) NEWROUNDTFtid=0x9427
           <...>-37927 [000] d... 358524.031444: libpthread___pthread_mutex_lock_full_0x8e: (0x3ff99089ede) CSOLDVAL=0x0 NEWVAL=0x9427
           <...>-37715 [004] d... 358524.031450: libpthread_pthread_cancel_0x4: (0x3ff9908f7ec) TFtid=0x9427
           <...>-37715 [004] d... 358524.031454: libpthread___pthread_mutex_lock_full_0x8e: (0x3ff99089ede) CSOLDVAL=0x9427 NEWVAL=0x9353
           <...>-37715 [004] d... 358524.031455: libpthread___pthread_mutex_lock_full_0xaa: (0x3ff99089efa) SVCFUTEXUADDR=0x10030d0 OP=6 MEMVAL=0x9427 VAL=0x1 TIMEOUT=0
           <...>-37927 [000] d... 358524.031458: libpthread_start_thread_0x172: (0x3ff99087c12) EXITTHREAD=0
           <...>-37715 [004] d... 358524.031465: libpthread___pthread_mutex_lock_full_0xac: (0x3ff99089efc) SVCFUTEXRET=0x0 MEMVAL=0xc0009353

# In the failing round, sys_exit is already called before sys_futex.
# Note: the lock has the value: FUTEX_OWNER_DIED | FUTEX_WAITERS | 0
#       and sys_futex is returning ESRCH!
           <...>-37928 [000] d... 358524.031476: libpthread_pthread_setcancelstate_0x1e: (0x3ff9908f926) NEWROUNDTFtid=0x9428
           <...>-37928 [000] d... 358524.031478: libpthread___pthread_mutex_lock_full_0x8e: (0x3ff99089ede) CSOLDVAL=0x0 NEWVAL=0x9428
           <...>-37715 [004] d... 358524.031484: libpthread_pthread_cancel_0x4: (0x3ff9908f7ec) TFtid=0x9428
           <...>-37928 [000] d... 358524.031491: libpthread_start_thread_0x172: (0x3ff99087c12) EXITTHREAD=0
           <...>-37715 [004] d... 358524.031494: libpthread___pthread_mutex_lock_full_0x8e: (0x3ff99089ede) CSOLDVAL=0x9428 NEWVAL=0x9353
           <...>-37715 [004] d... 358524.031495: libpthread___pthread_mutex_lock_full_0xaa: (0x3ff99089efa) SVCFUTEXUADDR=0x10030d0 OP=6 MEMVAL=0x9428 VAL=0x1 TIMEOUT=0
           <...>-37715 [004] d... 358524.031505: libpthread___pthread_mutex_lock_full_0xac: (0x3ff99089efc) SVCFUTEXRET=0xfffffffd MEMVAL=0xc0000000

# assert() is also calling pthread_setcancelstate:
           <...>-37715 [004] d... 358524.031509: libpthread_pthread_setcancelstate_0x1e: (0x3ff9908f926) NEWROUNDTFtid=0x9353
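
For reference, the MEMVAL words above decode with the well-known futex flag bits from the kernel's <linux/futex.h>. A small stand-alone sketch of mine (not part of the testcase):

#include <stdio.h>
#include <stdint.h>

/* Constants from the kernel UAPI header <linux/futex.h>.  */
#define FUTEX_WAITERS    0x80000000
#define FUTEX_OWNER_DIED 0x40000000
#define FUTEX_TID_MASK   0x3fffffff

static void
decode (uint32_t val)
{
  printf ("0x%08x = %s%stid 0x%x\n", val,
	  (val & FUTEX_WAITERS) ? "FUTEX_WAITERS | " : "",
	  (val & FUTEX_OWNER_DIED) ? "FUTEX_OWNER_DIED | " : "",
	  val & FUTEX_TID_MASK);
}

int
main (void)
{
  decode (0xc0009353); /* successful round: both flags + TID of the main thread */
  decode (0xc0000000); /* failing round: both flags, but the TID is 0 */
  return 0;
}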



But looking at the kernel sources, it seems that ESRCH is a valid return value of sys_futex (though I am no kernel expert):

sys_futex (FUTEX_LOCK_PI):
<ksrc>/kernel/futex.c: futex_lock_pi()
-> futex_lock_pi_atomic()
{
...
	/*
	 * First waiter. Set the waiters bit before attaching ourself to
	 * the owner. If owner tries to unlock, it will be forced into
	 * the kernel and blocked on hb->lock.
	 */
	newval = uval | FUTEX_WAITERS;
	ret = lock_pi_update_atomic(uaddr, uval, newval);
	if (ret)
		return ret;

# FUTEX_WAITERS flag is set. Now we race with sys_exit().

	/*
	 * If the update of the user space value succeeded, we try to
	 * attach to the owner. If that fails, no harm done, we only
	 * set the FUTEX_WAITERS bit in the user space variable.
	 */
	return attach_to_pi_owner(uval, key, ps);
}

attach_to_pi_owner()
{
...
	p = futex_find_get_task(pid);
	if (!p)
		return -ESRCH;
# Here is one ESRCH.
...
	/*
	 * We need to look at the task state flags to figure out,
	 * whether the task is exiting. To protect against the do_exit
	 * change of the task flags, we do this protected by
	 * p->pi_lock:
	 */
	raw_spin_lock_irq(&p->pi_lock);
	if (unlikely(p->flags & PF_EXITING)) {
		/*
		 * The task is on the way out. When PF_EXITPIDONE is
		 * set, we know that the task has finished the
		 * cleanup:
		 */
		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
# Here is another one.

		raw_spin_unlock_irq(&p->pi_lock);
		put_task_struct(p);
		return ret;
	}
}

/*
 * Process a futex-list entry, check whether it's owned by the
 * dying task, and do notification if so:
 */
int handle_futex_death(u32 __user *uaddr, struct task_struct *curr, int pi)
{
	u32 uval, uninitialized_var(nval), mval;

retry:
	if (get_user(uval, uaddr))
		return -1;

	if ((uval & FUTEX_TID_MASK) == task_pid_vnr(curr)) {
		/*
		 * Ok, this dying thread is truly holding a futex
		 * of interest. Set the OWNER_DIED bit atomically
		 * via cmpxchg, and if the value had FUTEX_WAITERS
		 * set, wake up a waiter (if any). (We have to do a
		 * futex_wake() even if OWNER_DIED is already set -
		 * to handle the rare but possible case of recursive
		 * thread-death.) The rest of the cleanup is done in
		 * userspace.
		 */
		mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
# In the assertion case, I see exactly this pattern.
# Note: FUTEX_WAITERS is already set - see futex_lock_pi_atomic().
		/*
		 * We are not holding a lock here, but we want to have
		 * the pagefault_disable/enable() protection because
		 * we want to handle the fault gracefully. If the
		 * access fails we try to fault in the futex with R/W
		 * verification via get_user_pages. get_user() above
		 * does not guarantee R/W access. If that fails we
		 * give up and leave the futex locked.
		 */
		if (cmpxchg_futex_value_locked(&nval, uaddr, uval, mval)) {
			if (fault_in_user_writeable(uaddr))
				return -1;
			goto retry;
		}
		if (nval != uval)
			goto retry;

		/*
		 * Wake robust non-PI futexes here. The wakeup of
		 * PI futexes happens in exit_pi_state():
		 */
		if (!pi && (uval & FUTEX_WAITERS))
			futex_wake(uaddr, 1, 1, FUTEX_BITSET_MATCH_ANY);
	}
	return 0;
}




sys_exit():
<ksrc>/kernel/exit.c: do_exit()
{
	...

	exit_signals(tsk);  /* sets PF_EXITING */
	/*
	 * Ensure that all new tsk->pi_lock acquisitions must observe
	 * PF_EXITING. Serializes against futex.c:attach_to_pi_owner().
	 */
	smp_mb();

	...

	exit_mm();
# This calls exit_robust_list() - see futex.c - which walks along the userspace
# robust-list of the exited thread and calls handle_futex_death() - see futex.c.

	/*
	* We can do this unlocked here. The futex code uses this flag
	* just to verify whether the pi state cleanup has been done
	* or not. In the worst case it loops once more.
	*/
	tsk->flags |= PF_EXITPIDONE;
}



I think ESRCH is a valid case here, so glibc is not allowed to trigger the assertion?
Should glibc also add the TID of the calling thread to the lock value in this case?
Comment 2 Stefan Liebler 2018-06-19 10:55:28 UTC
I've also added some kprobes in order to trace the exit / futex syscalls in the assertion case.
Assumption: the dying thread (tid=0xa3c) owns the mutex, and thus the futex value is 0xa3c.

The main thread is trying to lock the mutex via the futex syscall, which sets the FUTEX_WAITERS bit (the futex value becomes 0x80000a3c) in:
futex_lock_pi_atomic()
{
...
	/*
	 * First waiter. Set the waiters bit before attaching ourself to
	 * the owner. If owner tries to unlock, it will be forced into
	 * the kernel and blocked on hb->lock.
	 */
	newval = uval | FUTEX_WAITERS;
	ret = lock_pi_update_atomic(uaddr, uval, newval);
}


Afterwards, attach_to_pi_owner() is called. If the exit syscall is processed at the same time, the PF_EXITING flag is set for the exiting thread. Then it can happen that attach_to_pi_owner() returns EAGAIN and the futex syscall retries to lock the futex. In the meantime, the exit syscall is processing the mutex and handle_futex_death() sets the futex value to 0xc0000000:
handle_futex_death(...)
{
...
		mval = (uval & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
		if (cmpxchg_futex_value_locked(&nval, uaddr, uval, mval)) ...
...
}


If the futex syscall now calls attach_to_pi_owner() and ...
... the exiting thread has already exited, then ESRCH is returned due to:
	if (!pid)
		return -ESRCH;
	p = futex_find_get_task(pid);
	if (!p)
		return -ESRCH;

... the exiting thread is still visible but has already finished its exit cleanup (PF_EXITPIDONE is set), then ESRCH is returned due to:
	if (unlikely(p->flags & PF_EXITING)) {
		int ret = (p->flags & PF_EXITPIDONE) ? -ESRCH : -EAGAIN;
		...
		return ret;
	}

This ESRCH is returned by the futex-syscall.



If the futex syscall happens a bit earlier than the exit syscall (and glibc does not trigger the assertion), then the futex syscall sets the FUTEX_WAITERS bit and attach_to_pi_owner() returns zero. During the exit syscall, handle_futex_death() sets the futex value to 0xc0000000. Afterwards, the futex syscall calls fixup_owner()/fixup_pi_state_owner(), which then sets the futex value to 0xc0000000|<tid-of-main-thread>. In this case, the futex syscall does not return an error.
Comment 3 Carlos O'Donell 2018-07-05 19:28:08 UTC
(In reply to Stefan Liebler from comment #2)
> This ESRCH is returned by the futex-syscall.

This is OK for the non-robust case, but for robust mutexes only one answer is valid, and it's EOWNERDEAD. This is mandated by the robust attribute on the lock.

This looks like a kernel bug dealing with the interaction of a dying thread that has a robust + PI lock.

Thoughts?
Comment 4 Stefan Liebler 2018-07-10 07:42:02 UTC
(In reply to Carlos O'Donell from comment #3)
> (In reply to Stefan Liebler from comment #2)
> > This ESRCH is returned by the futex-syscall.
> 
> This is OK for the non-robust case, but for robust mutexes only one answer
> is valid, and it's EOWNERDEAD. This is mandated by the robust attribute on
> the lock.
Is this documented anywhere?
Or was it an arrangement between kernel/glibc while adding robust mutexes?
> 
> This looks like a kernel bug dealing with the interaction of a dying thread
> that has a robust + PI lock.
> 
> Thoughts?

I've found the following kernel commits, which obviously introduce ESRCH in attach_to_pi_owner(), and it seems the kernel developers were aware of this assertion/indefinite loop in glibc:

-"PATCH] pi-futex: robust-futex exit"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=e3f2ddeac718c768fdac4b7fe69d465172f788a8)

-"futex: futex_find_get_task remove credentails check"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7a0ea09ad5352efce8fe79ed853150449903b9f5):
"futex_lock_pi_atomic simply returns if it gets ESRCH.  glibc code,
however, doesn't expect that robust lock returns with ESRCH because it
should get either success or owner died."

-"pi-futex: fix exit races and locking problems"
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=778e9a9c3e7193ea9f434f382947155ffb59c755):
"pi-futex: fix exit races and locking problems
...
   3. results in self-inflicted deadlock inside glibc.
       Sometimes futex_lock_pi returns -ESRCH, when it is not expected
       and glibc enters to for(;;) sleep() to simulate deadlock. This problem
       is quite obvious and I think the patch is right. Though it looks like
       each "if" in futex_lock_pi() got some stupid special case "else if". :-)
"
Comment 5 Carlos O'Donell 2018-07-10 13:47:07 UTC
(In reply to Stefan Liebler from comment #4)
> (In reply to Carlos O'Donell from comment #3)
> > (In reply to Stefan Liebler from comment #2)
> > > This ESRCH is returned by the futex-syscall.
> > 
> > This is OK for the non-robust case, but for robust mutexes only one answer
> > is valid, and it's EOWNERDEAD. This is mandated by the robust attribute on
> > the lock.

> Is this documented anywhere?

The canonical documentation is in the Linux man-pages project. This is where Torvald Riegel, Thomas Gleixner, Michael Kerrisk, and others put down the exact details of the allowed return codes, and this in turn allowed glibc to tighten the set of error codes it accepts.

man man2/futex.2
~~~
ESRCH  (FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI, FUTEX_CMP_REQUEUE_PI) The  thread
              ID in the futex word at uaddr does not exist.
~~~

While it is true that FUTEX_LOCK_PI can return ESRCH in general, an ESRCH return is not allowed for robust mutexes.

Consider the axioms:

* Thread A dies holding a robust+PI lock.
* Thread B goes to recover the lock but gets ESRCH instead of 0.

Thread B can therefore never recover the lock. To recover the lock you must acquire it successfully, without an error such as ESRCH.
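
To spell out the user-space contract that breaks here, a sketch of the standard robust-mutex recovery protocol (recover_shared_state() is a hypothetical, application-specific helper; m1 is assumed to be a robust mutex):

#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

extern pthread_mutex_t m1;                 /* a robust (+PI) mutex */
extern void recover_shared_state (void);   /* application-specific */

void
lock_robust (void)
{
  int e = pthread_mutex_lock (&m1);
  if (e == EOWNERDEAD)
    {
      /* We own the mutex; the previous owner died.  Repair the
         protected state, then mark the mutex consistent again.  */
      recover_shared_state ();
      pthread_mutex_consistent (&m1);
      e = 0;
    }
  if (e != 0)
    abort ();  /* An unexpected ESRCH ends up here: the caller never
                  owns the mutex and can never run the recovery step.  */
}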

linux/Documentation/robust-futexes.txt
~~~
If a futex is found to be held at exit time, the kernel sets the
following bit of the futex word::

        #define FUTEX_OWNER_DIED        0x40000000

and wakes up the next futex waiter (if any). User-space does the rest of
the cleanup.
~~~

I grant you that all the kernel docs say is that FUTEX_OWNER_DIED is set, but clearly, if the syscall fails with ESRCH, it is unclear whether we can rely on the value of the futex word.

I argue that in this case the kernel has a bug. The kernel should return 0 so that glibc knows it acquired the mutex and can then check the FUTEX_OWNER_DIED bit.

> Or was it an arrangement between kernel/glibc while adding robust mutexes?

You cannot inspect FUTEX_OWNER_DIED bits in the futex if the syscall returns with an error.

My opinion continues to be that this is a bug in the kernel.

> > 
> > This looks like a kernel bug dealing with the interaction of a dying thread
> > that has a robust + PI lock.
> > 
> > Thoughts?
> 
> I've found the following kernel commits, which obviously introduce ESRCH in
> attach_to_pi_owner(), and it seems the kernel developers were aware of this
> assertion/indefinite loop in glibc

I think we need to raise this to the kernel developers.

Are you able to take this issue upstream to the kernel?
Comment 6 Stefan Liebler 2018-07-10 15:51:48 UTC
I've opened kernel "Bug 200467 - The syscall futex with operation FUTEX_LOCK_PI is not allowed to return ESRCH for robust mutexes." (https://bugzilla.kernel.org/show_bug.cgi?id=200467)
Comment 7 Stefan Liebler 2019-01-23 15:55:57 UTC
I've tested the kernel commit https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git/commit/?id=da791a667536bf8322042e38ca85d55a78d3c273 "futex: Cure exit race" from Thomas Gleixner on s390x (inside a z/VM guest) and on x86_64 (inside a KVM guest).
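
The core of that fix, paraphrased from memory of the commit (see the commit itself for the authoritative code): attach_to_pi_owner() no longer returns -ESRCH directly when the owner task cannot be attached; a new helper in kernel/futex.c rereads the futex word and turns the exit race into a retry:

static int handle_exit_race(u32 __user *uaddr, u32 uval,
			    struct task_struct *tsk)
{
	u32 uval2;

	/* If the exiting task has not yet finished its PI/robust-list
	   cleanup (PF_EXITPIDONE not set), make the caller retry.  */
	if (tsk && !(tsk->flags & PF_EXITPIDONE))
		return -EAGAIN;

	/* Reread the user-space futex word.  If it changed - e.g.
	   handle_futex_death() has meanwhile set FUTEX_OWNER_DIED -
	   retry instead of failing.  */
	if (get_futex_value_locked(&uval2, uaddr))
		return -EFAULT;
	if (uval2 != uval)
		return -EAGAIN;

	/* Only a genuinely stale or bogus owner TID still yields ESRCH.  */
	return -ESRCH;
}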

Running on a kernel with this commit, the attached "reduced testcase" runs (with rounds_max = 100000000) without failures. I've also successfully run the original glibc testcase nptl/tst-robustpi4 in a loop.

Running on an older kernel without this commit, the attached "reduced testcase" always failed within ~268000 rounds.
On s390x (inside a z/VM guest), the original glibc testcase failed ~1900 times out of 1000000 runs.

Thus I'm closing this bug.
Thanks.