This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: handle_exit_race && PF_EXITING
- From: Thomas Gleixner <tglx at linutronix dot de>
- To: Oleg Nesterov <oleg at redhat dot com>
- Cc: Florian Weimer <fweimer at redhat dot com>, Shawn Landden <shawn at git dot icu>, libc-alpha at sourceware dot org, linux-api at vger dot kernel dot org, LKML <linux-kernel at vger dot kernel dot org>, Arnd Bergmann <arnd at arndb dot de>, Deepa Dinamani <deepa dot kernel at gmail dot com>, Andrew Morton <akpm at linux-foundation dot org>, Catalin Marinas <catalin dot marinas at arm dot com>, Keith Packard <keithp at keithp dot com>, Peter Zijlstra <peterz at infradead dot org>
- Date: Tue, 5 Nov 2019 18:59:34 +0100 (CET)
- Subject: Re: handle_exit_race && PF_EXITING
- References: <20191104002909.25783-1-shawn@git.icu> <87woceslfs.fsf@oldenburg2.str.redhat.com> <alpine.DEB.2.21.1911051053470.17054@nanos.tec.linutronix.de> <20191105152728.GA5666@redhat.com> <alpine.DEB.2.21.1911051800070.1869@nanos.tec.linutronix.de>
On Tue, 5 Nov 2019, Thomas Gleixner wrote:
> On Tue, 5 Nov 2019, Oleg Nesterov wrote:
> > On 11/05, Thomas Gleixner wrote:
> > >
> > > Out of curiosity, what's the race issue vs. robust list which you are
> > > trying to solve?
> >
> > Off-topic, but this reminds me...
> >
> > #include <sched.h>
> > #include <assert.h>
> > #include <unistd.h>
> > #include <syscall.h>
> >
> > #define FUTEX_LOCK_PI 6
> >
> > int main(void)
> > {
> > 	struct sched_param sp = {};
> >
> > 	sp.sched_priority = 2;
> > 	assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);
> >
> > 	int lock = vfork();
> > 	if (!lock) {
> > 		sp.sched_priority = 1;
> > 		assert(sched_setscheduler(0, SCHED_FIFO, &sp) == 0);
> > 		_exit(0);
> > 	}
> >
> > 	syscall(__NR_futex, &lock, FUTEX_LOCK_PI, 0,0,0);
> > 	return 0;
> > }
> >
> > this creates the unkillable RT process spinning in futex_lock_pi() on
> > a single CPU machine (or you can use taskset).
>
> Uuurgh.
But staring at it some more: that's a scheduler bug.
    parent                      child

    set FIFO prio 2

    fork()              ->      set FIFO prio 1
                                  sched_setscheduler(...)
                                  return from syscall     <= BUG

                                _exit()
When the child lowers its priority from 2 to 1, the parent _must_ preempt
the child, simply because the parent is now the top priority task on that
CPU. The child should never reach _exit() before the parent blocks on the
futex.
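
The preemption argument above can be sketched as a toy model. This is not
kernel code: toy_task, toy_pick_next() and the runnable flag are hypothetical
names made up for illustration, and the only assumption is that SCHED_FIFO
runs the highest-priority runnable task on the CPU.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, not kernel code. All names here are hypothetical. */
struct toy_task {
	const char *name;
	int fifo_prio;	/* SCHED_FIFO priority, higher value wins */
	int runnable;	/* nonzero if the task is able to run right now */
};

/*
 * Return the runnable task with the highest FIFO priority, or NULL if
 * nothing is runnable. On a tie the earlier array entry wins, which is
 * a simplification of the real same-priority rules.
 */
const struct toy_task *toy_pick_next(const struct toy_task *tasks, size_t n)
{
	const struct toy_task *best = NULL;

	assert(tasks != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!tasks[i].runnable)
			continue;
		if (!best || tasks[i].fifo_prio > best->fifo_prio)
			best = &tasks[i];
	}
	return best;
}
```

With the parent at priority 2 and the child lowered to 1, both runnable,
toy_pick_next() returns the parent. That is the preemption expected at the
point marked BUG. The model returns the child only if the parent is not
runnable.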
Peter?
What's even more disturbing is that even with that bug happening the child
is able to set PF_EXITING, but not PF_EXITPIDONE. That doesn't make sense.
Thanks,
tglx