This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: futex(3) man page, final draft for pre-release review

From: Torvald Riegel <triegel at redhat dot com>
To: Davidlohr Bueso <dave at stgolabs dot net>
Cc: "Michael Kerrisk (man-pages)" <mtk dot manpages at gmail dot com>, Thomas Gleixner <tglx at linutronix dot de>, Darren Hart <dvhart at infradead dot org>, lkml <linux-kernel at vger dot kernel dot org>, libc-alpha <libc-alpha at sourceware dot org>, linux-man <linux-man at vger dot kernel dot org>, "Carlos O'Donell" <carlos at redhat dot com>, Roland McGrath <roland at hack dot frob dot com>, Jakub Jelinek <jakub at redhat dot com>, Ingo Molnar <mingo at elte dot hu>, bill o gallmeister <bgallmeister at gmail dot com>, bert hubert <bert dot hubert at netherlabs dot nl>, Jan Kiszka <jan dot kiszka at siemens dot com>, Eric Dumazet <edumazet at google dot com>, Arnd Bergmann <arnd at arndb dot de>, Rusty Russell <rusty at rustcorp dot com dot au>, Heinrich Schuchardt <xypron dot glpk at gmx dot de>, Andy Lutomirski <luto at amacapital dot net>, Daniel Wagner <wagi at monom dot org>, Anton Blanchard <anton at samba dot org>, Steven Rostedt <rostedt at goodmis dot org>, Rich Felker <dalias at libc dot org>, Jonathan Wakely <jwakely at redhat dot com>, Mike Frysinger <vapier at gentoo dot org>, Peter Zijlstra <peterz at infradead dot org>
Date: Fri, 18 Dec 2015 13:26:30 +0100
Subject: Re: futex(3) man page, final draft for pre-release review
Authentication-results: sourceware.org; auth=none
References: <56701916 dot 4090203 at gmail dot com> <20151215224119 dot GA28877 at linux-uzut dot site>

On Tue, 2015-12-15 at 14:41 -0800, Davidlohr Bueso wrote:
> On Tue, 15 Dec 2015, Michael Kerrisk (man-pages) wrote:
> 
> >       When executing a futex operation that requests to block a thread,
> >       the kernel will block only if the futex word has the  value  that
> >       the  calling  thread  supplied  (as  one  of the arguments of the
> >       futex() call) as the expected value of the futex word.  The load-
> >       ing  of the futex word's value, the comparison of that value with
> >       the expected value, and the actual blocking  will  happen  atomi-
> >
> >FIXME: for next line, it would be good to have an explanation of
> >"totally ordered" somewhere around here.
> >
> >       cally  and totally ordered with respect to concurrently executing
> >       futex operations on the same futex word.
> 
> So there are two things here regarding ordering. One is the most obvious
> which is ordered due to the taking/dropping the hb spinlock.

I suppose that this means what is described in the manpage already?
That is, that futex operations (ie, the syscalls) are atomic wrt each
other and in a strict total order?

> Secondly, it's
> the cases which Peter brought up a while ago that involve atomic futex ops
> futex_atomic_*(), which do not have clearly defined semantics, and you get
> inconsistencies with certain archs (tile being the worst iirc).

OK.  So, from a user's POV, this is about the semantics of the kernel's
accesses to the futex word.  I agree that specifying this more clearly
would be helpful.

First, there are the comparisons of the futex words used in, for
example, FUTEX_WAIT.  They should use an atomic load within the
conceptual critical sections that make up futex operations.  This load
itself doesn't need to establish any ordering, so it can be equivalent
to a C11 memory_order_relaxed load.  Are there any objections to that?
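
To make the first case concrete, here is a minimal sketch of the compare-and-block behaviour as seen from userspace.  The wrapper futex_wait() and the function wait_with_stale_value() are hypothetical names for illustration (glibc exports no futex() wrapper), and the relaxed load shown is the userspace analogue of the load proposed above, not the kernel's code.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical wrapper around the raw system call. */
long futex_wait(_Atomic uint32_t *uaddr, uint32_t expected)
{
    return syscall(SYS_futex, uaddr, FUTEX_WAIT | FUTEX_PRIVATE_FLAG,
                   expected, NULL, NULL, 0);
}

/*
 * Calls FUTEX_WAIT with an expected value that differs from the futex
 * word.  The kernel loads the word, sees the mismatch, and refuses to
 * block.  Returns the resulting errno, or 0 if the call succeeded.
 */
int wait_with_stale_value(void)
{
    _Atomic uint32_t word = 1;

    /* No ordering is needed for the value check itself. */
    uint32_t seen = atomic_load_explicit(&word, memory_order_relaxed);

    errno = 0;
    long r = futex_wait(&word, seen + 1);   /* deliberately stale */
    return (r == -1) ? errno : 0;
}
```

With a mismatching expected value the call returns immediately with EAGAIN instead of blocking, which is the behaviour the quoted man page text describes.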

Second, we have the write accesses in FUTEX_[TRY]LOCK_PI and
FUTEX_UNLOCK_PI.  We already specify those as atomic and within the
conceptual critical sections of the futex operation.  In addition, they
should establish ordering themselves, so they should have C11
memory_order_acquire / memory_order_release semantics.  Specifying this
would be good.  Any
objections to these semantics?
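
As an illustration of the orderings meant here, the following is a sketch of the userspace fast path for a PI futex, where the futex word is 0 when unlocked and holds the owner's TID when locked.  The function names are hypothetical, and the FUTEX_WAITERS bit and the slow path into the kernel are left out.  The assumption is that the kernel's writes in FUTEX_LOCK_PI and FUTEX_UNLOCK_PI would pair with these orderings in the same way.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical fast-path lock: 0 -> tid with acquire semantics, so
 * accesses in the critical section cannot move before the acquisition.
 * On failure the caller would fall back to FUTEX_LOCK_PI.
 */
bool pi_trylock_fast(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        word, &expected, tid,
        memory_order_acquire, memory_order_relaxed);
}

/*
 * Hypothetical fast-path unlock: tid -> 0 with release semantics, so
 * accesses in the critical section cannot move after the release.
 * On failure the caller would fall back to FUTEX_UNLOCK_PI.
 */
bool pi_unlock_fast(_Atomic uint32_t *word, uint32_t tid)
{
    uint32_t expected = tid;
    return atomic_compare_exchange_strong_explicit(
        word, &expected, 0,
        memory_order_release, memory_order_relaxed);
}
```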

Third, we have the atomic read-modify-write operation that is part of
FUTEX_WAKE_OP (ie, AFAIU, the case you pointed at specifically).  I
don't have a strong opinion on what it should be, because I think
userspace can enforce the orderings it needs on its own (eg, if I
interpret Peter Zijlstra's example correctly, userspace can add
appropriate fences before the CPU0/futex_unlock and after the
CPU2/futex_load calls).  FUTEX_WAKE_OP accesses no other userspace
memory location, so there's no ordering relation to other accesses to
userspace memory that userspace cannot affect.
OTOH, legacy userspace may have assumed strong semantics, so making the
read-modify-write have memory_order_seq_cst semantics is probably a safe
bet.  Futex operations typically shouldn't be on the fast paths anyway.
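
For reference, a sketch of what I mean by userspace enforcing the ordering on its own.  The function name wake_op_set_flag() is hypothetical, and bracketing the call with seq_cst fences is only one possible placement; it is not a transcription of Peter's example.  The operation encoded here sets *uaddr2 to 1 and wakes a waiter on uaddr2 if the old value was 0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Wakes up to one waiter on uaddr, atomically sets *uaddr2 = 1, and
 * wakes up to one waiter on uaddr2 if the old value of *uaddr2 was 0.
 * Returns the number of waiters woken, or -1 on error.
 */
long wake_op_set_flag(_Atomic uint32_t *uaddr, _Atomic uint32_t *uaddr2)
{
    int op = FUTEX_OP(FUTEX_OP_SET, 1, FUTEX_OP_CMP_EQ, 0);

    /* Userspace-supplied ordering, independent of what the kernel's
     * read-modify-write provides. */
    atomic_thread_fence(memory_order_seq_cst);
    long woken = syscall(SYS_futex, uaddr,
                         FUTEX_WAKE_OP | FUTEX_PRIVATE_FLAG,
                         1L,        /* max waiters to wake on uaddr */
                         1L,        /* max waiters to wake on uaddr2 */
                         uaddr2, op);
    atomic_thread_fence(memory_order_seq_cst);
    return woken;
}
```

If the kernel's read-modify-write is documented as memory_order_seq_cst, the fences become redundant for ordering around that access, at some cost on a path that should be slow anyway.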

> But anyway, the important thing users need to know about is that the atomic
> futex operation must be totally ordered wrt any other user tasks that are trying
> to access that address.

I'm not sure what you mean precisely.  One can't order the whole futex
operations totally wrt memory accesses by userspace because they'd need
to synchronize to do that, and thus userspace would have to either hook
into the kernel's synchronization or use HTM or such.

> This is not necessarily the case for kernel ops. Peter
> illustrates this nicely with lock stealing example; 
> (see https://lkml.org/lkml/2015/8/26/596).
> 
> Internally, I believe we decided on making it fully ordered (as opposed to
> making use of implicit barriers for ACQUIRE/RELEASE), so you'd end up having
> an MB ll/sc MB kind of setup.

OK.  So, any objections to documenting that the read-modify-write op in
FUTEX_WAKE_OP has memory_order_seq_cst semantics?
