This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Futex error handling


On Tue, 2014-09-16 at 12:56 -0400, Rich Felker wrote:
> On Tue, Sep 16, 2014 at 05:36:25PM +0200, Torvald Riegel wrote:
> > We got complains from the kernel side that glibc wouldn't react properly
> > to futex errors being returned.
> 
> Are these complaints public anywhere?

I don't think so.  Anyway, this point isn't really relevant to the
discussion -- just my personal motivation, kind of :)

> > Thus, I'm looking at what we'd need to
> > actually improve.  I'm using this here as a documentation for futex
> > error codes: https://lkml.org/lkml/2014/5/15/356
> > 
> > Generally, we have three categories of faults (ie, the cause for an
> > error/failure):
> > * Bug in glibc ("BL")
> > * Bug in the client program ("BP")
> > * Failures that are neither a bug in glibc nor the program ("F")
> > 
> > Also, there are cases where it's not a "real" failure, but just
> > something that is expected behavior that needs to be handled ("NF").
> > 
> > I'm not aware of a general policy about whether glibc should abort or
> > assert (ie, abort only with assertion checks enabled) when the fault is
> > in the BL or BP categories.  I'd say we don't, because there's no way to
> > handle it anyway, and other things will likely go wrong; but I don't
> > have a strong opinion.  Thoughts?
> > 
> > For every futex op, here's a list of how I'd categorize the possible
> > error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> > BL)):
> > 
> > FUTEX_WAIT:
> > * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> > earlier when we accessed the futex variable.
> > * EINVAL (alignment and timeout normalization) is BL/BP.
> > * EWOULDBLOCK, ETIMEDOUT are NF.
> 
> I would distingish multiple versions of "BP" for EINVAL here. You seem
> to have mixed "program has invoked undefined behavior" (e.g. invalid
> synchronization object) with "program has provided an erroneous
> argument which the implementation is required to report" (e.g. invalid
> timespec contents). If you don't want to add a new class, the latter
> technically could just be considered NF; it's fully equivalent to NF
> in terms of how it has to be handled.

Good point on timespec.

> > FUTEX_WAKE, FUTEX_WAKE_OP:
> > * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> > case.  This is due to how futexes work when combined with certain rules
> > for destruction of the underlying synchronization data structure; see my
> > description of the mutex destruction issue (but this can happen with
> > other data structures such as semaphores or cond vars too):
> > https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> 
> Note that it's possible to use FUTEX_WAKE_OP in such a way that EFAULT
> is reserved for BL/BP (and not NF). I don't see any point in
> having/using FUTEX_WAKE_OP except for this purpose, but maybe I'm
> missing something.

I agree that I was a bit sloppy in the categorization.  You're right
that depending on how it's used, EFAULT can be just BL/BP.  This applies
to both FUTEX_WAKE and FUTEX_WAKE_OP, I think; the latter has just a
finite number of bits, so you can't avoid an ABA issue entirely.  You
can use the latter like FUTEX_UNLOCK_PI though, to try to avoid the
mutex destruction issue.

So, to summarize, my categories kind of assume a "typical" use of those
operations in glibc.  What I was trying to point out is that we can't
abort in the generic futex syscall code when we see EFAULT, because
that's wrong for typical uses of FUTEX_WAKE.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]