This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Futex error handling


On Tue, 2014-09-16 at 17:36 +0200, Torvald Riegel wrote:
> We got complaints from the kernel side that glibc doesn't react properly
> to futex errors being returned.  Thus, I'm looking at what we'd need to
> actually improve.  I'm using this as documentation for the futex
> error codes: https://lkml.org/lkml/2014/5/15/356
> 
> Generally, we have three categories of faults (i.e., the cause of an
> error/failure):
> * Bug in glibc ("BL")
> * Bug in the client program ("BP")
> * Failures that are neither a bug in glibc nor the program ("F")
> 
> Also, there are cases where it's not a "real" failure, but just
> something that is expected behavior that needs to be handled ("NF").
> 
> I'm not aware of a general policy about whether glibc should abort or
> assert (i.e., abort only with assertion checks enabled) when the fault is
> in the BL or BP categories.  I'd say we don't, because there's no way to
> handle it anyway, and other things will likely go wrong; but I don't
> have a strong opinion.  Thoughts?
> 
> For every futex op, here's a list of how I'd categorize the possible
> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
> BL)):
> 
> FUTEX_WAIT:
> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
> earlier when we accessed the futex variable.
> * EINVAL (alignment and timeout normalization) is BL/BP.
> * EWOULDBLOCK, ETIMEDOUT are NF.
> 
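
As a rough illustration of the FUTEX_WAIT categorization above, here is a
minimal C sketch.  The enum and the helper name are hypothetical (they are
not glibc identifiers), and error codes that the list does not mention are
reported as "unlisted" rather than guessed at.

```c
#include <errno.h>

/* Fault categories as used in this message.  */
enum futex_fault
{
  FAULT_NF,        /* Expected behavior, not a real failure.  */
  FAULT_F,         /* Real failure, bug in neither glibc nor the program.  */
  FAULT_BL_OR_BP,  /* Bug in glibc or in the client program.  */
  FAULT_UNLISTED   /* Not covered by the list above (assumption).  */
};

/* Hypothetical helper: classify an errno value reported by FUTEX_WAIT.  */
static enum futex_fault
classify_futex_wait_error (int err)
{
  switch (err)
    {
    case EWOULDBLOCK:   /* The futex value did not match.  */
    case ETIMEDOUT:     /* The timeout expired.  */
      return FAULT_NF;
    case EFAULT:        /* Futex variable not accessible.  */
    case EINVAL:        /* Alignment or timeout normalization.  */
      return FAULT_BL_OR_BP;
    default:
      return FAULT_UNLISTED;
    }
}
```

What a caller does with FAULT_BL_OR_BP (abort, assert, or neither) is
exactly the open policy question, so the sketch deliberately stops at
classification.
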
> FUTEX_WAKE, FUTEX_WAKE_OP:
> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
> case.  This is due to how futexes work when combined with certain rules
> for destruction of the underlying synchronization data structure; see my
> description of the mutex destruction issue (but this can happen with
> other data structures such as semaphores or cond vars too):
> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html
> * EINVAL (futex alignment) is BL/BP.
> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
> NF.  The latter is caused by the mutex destruction issue, only that a
> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
> location but one which has been reused for a PI futex.  Thus, we must
> not abort or assert in this case.
> 
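
The FUTEX_WAKE rules above could be sketched as follows.  This is only an
illustration: the names are hypothetical, and it assumes the caller looks
at nothing but the errno value.  Under that assumption an EINVAL caused by
misalignment (BL/BP) cannot be told apart from one caused by reused memory
(NF), so every EINVAL is treated as something we must not abort on.

```c
#include <errno.h>

/* What a caller may do with an errno value reported by FUTEX_WAKE.  */
enum wake_action
{
  WAKE_IGNORE,    /* May be NF: must not abort or assert.  */
  WAKE_UNLISTED   /* Not covered by the list above; policy left open.  */
};

/* Hypothetical helper, not a glibc function.  */
static enum wake_action
futex_wake_error_action (int err)
{
  switch (err)
    {
    case EFAULT:  /* Memory may already be unmapped after destruction.  */
    case EINVAL:  /* Memory may have been reused, e.g. for a PI futex.  */
      return WAKE_IGNORE;
    default:
      return WAKE_UNLISTED;
    }
}
```
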
> FUTEX_REQUEUE:
> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
> possible destruction / reuse of the futex memory (because requeueing to
> a futex that's unrelated to the new futex located in reused memory is
> bad).
> 
> FUTEX_CMP_REQUEUE:
> * Like FUTEX_REQUEUE.  EAGAIN is NF.
> 
> FUTEX_WAKE_OP:
> * Haven't looked at this yet.  Only used in condvars, and might not be
> necessary for a condvar that's not based on a condvar-internal lock.
> 
> FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
> * Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.
> 
> FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
> * EFAULT is BL/BP.
> * ENOMEM is F.  We need to handle this.
> * EINVAL, EPERM, ESRCH are BL/BP.
> * EAGAIN and ETIMEDOUT are NF.
> * EDEADLOCK is BP (or BL).
> * EOWNERDIED is F.
> 
> FUTEX_UNLOCK_PI:
> (* I guess this can return EFAULT too, which is BL/BP.)
> * EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
> issue with PI locks because the kernel takes care of both resetting the
> value of the futex var and waking up threads; it should do so in a way
> that won't access reused memory.  I guess we should check that though...
> 
> FUTEX_WAIT_REQUEUE_PI:
> * EFAULT and EINVAL are BL/BP.
> * EWOULDBLOCK and ETIMEDOUT are NF.
> * EOWNERDIED is F.
> 
> FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
> * ENOMEM is F.
> * EPERM and ESRCH are BL/BP.
> * EDEADLOCK is BP (or BL).
> 
> 
> I think the next steps to improve this should be:
> 1) Getting consensus on how we want to handle BL and BP in general.

Ping.  I think that's a fairly generic issue (it likely comes up for
more than just mutexes), so I'd like some input on the direction.

Carlos, Joseph, Roland:  Do you have any comments?

> 2) Applying the outcome of that to the list above and getting consensus
> on the result.
> 3) For each case of F, find the best way to report it to the caller
> (e.g., error code from the pthreads function, abort, ...).
> 4) Change each use of the futexes accordingly, one at a time.
> 
> I've asked Michael Kerrisk for the state of the futex error docs, but
> haven't gotten a reply yet.  (Last time I checked, the new input from
> the email I referred to above wasn't part of the futex docs yet.)

Michael has replied, and it's on his list of things to do.



