This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Futex error handling


On 10/20/2014 12:14 PM, Torvald Riegel wrote:
> On Tue, 2014-09-16 at 17:36 +0200, Torvald Riegel wrote:
>> We got complains from the kernel side that glibc wouldn't react properly
>> to futex errors being returned.  Thus, I'm looking at what we'd need to
>> actually improve.  I'm using this here as a documentation for futex
>> error codes: https://lkml.org/lkml/2014/5/15/356
>>
>> Generally, we have three categories of faults (ie, the cause for an
>> error/failure):
>> * Bug in glibc ("BL")
>> * Bug in the client program ("BP")
>> * Failures that are neither a bug in glibc nor the program ("F")
>>
>> Also, there are cases where it's not a "real" failure, but just
>> something that is expected behavior that needs to be handled ("NF").
>>
>> I'm not aware of a general policy about whether glibc should abort or
>> assert (ie, abort only with assertion checks enabled) when the fault is
>> in the BL or BP categories.  I'd say we don't, because there's no way to
>> handle it anyway, and other things will likely go wrong; but I don't
>> have a strong opinion.  Thoughts?
>>
>> For every futex op, here's a list of how I'd categorize the possible
>> error codes (I'm ignoring ENOSYS, which is NF when feature testing (or
>> BL)):
>>
>> FUTEX_WAIT:
>> * EFAULT is either BL or BP.  Nothing we can do.  Should have failed
>> earlier when we accessed the futex variable.

Abort.

>> * EINVAL (alignment and timeout normalization) is BL/BP.

Abort.

>> * EWOULDBLOCK, ETIMEDOUT are NF.

Don't know.

WARNING: Aborting on EWOULDBLOCK is dangerous since we might just
wish to deadlock the single thread, so a developer can attach with
the debugger to inspect thread state while other threads keep running.

>> FUTEX_WAKE, FUTEX_WAKE_OP:
>> * EFAULT can be BL/BP *or* NF, so we *must not* abort or assert in this
>> case.  This is due to how futexes work when combined with certain rules
>> for destruction of the underlying synchronization data structure; see my
>> description of the mutex destruction issue (but this can happen with
>> other data structures such as semaphores or cond vars too):
>> https://sourceware.org/ml/libc-alpha/2014-04/msg00075.html

Don't know.

>> * EINVAL (futex alignment) is BL/BP.

Abort.

>> * EINVAL (inconsistent state or hit a PI futex) can be either BL/BP *or*
>> NF.  The latter is caused by the mutex destruction issue, only that a
>> pending FUTEX_WAKE after destruction doesn't hit an inaccessible memory
>> location but one which has been reused for a PI futex.  Thus, we must
>> not abort or assert in this case.

Don't know.

>>
>> FUTEX_REQUEUE:
>> * Like FUTEX_WAKE, except that it's not safe to use concurrently with
>> possible destruction / reuse of the futex memory (because requeueing to
>> a futex that's unrelated to the new futex located in reused memory is
>> bad).

Are there any error codes for FUTEX_REQUEUE or are you saying it's
identical to FUTEX_WAKE?

>> FUTEX_REQUEUE_CMP:
>> * Like FUTEX_REQUEUE.  EAGAIN is NF.

Is EGAIN the only one to consider?

Yes, EAGAIN needs some kind of special handling, but I don't know what.

>> FUTEX_WAKE_OP:
>> * Haven't looked at this yet.  Only used in condvars, and might not be
>> necessary for a condvar that's not based on a condvar-internal lock.
>>
>> FUTEX_WAIT_BITSET / FUTEX_WAKE_BITSET:
>> * Like FUTEX_WAIT / FUTEX_WAKE.  The additional EINVAL is BL.

Abort.

>> FUTEX_LOCK_PI, FUTEX_TRYLOCK_PI:
>> * EFAULT is BL/BP.

Abort.

>> * ENOMEM is F.  We need to handle this.

Yes, pass it along.

>> * EINVAL, EPERM, ESRCH are BL/BP.

Abort.

>> * EAGAIN and ETIMEDOUT are NF.

Don't know.

>> * EDEADLOCK is BP (or BL).

See WARNING above.

>> * EOWNERDIED is F.

Don't know.

>> FUTEX_UNLOCK_PI:
>> (* I guess this can return EFAULT too, which is BL/BP.)
>> * EINVAL and EPERM are BL/BP.  I don't think there's a mutex destruction
>> issue with PI locks because the kernel takes care of both resetting the
>> value of the futex var and waking up threads; it should do so in a way
>> that won't access reused memory.  I guess we should check that though...

Abort.

>> FUTEX_WAIT_REQUEUE_PI:
>> * EFAULT and EINVAL are BL/BP.

Abort.

>> * EWOULDBLOCK and ETIMEDOUT are NF.

Don't know.

>> * EOWNERDIED is F.

Don't know. How did we get here?

>> FUTEX_CMP_REQUEUE_PI is like FUTEX_CMP_REQUEUE except:
>> * ENOMEM is F.

Pass it along.

>> * EPERM and ESRCH are BL/BP.

Abort.

>> * EDEADLOCK is BP (or BL).

See WARNING above.

>>
>>
>> I think the next steps to improve this should be:
>> 1) Getting consensus on how we want to handle BL and BP in general.
> 
> Ping.  I think that's a fairly generic issue (in the sense of likely
> being a question for other things than just mutexes too), so I'd like
> some input on the direction.
> 
> Carlos, Joseph, Roland:  Do you have any comments?

Roland already responded, but I'll reiterate with wiki updates:

(1) Bugs in the GNU C library should fail early and visibly, balanced by
runtime detection cost.

(2) Bugs in the user program should fail early and visibly, balanced by
runtime detection cost in the GNU C library.

(3) Unknown error codes from the kernel should cause immediate failure.

https://sourceware.org/glibc/wiki/Style_and_Conventions#Error_Handling

>> 2) Applying the outcome of that to the list above and getting consensus
>> on the result.
>> 3) For each case of F, find the best way to report it to the caller
>> (e.g., error code from the pthreads function, abort, ...).
>> 4) Change each use of the futexes accordingly, one at a time.
>>
>> I've asked Michael Kerrisk for the state of the futex error docs, but
>> haven't gotten a reply yet.  (Last time I checked, the new input from
>> the email I referred to above wasn't part of the futex docs yet.)
> 
> Michael has replied, and it's on his list of things to do.

Hopefully that helps, maybe it doesn't, I answered the easy ones, and
perhaps the first step is to fix the easy ones first :-)

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]