This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: Futex error handling

From: "Carlos O'Donell" <carlos at redhat dot com>
To: Roland McGrath <roland at hack dot frob dot com>, Torvald Riegel <triegel at redhat dot com>
Cc: GLIBC Devel <libc-alpha at sourceware dot org>, Darren Hart <dvhart at infradead dot org>
Date: Wed, 22 Oct 2014 22:36:36 -0400
Subject: Re: Futex error handling
Authentication-results: sourceware.org; auth=none
References: <1410881785 dot 4967 dot 292 dot camel at triegel dot csb> <1413821696 dot 8483 dot 40 dot camel at triegel dot csb> <20141021223404 dot DBC562C3AB2 at topped-with-meat dot com>

On 10/21/2014 06:34 PM, Roland McGrath wrote:
> The high-level comment is that we have always favored having actual bugs
> cause quick and complete failure.

Agreed.
 
> If it's a bug in libc, then it should fail early and catastrophically so
> that we find out about the bug as soon as possible.  That trades off
> against any runtime cost of detecting the case.  If it's cheap to detect,
> then detect it.  If it's not so cheap, then don't pay the cost because we
> don't expect that we'll have the bug.  Using assert is a middle ground for
> things that have enough cost that we don't just leave them in all the time,
> but little enough that there's still any question about it.  I don't know
> what the distribution of NDEBUG use is like across distributions.  If every
> distribution builds production libc with NDEBUG then in practice assert
> will not catch any real-world problems and it shouldn't really count as
> runtime detection, because only people developing libc will ever see it.

RHEL builds with -DNDEBUG. I expect every distribution does the same.

> If it's user code invoking undefined behavior, then it should fail early
> and catastrophically so that developers don't get the false impression that
> their code is OK when it happens not to break the use cases they test
> adequately.  (Said another way, so that we avoid giving developers an
> excuse to complain when a future implementation change "breaks" their
> programs that were always broken, but theretofore ignorably so.)  That too
> trades off against any runtime cost of detecting the case.  I'd say the
> allowance for cost of detection is marginally higher than in the first
> case, because we expect user bugs to be more common than libc bugs.  But
> it's still not much, since correct programs performing better is more
> important to us than buggy programs being easier to debug.

Agreed. Apparently the kernel detection of user bugs is very low cost,
since the kernel has a broader view of all the locks than glibc does. Thus
error returns from glibc should IMO become immediate catastrophic failures
if the error indicates undefined behaviour. If the situation is in any way
recoverable, we must return the error code to the user.
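
A minimal sketch of that split, for illustration only. The names
futex_error_is_user_bug and futex_wait_finish are hypothetical, and
treating EFAULT and EINVAL as "undefined behaviour" is my assumption
here, not a statement of what glibc does. Codes outside that set are
simply passed back to the caller.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: these codes mean the caller handed us a bad futex word
   (for example unmapped or misaligned memory), i.e. undefined
   behaviour in user code.  */
static bool
futex_error_is_user_bug (int err)
{
  return err == EFAULT || err == EINVAL;
}

/* Hypothetical helper.  ERR is 0 on success, or the positive errno
   value reported for a futex wait.  */
static int
futex_wait_finish (int err)
{
  if (futex_error_is_user_bug (err))
    {
      /* Fail early and loudly instead of limping on.  */
      fprintf (stderr, "futex wait: undefined behaviour (errno %d)\n", err);
      abort ();
    }

  /* Recoverable (EAGAIN, EINTR, ETIMEDOUT, ...): give the code back
     to the caller.  */
  return err;
}
```

A caller would write something like "return futex_wait_finish (err);"
once it has the kernel's result in hand.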

> Those are generic principles.  There's another kind of case I don't think
> you mentioned, that is especially apropos for the futex operations.  That
> is unexpected results from the kernel.  That could of course just be a libc
> bug that causes its expectations to be wrong.  But it could also be a
> kernel bug, or a new compatibility problem (e.g. some system call starts
> returning new error codes in a new kernel version that weren't possible
> when the libc code was written, built, and tested).  For those I'm not sure
> there is any general rule that will really help.  It might just require
> careful consideration case by case for what is the wisest form of
> future-proofing.  Sometimes, propagating whatever error the kernel gave
> back to the user is clearly the best thing to do.  But there might also be
> situations where an unexpected result means that libc has become confused
> about what state the kernel left things in, and crashing would be better.
> And finally, there might well be instances of kernel bugs that we could
> adequately recognize and work around.

I would prefer that unexpected error codes cause glibc to crash. This
immediately alerts kernel and glibc developers to a mismatch in their
expectations before it ever gets out of experimental distributions.
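
Roughly what I have in mind, as a sketch rather than a proposed patch.
The helper names are hypothetical, and the set of "known" results for a
futex wake (success, EFAULT, EINVAL) is an assumption taken from the
futex(2) manual page. Anything outside that set takes the default arm
and aborts.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumption: the only results this hypothetical libc was written,
   built, and tested against for a futex wake.  */
static bool
futex_wake_error_is_known (int err)
{
  switch (err)
    {
    case 0:
    case EFAULT:
    case EINVAL:
      return true;
    default:
      /* A code the libc authors never anticipated.  */
      return false;
    }
}

/* Hypothetical helper: crash on any result we did not expect, so the
   mismatch between kernel and libc is noticed right away.  */
static void
futex_wake_check (int err)
{
  if (!futex_wake_error_is_known (err))
    {
      fprintf (stderr, "unexpected futex wake error %d\n", err);
      abort ();
    }
}
```

The message names the raw error number so that whoever hits the abort
can tell at a glance which side changed.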

That's just my preference.

Cheers,
Carlos.

Follow-Ups:
- Re: Futex error handling
  - From: Roland McGrath

References:
- Re: Futex error handling
  - From: Torvald Riegel
- Re: Futex error handling
  - From: Roland McGrath
