This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [RFC][BZ #16549] Add sanity check for condvar alignment.
- From: Torvald Riegel <triegel at redhat dot com>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: Florian Weimer <fweimer at redhat dot com>, libc-alpha at sourceware dot org
- Date: Sat, 12 Apr 2014 00:52:32 +0200
- Subject: Re: [RFC][BZ #16549] Add sanity check for condvar alignment.
- References: <20140211124346 dot GA31165 at domone dot podge> <52FA4AC2 dot 1070400 at redhat dot com> <1397247983 dot 10643 dot 18244 dot camel at triegel dot csb> <20140411212958 dot GA29703 at domone dot podge>
On Fri, 2014-04-11 at 23:29 +0200, Ondřej Bílka wrote:
> On Fri, Apr 11, 2014 at 10:26:23PM +0200, Torvald Riegel wrote:
> > On Tue, 2014-02-11 at 17:07 +0100, Florian Weimer wrote:
> > > On 02/11/2014 01:43 PM, Ondřej Bílka wrote:
> > >
> > > > A more conservative solution is add assert in initialization to check
> > > > alignment. Following patch does that, should be same check added for
> > > > mutex/semaphores?
> > >
> > > I think the real issue here is our lack of error checking for the futex
> > > system call. strace on the test case shows this:
> > >
> > > [pid 12278] futex(0x6010cd, FUTEX_WAIT_PRIVATE, 1, NULL) = -1 EINVAL
> > > (Invalid argument)
> >
> > I do not think this is helpful. There's lots of undefined behavior in
> > the languages and in lots of other places, and there's a reason for
> > that. I agree that this may seem less "forgiving" in face of programmer
> > errors, but an assert or returning an error code is unlikely to be
> > really a solution. A buggy program will, I guess, often also not check
> > error codes. Especially for things like synchronization constructs
> > where typically, there's no real recovery / alternative solution in a
> > program anyway -- if you need mutual exclusion to go on, what do you do
> > if you can't get it? Just stop doing anything?
> >
> Torvald, failed assert does terminate a program. Could you explain what
> do you mean with error recovery?

We have a fault (e.g., a misaligned condvar). That leads to an error
(e.g., the condvar misbehaving). We can turn this into an immediate
failure by, for example, terminating the program (i.e., the user of the
program would observe the failure). Error recovery, then, means trying
to get out of the erroneous situation; for example, you can try to do
something else that's hopefully not prone to the same fault. The
"something else" might be functionally equivalent -- but in the mutex
case, the program is probably not able to move over to a different
mutual exclusion implementation or synchronization scheme. It could
also be something of less value. Or the component could just fail in
case of an error (e.g., terminate the program). But if you fail, then
the failure needs to be handled (i.e., whoever used the component is
now affected), and this might or might not make things easier;
sometimes higher-layer users can deal well with fail-fast components,
sometimes they really want no failures (and might even prefer, for
example, graceful degradation).

In other words, I don't think there's a one-size-fits-all solution, and
we can't fix this by just adding asserts. Instead, if we want to fix
this, I think we need to look for better integration with the
overarching error/failure handling, and provide choice. I'm aware that
this sounds pretty vague, but I hope it gets the point across. If you
want another example of the same abstract issue, try thinking about how
to choose timeouts in a library, and whether there's a
one-size-fits-all for that...
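
To make the "provide choice" point a bit less vague, here is a minimal
C sketch of an alignment check where the caller picks between fail-fast
and an error code. It is only an illustration under stated assumptions:
checked_cond_init, is_aligned and enum misalign_policy are hypothetical
names that exist neither in glibc nor in POSIX, and the sketch assumes
that _Alignof(pthread_cond_t) is the alignment that matters.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical policy knob: not part of glibc or POSIX. */
enum misalign_policy {
    MISALIGN_ABORT,        /* fail fast: terminate the program */
    MISALIGN_RETURN_ERROR  /* report EINVAL and let the caller decide */
};

/* Nonzero if p is a multiple of align (align must be a power of two). */
static int is_aligned(const void *p, size_t align)
{
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Hypothetical wrapper around pthread_cond_init.  It takes void * so
   that the pointer is converted to pthread_cond_t * only after it is
   known to be suitably aligned. */
static int checked_cond_init(void *storage, enum misalign_policy policy)
{
    if (!is_aligned(storage, _Alignof(pthread_cond_t))) {
        if (policy == MISALIGN_ABORT)
            abort();               /* the fault becomes an immediate failure */
        return EINVAL;             /* the caller has to handle the error */
    }
    return pthread_cond_init((pthread_cond_t *)storage, NULL);
}
```

Neither policy recovers from the fault: MISALIGN_ABORT just moves the
failure to initialization time, and MISALIGN_RETURN_ERROR only helps
callers that actually check the return value.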