This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

exception handling predicament

From: David Miller <davem at davemloft dot net>
To: libc-alpha at sourceware dot org
Cc: drepper at gmail dot com, rth at twiddle dot net
Date: Fri, 19 Aug 2011 01:08:10 -0700 (PDT)
Subject: exception handling predicament

Please read this; it took me a very long time to debug :-/

I think this issue applies to most targets in glibc.  We've mostly
been getting away with it simply because gcc has been less aggressive
about optimizing exception regions in the past.

If given -fexceptions, GCC will not recognize an inline asm as
potentially generating an exception unless both of the following
are true:

1) One of the asm operands has a type which is volatile
2) -fnon-call-exceptions is given in CFLAGS

This can therefore cause a problem on any platform that implements the
lowlevellock.h futex operations as inline asm syscalls (i386, x86_64,
sparc, etc.)

Initially I thought only #1 was the issue, so I reworked the sparc
lowlevellock.h inline asms such that the volatile types propagate
properly into the inline asms instead of being cast away.

But it turns out #2 is also needed.
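
A minimal sketch of what "propagating" versus "casting away" the
volatile type means for an asm operand.  The function names are
hypothetical and the empty asm template only stands in for a real
syscall instruction; this is not the sparc lowlevellock.h code.

```c
/* Hypothetical wrapper: the memory operand keeps its volatile-qualified
   type, which is condition #1 as described above.  The empty template
   is a placeholder for the real futex syscall sequence.  */
int
touch_keep_volatile (volatile int *futex)
{
  __asm__ __volatile__ ("" : "+m" (*futex) : : "memory");
  return *futex;
}

/* Hypothetical wrapper: the cast strips the volatile qualifier, so the
   asm only ever sees a plain int operand.  */
int
touch_cast_away (volatile int *futex)
{
  __asm__ __volatile__ ("" : "+m" (*(int *) futex) : : "memory");
  return *futex;
}
```

Both variants behave identically at run time; the difference is only
in the operand type the compiler sees when it decides whether the asm
belongs in an exception region.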

I haven't checked, but I imagine this could cause problems in other
cancellable routines where the exception-generating point is an
inlined syscall and we've enabled async cancellation.

One test case that fails because of this issue is nptl/tst-cancel17.c
because aio_suspend() has this code sequence involving a cleanup which
gets implemented using __attribute__((__cleanup__(xxx))):

      pthread_cleanup_push (cleanup, &clparam);

#ifdef DONT_NEED_AIO_MISC_COND
      AIO_MISC_WAIT (result, cntr, timeout, 1);
#else
 ...
#endif

      pthread_cleanup_pop (0);
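
For readers unfamiliar with the mechanism, here is a simplified sketch
of how a push/pop pair can be built on __attribute__((__cleanup__)).
The struct, macro and function names are hypothetical stand-ins, not
the real pthread.h definitions; the sketch only shows the normal
scope-exit path, not unwinding.

```c
/* Hypothetical stand-in for the frame a cleanup push sets up.  */
struct cleanup_frame
{
  void (*routine) (void *);
  void *arg;
  int do_it;
};

/* The compiler calls this when a variable carrying the cleanup
   attribute goes out of scope, passing a pointer to that variable.  */
static void
run_cleanup_frame (struct cleanup_frame *frame)
{
  if (frame->do_it)
    frame->routine (frame->arg);
}

/* Simplified push: open a block and declare the frame inside it.  */
#define CLEANUP_PUSH(fn, argp)                              \
  do                                                        \
    {                                                       \
      struct cleanup_frame clframe                          \
        __attribute__ ((__cleanup__ (run_cleanup_frame)))   \
        = { (fn), (argp), 1 }

/* Simplified pop: record whether to run the handler, then close the
   block, which triggers run_cleanup_frame.  */
#define CLEANUP_POP(execute)                                \
      clframe.do_it = (execute);                            \
    }                                                       \
  while (0)

static void
count_cleanup (void *arg)
{
  ++*(int *) arg;
}

/* Returns how many times the handler ran for one push/pop pair.  */
int
demo_cleanup (int execute)
{
  int runs = 0;

  CLEANUP_PUSH (count_cleanup, &runs);
  /* The protected region would go here.  */
  CLEANUP_POP (execute);

  return runs;
}
```

On an exceptional exit the handler only runs if the compiler emitted
an exception region covering the instruction where the unwind starts,
which is exactly what goes wrong in the case described here.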

AIO_MISC_WAIT() is essentially:

	  oldtype = LIBC_CANCEL_ASYNC ();
 ...
	pthread_mutex_unlock (&__aio_requests_mutex);
 ...
	    status = lll_futex_timed_wait (futexaddr, oldval, timeout,
					   LLL_PRIVATE);
 ...
	if (cancel)
	  LIBC_CANCEL_RESET (oldtype);
 ...
	pthread_mutex_lock (&__aio_requests_mutex);

GCC decides that the cleanup exception range needs to cover only the
LIBC_CANCEL_ASYNC() and LIBC_CANCEL_RESET() calls, because they
evaluate to function calls which are not marked as __nothrow__.

By GCC's reasoning, the range need not cover
pthread_mutex_{unlock,lock}(), because those functions have been
marked as __nothrow__.

Nor need it cover lll_futex_timed_wait(), because that's an inline
asm and we haven't passed -fnon-call-exceptions to GCC.

The result is that gcc does not emit an exception region for
lll_futex_timed_wait()'s asm.  If the cancel event comes in while
we're sleeping on that futex call, the aio_suspend() cleanups do not
run and we eventually crash.

We could pass -fnon-call-exceptions but that seems pretty heavy handed,
and doesn't actually fix the real problem.

The truth is that __cleanup__ doesn't provide the semantics we want.

The cancel signal (and thus, since we're in async mode, the unwind)
can occur at any instruction in this code sequence, not just at
instructions that "might trap."

They all "might trap."  It could even happen during one of the
__nothrow__ functions we call.

So perhaps __cleanup__ is not appropriate for async signal based
exceptions, as it is being used here, and we should instead use some
other cleanup mechanism.
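
One possible shape for such a mechanism, purely as an illustration, is
an explicitly registered handler list that the cancellation path walks
itself, so nothing depends on compiler-emitted exception regions.  All
names below are hypothetical, this is not a proposal for the actual
fix, and the sketch ignores the question of making registration itself
safe against an asynchronous cancel.

```c
#include <stddef.h>

/* Hypothetical per-thread stack of explicitly registered handlers.  */
struct handler
{
  void (*routine) (void *);
  void *arg;
  struct handler *prev;
};

static __thread struct handler *handler_top;

/* Register a handler; the caller provides storage on its own stack.  */
void
handler_push (struct handler *h, void (*routine) (void *), void *arg)
{
  h->routine = routine;
  h->arg = arg;
  h->prev = handler_top;
  handler_top = h;
}

/* Unregister the most recent handler, optionally running it.  */
void
handler_pop (int execute)
{
  struct handler *h = handler_top;

  handler_top = h->prev;
  if (execute)
    h->routine (h->arg);
}

/* What a cancellation path could do: run every registered handler,
   innermost first, regardless of which instruction was interrupted.  */
void
handler_run_all (void)
{
  while (handler_top != NULL)
    handler_pop (1);
}

static void
mark_done (void *arg)
{
  *(int *) arg = 1;
}

/* Simulates a cancel arriving inside the protected region and reports
   whether the handler ran.  */
int
demo_explicit_cleanup (void)
{
  struct handler h;
  int cleaned = 0;

  handler_push (&h, mark_done, &cleaned);
  handler_run_all ();

  return cleaned;
}
```

Because the handler is found through the list rather than through
unwind tables, it does not matter whether the interrupted instruction
was one the compiler considered able to throw.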

As far as I can tell, aio_suspend() is the only part of librt that
tries to make use of a pthread cleanup.

Follow-Ups:
- Re: exception handling predicament
  - From: H.J. Lu
- Re: exception handling predicament
  - From: Andreas Schwab
