This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: exception handling predicament
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: David Miller <davem at davemloft dot net>
- Cc: libc-alpha at sourceware dot org, drepper at gmail dot com, rth at twiddle dot net
- Date: Fri, 19 Aug 2011 07:07:28 -0700
- Subject: Re: exception handling predicament
- References: <20110819.010810.1209537253898782971.davem@davemloft.net>
On Fri, Aug 19, 2011 at 1:08 AM, David Miller <davem@davemloft.net> wrote:
>
> Please read, it took me a very long time to debug this :-/
>
> I think this issue applies to most targets in glibc.  We've been
> mostly getting away with this simply because gcc has been less
> aggressive optimizing exception regions in the past.
>
> If given -fexceptions, GCC will not recognize an inline asm as
> potentially generating an exception unless both of the following
> are true:
>
> 1) One of the asm operands has a type which is volatile
> 2) -fnon-call-exceptions is given in CFLAGS
>
> This can therefore cause a problem on any platform that implements the
> lowlevellock.h futex operations as inline asm syscalls (i386, x86_64,
> sparc, etc.)
>
> Initially I thought only #1 was the issue, so I reworked the sparc
> lowlevellock.h inline asms such that the volatile types propagate
> properly into the inline asms instead of being cast away.
>
> But it turns out #2 is also needed.
>
> I haven't checked but I imagine this could cause problems in other
> cancellable routines where the exception generating point is an
> inlined syscall and we've enabled async cancellation.
>
> One test case that fails because of this issue is nptl/tst-cancel17.c
> because aio_suspend() has this code sequence involving a cleanup which
> gets implemented using __attribute__((__cleanup__(xxx))):
>
>      pthread_cleanup_push (cleanup, &clparam);
>
> #ifdef DONT_NEED_AIO_MISC_COND
>      AIO_MISC_WAIT (result, cntr, timeout, 1);
> #else
>  ...
> #endif
>
>      pthread_cleanup_pop (0);
>
> AIO_MISC_WAIT() is essentially:
>
>          oldtype = LIBC_CANCEL_ASYNC ();
>  ...
>        pthread_mutex_unlock (&__aio_requests_mutex);
>  ...
>            status = lll_futex_timed_wait (futexaddr, oldval, timeout,
>                                           LLL_PRIVATE);
>  ...
>        if (cancel)
>          LIBC_CANCEL_RESET (oldtype);
>  ...
>        pthread_mutex_lock (&__aio_requests_mutex);
>
> GCC decides that the cleanup exception range should only cover the
> LIBC_CANCEL_ASYNC() and LIBC_CANCEL_RESET(), because they evaluate to
> function calls which are not marked as __nothrow__.
>
> It should not cover pthread_mutex_{unlock,lock}() because those
> functions have been marked as __nothrow__.
>
> It should also not cover lll_futex_timed_wait() because that's an
> inline asm and we haven't passed -fnon-call-exceptions to GCC.
>
> The result is that gcc does not emit an exception region for
> lll_futex_timed_wait()'s asm, and therefore if the cancel event comes
> in while we're sleeping on that futex call then the aio_suspend()
> cleanups do not run and therefore we eventually crash.
>
> We could pass -fnon-call-exceptions but that seems pretty heavy handed,
> and doesn't actually fix the real problem.
>
> The truth is that __cleanup__ doesn't provide the semantics we want.
>
> The cancel signal (and thus since we're in async mode, the unwind) can
> occur at any instruction in this code sequence.  Not just instructions
> that "might trap."
>
> They all "might trap."  It could even happen during one of the
> __nothrow__ functions we call.
>
> So perhaps __cleanup__ is not appropriate for async signal based
> exceptions, as is being used here.  And we should instead use some
> other cleanup mechanism.
>
> As far as I can tell, aio_suspend() is the only part of librt that
> tries to make use of a pthread cleanup.
>
Is this related to:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48338
--
H.J.
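
For readers unfamiliar with the mechanism discussed in the quoted message, here is a minimal sketch of GCC's __cleanup__ variable attribute in C. The names note_cleanup, run_scope and cleanup_runs are hypothetical and are not taken from glibc. The sketch only shows the ordinary case, where the cleanup runs because the variable goes out of scope. With -fexceptions, GCC also runs the cleanup during stack unwinding, but as the message describes, that depends on the compiler having emitted an exception region covering the point where the unwind starts.

```c
#include <assert.h>

/* Hypothetical counter used only to observe that the cleanup ran. */
static int cleanup_runs = 0;

/* GCC passes a pointer to the variable that is going out of scope. */
static void note_cleanup(int *slot)
{
    assert(slot != 0);
    cleanup_runs += 1;
    *slot = 0;
}

/* Returns how many times the cleanup ran once the inner scope ended. */
int run_scope(void)
{
    cleanup_runs = 0;
    {
        /* note_cleanup(&guard) is called when this block is left. */
        int guard __attribute__((__cleanup__(note_cleanup))) = 1;
        (void) guard;
    }
    return cleanup_runs;
}
```

Calling run_scope() from a test program should return 1, since the cleanup fires exactly once when the inner block ends. This says nothing about the asynchronous-cancellation case, which is the case the quoted message argues __cleanup__ does not cover reliably.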