Bug 12683 - Race conditions in pthread cancellation
Summary: Race conditions in pthread cancellation
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: unspecified
Importance: P2 critical
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks: 14147
Reported: 2011-04-18 22:28 UTC by Rich Felker
Modified: 2016-05-16 17:10 UTC
CC: 7 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Demonstration of file descriptor leak due to problem 1 (498 bytes, text/x-csrc)
2011-04-18 22:28 UTC, Rich Felker
Demonstration of problem 2 (443 bytes, text/plain)
2011-04-18 22:34 UTC, Rich Felker

Description Rich Felker 2011-04-18 22:28:16 UTC
Created attachment 5676 [details]
Demonstration of file descriptor leak due to problem 1

The current approach to implementing pthread cancellation points is to enable asynchronous cancellation prior to making the syscall, and restore the previous cancellation type once the syscall returns. I've asked around and heard conflicting answers as to whether this violates the requirements in POSIX (I believe it does), but either way, from a quality of implementation standpoint this approach is very undesirable due to at least 2 problems, the latter of which is very serious:

1. Cancellation can act after the syscall has returned from kernelspace, but before userspace saves the return value. This results in a resource leak if the syscall allocated a resource, and there is no way to patch over it with cancellation handlers. Even if the syscall did not allocate a resource, it may have had an effect (like consuming data from a socket/pipe/terminal buffer) which the application will never see.
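The attachment itself is not reproduced here, but a minimal sketch of the kind of test that exposes problem 1 could look like the following (an editor's hypothetical reconstruction; the real attachment may differ). If cancellation acts after open() has returned a new descriptor but before the wrapper stores the return value, that descriptor can never be closed and the leak accumulates:

/* Hypothetical sketch of a problem-1 test, not the actual attachment:
 * if cancellation lands after open() has returned from the kernel but
 * before the wrapper saves the return value, the new descriptor is lost
 * and no cleanup handler can close it. */
#include <pthread.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        int fd = open("/dev/null", O_RDONLY);   /* cancellation point */
        if (fd >= 0)
            close(fd);                          /* never reached if the fd leaked */
    }
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 1000; i++) {
        pthread_t t;
        pthread_create(&t, NULL, worker, NULL);
        pthread_cancel(t);
        pthread_join(t, NULL);
    }
    /* A leak shows up as this probe descriptor number growing with the
     * iteration count (or open() eventually failing with EMFILE). */
    int probe = open("/dev/null", O_RDONLY);
    printf("next free fd after 1000 cancelled threads: %d\n", probe);
    return 0;
}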

2. If a signal is handled while the thread is blocked at a cancellable syscall, the entire signal handler runs with asynchronous cancellation enabled. This could be extremely dangerous, since the signal handler may call functions which are async-signal-safe but not async-cancel-safe. Even worse, the signal handler may call functions which are not even async-signal-safe (like stdio) if it knows the interrupted code could only be using async-signal-safe functions, and having a thread asynchronously terminated while modifying such functions' internal data structures could lead to serious program malfunction.

I am attaching simple programs which demonstrate both issues.

The solution to problem 2 is making the thread's current execution context (e.g. stack pointer) at syscall time part of the cancellability state, so that cancellation requests received while the cancellation point is interrupted by a signal handler can identify that the thread is not presently in the cancellable context.

The solution to problem 1 is making successful return from kernelspace and exiting the cancellable state an atomic operation. While at first this seems impossible without kernel support, I have a working implementation in musl (http://www.etalabs.net/musl) which solves both problems.
Comment 1 Rich Felker 2011-04-18 22:34:44 UTC
Created attachment 5677 [details]
Demonstration of problem 2

This program should hang, or possibly print x=0 if scheduling is really wacky. If it exits printing a nonzero value of the volatile variable x, this means the signal handler wrongly executed under asynchronous cancellation.
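The attachment is not reproduced here; an editor's hypothetical reconstruction of the kind of program described above might look like this (the real attachment may differ):

/* Sketch of the problem-2 demonstration: a thread blocks in a cancellable
 * read(), a signal handler that loops forever is delivered to it, and the
 * thread is then cancelled.  With correct deferred cancellation the program
 * hangs in the handler; printing a nonzero x means the handler was killed
 * asynchronously. */
#include <pthread.h>
#include <signal.h>
#include <unistd.h>
#include <stdio.h>

static volatile unsigned long x;
static int pfd[2];

static void handler(int sig)
{
    (void)sig;
    for (;;) x++;               /* never returns; must not be cancelled asynchronously */
}

static void *worker(void *arg)
{
    char c;
    (void)arg;
    read(pfd[0], &c, 1);        /* blocks forever; cancellation point */
    return NULL;
}

int main(void)
{
    pthread_t t;
    pipe(pfd);
    signal(SIGUSR1, handler);
    pthread_create(&t, NULL, worker, NULL);
    sleep(1);                   /* let the thread block in read() */
    pthread_kill(t, SIGUSR1);   /* handler starts looping in that thread */
    sleep(1);
    pthread_cancel(t);
    pthread_join(t, NULL);
    printf("x=%lu\n", x);       /* reaching this line at all is the bug */
    return 0;
}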
Comment 2 Rich Felker 2011-09-21 18:30:01 UTC
It's been 5 months since I filed this bug and there's been no response. I believe this issue is important enough to at least deserve a response. From my perspective, it makes NPTL's pthread_cancel essentially unusable. I've even included a proposed solution (albeit not a patch). Getting a confirmation that you acknowledge the issue exists and are open to a solution would open the door for somebody to start the work to integrate the solution with glibc/NPTL and eventually get it fixed.
Comment 3 Rich Felker 2012-04-29 02:55:59 UTC
Ping. Now that there's some will to revisit bugs that have been long-ignored, is anyone willing to look into and confirm the problem I've reported here? I believe problem 2 is extremely serious and could lead to timing-based attacks that corrupt memory and result in deadlocks or worse. Problem 1 is also serious for long-lived processes with high reliability requirements that use thread cancellation, as rare but undetectable resource leaks are nearly inevitable and will accumulate over time.
Comment 4 Rich Felker 2012-09-22 23:13:30 UTC
I just added a detailed analysis of this bug on my blog at http://ewontfix.com/2/
Comment 5 Carlos O'Donell 2013-08-16 15:32:31 UTC
There is interest in fixing this issue. It's just that it's complicated :-)
Comment 6 Carlos O'Donell 2013-08-16 15:34:25 UTC
Let us see if I can't get resources to fix this for 2.19. We've seen some tst-cancel17 failures around exactly this issue, where cancellation is delivered between syscall return and error storage and causes problems.
Comment 7 Rich Felker 2013-08-16 16:22:05 UTC
Glad to hear that. Have you taken a look at musl's cancellation implementation? The same mechanism could be used in glibc, or I think it could be modified somewhat to use DWARF2 CFI instead of the asm labels. The basic approach is that the cancellation signal handler examines the saved program counter register and determines whether it's in the critical range starting just before the pre-syscall check of the cancellation flag and the syscall instruction (based on asm labels for these two endpoints). The kernel then handles the atomicity of side effects for us: if the signal interrupts the syscall, the kernel must either complete what it's doing and return (positioning the program counter just past the address range that would allow cancellation to be acted upon), or reset the program counter to just before the syscall instruction and set up the register contents for restarting after the signal handler (in which case cancellation can be acted upon).
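To make the mechanism concrete, here is an editor's sketch of such a cancellation signal handler (not musl's or glibc's actual code; __cp_begin/__cp_end stand for the assumed asm labels, and the PC extraction shown is x86_64-specific):

/* Editor's sketch only.  __cp_begin and __cp_end bracket the region from
 * the final cancellation-flag check up to and including the syscall
 * instruction; emitting them is arch-specific and omitted here. */
#define _GNU_SOURCE
#include <signal.h>
#include <stdint.h>
#include <ucontext.h>

extern const char __cp_begin[], __cp_end[];    /* assumed asm labels */
static volatile sig_atomic_t cancel_pending;   /* stands in for the TCB flag */

static void sigcancel_handler(int sig, siginfo_t *si, void *ctx)
{
    ucontext_t *uc = ctx;
    uintptr_t pc = (uintptr_t)uc->uc_mcontext.gregs[REG_RIP];  /* x86_64 only */
    (void)sig; (void)si;

    cancel_pending = 1;

    /* Act immediately only if the thread was stopped between the final
     * flag check and the syscall instruction.  If the kernel has already
     * let the syscall produce side effects, it returns with the PC past
     * __cp_end and we do nothing here; the pending request is picked up
     * at the next cancellation point instead. */
    if (pc >= (uintptr_t)__cp_begin && pc < (uintptr_t)__cp_end) {
        /* redirect the saved PC to the cancellation-cleanup path
         * (e.g. by rewriting uc_mcontext); details omitted */
    }
}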
Comment 8 Carlos O'Donell 2013-08-16 16:59:54 UTC
(In reply to Rich Felker from comment #7)
> Glad to hear that. Have you taken a look at musl's cancellation
> implementation? The same mechanism could be used in glibc, or I think it
> could be modified somewhat to use DWARF2 CFI instead of the asm labels. The
> basic approach is that the cancellation signal handler examines the saved
> program counter register and determines whether it's in the critical range
> starting just before the pre-syscall check of the cancellation flag and the
> syscall instruction (based on asm labels for these two endpoints). The
> kernel then handles the atomicity of side effects for us: if the signal
> interrupts the syscall, the kernel must either complete what it's doing and
> return (positioning the program counter just past the address range that
> would allow cancellation to be acted upon), or reset the program counter to
> just before the syscall instruction and set up the register contents for
> restarting after the signal handler (in which case cancellation can be acted
> upon).

I have not looked at musl's cancellation implementation.

I assume you are parsing the rt_sigframe set down by the Linux kernel and extracting information from that?
Comment 9 Rich Felker 2013-08-16 17:14:52 UTC
> I have not looked at musl's cancellation implementation.
> 
> I assume you are parsing the rt_sigframe set down by the Linux kernel and
> extracting information from that?

We are using the ucontext_t received via the third argument to the
SA_SIGINFO type signal handler.
Comment 10 Carlos O'Donell 2013-08-16 18:09:46 UTC
(In reply to Rich Felker from comment #9)
> > I have not looked at musl's cancellation implementation.
> > 
> > I assume you are parsing the rt_sigframe set down by the Linux kernel and
> > extracting information from that?
> 
> We are using the ucontext_t received via the third argument to the
> SA_SIGINFO type signal handler.

Same thing. Thanks.
Comment 11 Carlos O'Donell 2014-01-10 21:31:46 UTC
Alex Oliva and I talked about this particular issue today.

We believe that an entirely userspace solution is possible without assistance from the kernel, but it requires signal wrappers.

Signal wrappers are code that executes before and after a signal handler and does things like saving and restoring errno (the one use we have for them currently). The signal wrappers would assist in handling deferred cancellation.

The proposed solution would look like this:

* Stop enabling/disabling asynchronous cancellation around syscalls.

* When a blocking library function that is also a cancellation point is entered, a word in the thread's TCB (call it IN_SYSCALL) is set to the value of the stack pointer (we assume no further stack adjustments are made before the function exits). The value of IN_SYSCALL is cleared just before the function returns. Deferred cancellation is still checked before and after the syscall.

* Add a signal wrapper to all signals that checks whether IN_SYSCALL equals the SP stored in the ucontext_t and, if it does, immediately cancels the thread. The check is done upon entry and exit of the wrapper to reduce cancellation latency. Just before unwinding, the IN_SYSCALL value is cleared.

* When a thread starts we install a SIGCANCEL (SIGRTMIN) handler like we did before, but this handler checks to see if the thread's IN_SYSCALL matches the SP stored in ucontext_t, indicating that cancellation was requested while executing in the cancellation region of a blocking syscall (and no other signal handler is executing). In that case the signal handler cancels the thread immediately. If IN_SYSCALL != SP, then another signal handler is running and we defer the cancellation to the signal wrapper or syscall wrapper. The SIGCANCEL handler operates as it previously did when asynchronous cancellation was enabled.
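A rough sketch of this bookkeeping, as an editor's illustration only (thread_self(), current_sp(), saved_sp(), raw_read_syscall() and do_cancel() are assumed helpers; as the following comments discuss, this scheme was not ultimately adopted):

/* Editor's sketch of the proposed IN_SYSCALL scheme; hypothetical, not glibc code. */
#include <signal.h>
#include <stdint.h>
#include <sys/types.h>
#include <ucontext.h>

struct tcb { volatile uintptr_t in_syscall; volatile int cancel_pending; };

extern struct tcb *thread_self(void);               /* assumed: current thread's TCB */
extern uintptr_t current_sp(void);                  /* assumed: reads the stack pointer */
extern uintptr_t saved_sp(const ucontext_t *uc);    /* assumed: SP saved at signal delivery */
extern ssize_t raw_read_syscall(int fd, void *buf, size_t n);
extern void do_cancel(void);                        /* assumed: run cleanup, exit the thread */

ssize_t cancellable_read(int fd, void *buf, size_t n)
{
    struct tcb *t = thread_self();
    if (t->cancel_pending) do_cancel();     /* deferred check before the syscall */
    t->in_syscall = current_sp();           /* mark: at a blocking cancellation point */
    ssize_t r = raw_read_syscall(fd, buf, n);
    t->in_syscall = 0;                      /* clear just before returning */
    if (t->cancel_pending) do_cancel();     /* deferred check after the syscall */
    return r;
}

/* SIGCANCEL handler: cancel immediately only when the interrupted SP matches
 * IN_SYSCALL, i.e. no other signal handler frame sits on the stack. */
void sigcancel_handler(int sig, siginfo_t *si, void *ctx)
{
    struct tcb *t = thread_self();
    (void)sig; (void)si;
    t->cancel_pending = 1;
    if (t->in_syscall && t->in_syscall == saved_sp(ctx))
        do_cancel();
}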

Resolved use cases:

- Cancellation delivered between first instruction of function and IN_SYSCALL set: Syscall wrapper code will check for cancellation and act upon it.

- Cancellation delivered between IN_SYSCALL set and syscall: The SIGCANCEL handler will immediately cancel the thread.

- Cancellation delivered between syscall and clearing IN_SYSCALL: The SIGCANCEL handler will immediately cancel the thread.

- Cancellation delivered between clearing of IN_SYSCALL and function return: The next cancellation point will act upon the cancellation (still meets POSIX requirement given escape clause of "The thread is suspended at a cancellation point and the event for which it is waiting occurs").

- Cancellation delivered and thread stopped at syscall is executing multiple nested signal handlers and the first signal handler has not checked IN_SYSCALL yet: Only the first signal delivered will have IN_SYSCALL == SP be true. The SIGCANCEL handler will do nothing. The first signal handler's wrapper will detect the cancellation is active and act upon it as it exits (only after all the other signal handlers have completed).

- Cancellation delivered and thread stopped at syscall is executing multiple nested signal handlers and the first signal handler is exiting and has already checked IN_SYSCALL: The syscall will be interrupted and return. The syscall wrapper will act upon the cancellation request. The goal here is to have the signal handlers finish executing without interruption.

Unresolved use cases:

- Related to bug 14147 -- Cancellation delivered while thread is blocked on an async-safe function (in fact it's only executing async-safe functions during the time a signal can be delivered for this to be valid) and executing a signal handler that longjmp's out of the function. In this case IN_SYSCALL is still set to SP and not cleared. If by luck SP ends up the same, and another thread delivers a cancellation request the SIGCANCEL handler will immediately cancel the thread even though it was not in a cancellation region.

- What if you are executing fork and someone tries to cancel you?

A potential resolution to the first unresolved use case is to use a cleanup handler to reset IN_SYSCALL since such a handler is run when longjmp unwinds the frames. However we then need to consider cancellation during the execution of the cleanup.

I haven't fully thought through what to do with the forking a multithreaded program case, but we should try to see if we can make it work.

Note: Setting IN_SYSCALL must be atomic.
Comment 12 Rich Felker 2014-01-10 22:37:44 UTC
Your proposed solution is a lot more complex and invasive than mine; it's actually almost equivalent to the first-generation solution I used in musl for the problem, which turned out to be a bad idea, and thus got scrapped.

Most importantly, aside from being complex and ugly, it does not actually solve the worst problem, because this case is wrong:

"Cancellation delivered between syscall and clearing IN_SYSCALL: The SIGCANCEL handler will immediately cancel the thread."

In this case, unless the syscall failed with EINTR, you must not act on the cancellation request. Doing so is non-conforming to the requirement that the side effects upon cancellation match the side effects on EINTR (which is just a fancy way of saying, approximately, that cancellation can only take place if the syscall has not already done its job, e.g. closing a fd, transferring some bytes, etc.).

In addition, I suspect your solution has further flaws like what happens when you longjmp out of a signal handler that interrupted an AS-safe syscall which is a cancellation point. These issues can be solved with more complexity (extra work in longjmp), but the solution I've proposed is much simpler and has no corner cases that are difficult to handle.
Comment 13 Carlos O'Donell 2014-01-12 18:31:10 UTC
(In reply to Rich Felker from comment #12)
> Your proposed solution is a lot more complex and invasive than mine; it's
> actually almost equivalent to the first-generation solution I used in musl
> for the problem, which turned out to be a bad idea, and thus got scrapped.

Experience is knowing what not to do :-)

> Most importantly, aside from being complex and ugly, it does not actually
> solve the worst problem, because this case is wrong:

I like it when we can talk concretely about use cases.

> "Cancellation delivered between syscall and clearing IN_SYSCALL: The
> SIGCANCEL handler will immediately cancel the thread."
> 
> In this case, unless the syscall failed with EINTR, you must not act on the
> cancellation request. Doing so is non-conforming to the requirement that the
> side effects upon cancellation match the side effects on EINTR (which is
> just a fancy way of saying, approximately, that cancellation can only take
> place if the syscall has already done its job, e.g. closing a fd,
> transferring some bytes, etc.).

I was not aware of this requirement. Is this written in POSIX or did this come about from discussion with the Austin group around the problems with close() being cancelled? Can you provide a reference to this?

> In addition, I suspect your solution has further flaws like what happens
> when you longjmp out of a signal handler that interrupted an AS-safe syscall
> which is a cancellation point. These issues can be solved with more
> complexity (extra work in longjmp), but the solution I've proposed is much
> simpler and has no corner cases that are difficult to handle.

Could you propose your design as a glibc wiki page so that we can look at it and critique it? I'd be happy to adopt your solution, but I want to review it and put it through the same kinds of use cases as we discussed here.
Comment 14 Rich Felker 2014-01-12 23:54:53 UTC
From XSH 2.9.5 Thread Cancellation:

"The side-effects of acting upon a cancellation request while suspended during a call of a function are the same as the side-effects that may be seen in a single-threaded program when a call to a function is interrupted by a signal and the given function returns [EINTR]. Any such side-effects occur before any cancellation cleanup handlers are called."

This is the important paragraph. By requiring that the side-effects on cancellation match the side effects on EINTR, the standard requires that cancellation cannot be acted upon if other irreversible side effects have already taken place. For example, if a file descriptor has been closed, data transferred, etc. then cancellation can't happen. The following paragraph explains further:

"Whenever a thread has cancelability enabled and a cancellation request has been made with that thread as the target, and the thread then calls any function that is a cancellation point (such as pthread_testcancel() or read()), the cancellation request shall be acted upon before the function returns."

This is simple. If cancellation is already pending when a cancellation point is called, it must be acted upon. The next part is less clear:

"If a thread has cancelability enabled and a cancellation request is made with the thread as a target while the thread is suspended at a cancellation point, the thread shall be awakened and the cancellation request shall be acted upon. It is unspecified whether the cancellation request is acted upon or whether the cancellation request remains pending and the thread resumes normal execution if:

* The thread is suspended at a cancellation point and the event for which it is waiting occurs

* A specified timeout expired

before the cancellation request is acted upon."

This is covering the case of blocking syscalls. If a cancellation request arrives during a blocking syscall, it's normally acted upon, but there's one race condition being described: it's possible that the "event being waited for" arrives just before the cancellation request arrives, but before the target thread unblocks. In this case, it's unspecified whether cancellation is acted upon (in which case, by the first paragraph, the event remains pending) or the event is acted upon (in which case the cancellation request remains pending). The reason for there being two bullet points above is that some blocking syscalls wait for either an event or a timeout (think of sem_timedwait or recv with a timeout set by setsockopt), and in that case, the timeout can also be 'consumed' and cause the cancellation to remain pending.

Anyway, this race condition is the whole matter at hand here. The two possibilities allowed (again, due to the limitations imposed by the first paragraph) are acting on the event while leaving cancellation pending, or acting on cancellation while leaving the event pending. But glibc also has a race window where it can both act on the event, producing side effects (because the kernel already has), and act on cancellation. This makes it non-conforming and makes it impossible to use cancellation safely.
Comment 15 Carlos O'Donell 2014-01-13 01:52:33 UTC
(In reply to Rich Felker from comment #14)

Thanks for the recap.

> Anyway, this race condition is the whole matter at hand here. The two
> possibilities allowed (again, due to the limitations imposed by the first
> paragraph) are acting on the event while leaving cancellation pending, or
> acting on cancellation while leaving the event pending. But glibc also has a
> race window where it can act on both the event, producing side effects
> (because the kernel already has), and act on cancellation. This makes it
> non-conforming and makes it impossible to use cancellation safely.

So does this imply that the cancellation *must* happen at some point after errno is known? Thus if a cancellation arrives and we're already in the syscall there is nothing to do but let the syscall return and let the syscall wrapper handle the cancellation. That seems reasonable to me.
Comment 16 Rich Felker 2014-01-13 04:37:30 UTC
There are several points at which the cancellation signal could arrive:

1. Before the final "testcancel" before the syscall is made.
2. Between the "testcancel" and the syscall.
3. While the syscall is blocked and no side effects have yet taken place.
4. While the syscall is blocked but with some side effects already having taken place (e.g. a partial read or write).
5. After the syscall has returned.

You want to act on cancellation in cases 1-3 but not in case 4 or 5. Handling case 1 is of course trivial, since you're about to do a conditional branch based on whether the thread has received a cancellation request; nothing needs to be done in the signal handler (but it also wouldn't hurt to handle it from the signal handler). Case 2 can be caught by the signal handler determining that the saved program counter (from the ucontext_t) is in some address range beginning just before the "testcancel" and ending with the syscall instruction.

The rest of the cases are the "tricky" part but it turns out they too are easy:

Case 3: In this case, except for certain syscalls that ALWAYS fail with EINTR even for non-interrupting signals, the kernel will reset the program counter to point at the syscall instruction during signal handling, so that the syscall is restarted when the signal handler returns. So, from the signal handler's standpoint, this looks the same as case 2, and thus it's taken care of.

Case 4: In this case, the kernel cannot restart the syscall; when it's interrupted by a signal, the kernel must cause the syscall to return with whatever partial result it obtained (e.g. partial read or write). In this case, the saved program counter points just after the syscall instruction, so the signal handler won't act on cancellation.

Case 5: OK, I lied. This one is trivial too since the program counter is past the syscall instruction already.

What about syscalls that fail with EINTR even when the signal handler is non-interrupting? In this case, the syscall wrapper code can just check the cancellation flag when the errno result is EINTR, and act on cancellation if it's set. Note that an exception needs to be made for close(), where EINTR should be treated as EINPROGRESS and thus not permit cancellation to take place.
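As an editor's sketch (with assumed helper names, not glibc's actual wrapper code), the post-syscall handling for those always-EINTR syscalls, including the close() exception, could look like this:

/* Editor's sketch: post-syscall handling for syscalls that return EINTR even
 * under SA_RESTART.  cancel_flag_is_set() and act_on_cancellation() are
 * assumed TCB helpers; `ret' is assumed to be the raw kernel result
 * (-errno on failure). */
#include <errno.h>

extern int cancel_flag_is_set(void);
extern void act_on_cancellation(void);   /* runs cleanup handlers, exits the thread */

static long finish_cancellable_syscall(long ret, int is_close)
{
    if (ret == -EINTR && cancel_flag_is_set()) {
        /* For close(), EINTR means the descriptor may already be gone
         * (treat it like EINPROGRESS), so cancellation must not act here. */
        if (!is_close)
            act_on_cancellation();
    }
    if (ret < 0) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}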

BTW, I should justify why the signal handler should be non-interrupting (SA_RESTART): if it weren't, you would risk causing spurious EINTR in programs not written to handle it, e.g. if the user incorrectly sends signal 32/33 to the process or if pthread_cancel were called while cancellation is disabled in the target thread. The kernel folks have spent a great deal of effort getting rid of spurious EINTRs (which cause all sorts of ugly bugs) and it would be a shame to reintroduce them. Also it doesn't buy you anything moving the cancellation action to the EINTR check after the syscall returns; the same check in the signal handler that handles case 2 above also handles the case of restartable syscalls correctly, for free.
Comment 17 Carlos O'Donell 2014-01-14 14:51:43 UTC
(In reply to Rich Felker from comment #16)
> There are several points at which the cancellation signal could arrive:
> 
> 1. Before the final "testcancel" before the syscall is made.
> 2. Between the "testcancel" and the syscall.
> 3. While the syscall is blocked and no side effects have yet taken place.
> 4. While the syscall is blocked but with some side effects already having
> taken place (e.g. a partial read or write).
> 5. After the syscall has returned.
> 
> You want to act on cancellation in cases 1-3 but not in case 4 or 5.
> Handling case 1 is of course trivial, since you're about to do a conditional
> branch based on whether the thread has received a cancellation request;
> nothing needs to be done in the signal handler (but it also wouldn't hurt to
> handle it from the signal handler). Case 2 can be caught by the signal
> handler determining that the saved program counter (from the ucontext_t) is
> in some address range beginning just before the "testcancel" and ending with
> the syscall instruction.
> 
> The rest of the cases are the "tricky" part but it turns out they too are
> easy:
> 
> Case 3: In this case, except for certain syscalls that ALWAYS fail with
> EINTR even for non-interrupting signals, the kernel will reset the program
> counter to point at the syscall instruction during signal handling, so that
> the syscall is restarted when the signal handler returns. So, from the
> signal handler's standpoint, this looks the same as case 2, and thus it's
> taken care of.
> 
> Case 4: In this case, the kernel cannot restart the syscall; when it's
> interrupted by a signal, the kernel must cause the syscall to return with
> whatever partial result it obtained (e.g. partial read or write). In this
> case, the saved program counter points just after the syscall instruction,
> so the signal handler won't act on cancellation.
> 
> Case 5: OK, I lied. This one is trivial too since the program counter is
> past the syscall instruction already.

Excellent. I like your idea then. It seems like a list of PCs, using either markers or DWARF2, is the way to go here.

> What about syscalls that fail with EINTR even when the signal handler is
> non-interrupting? In this case, the syscall wrapper code can just check the
> cancellation flag when the errno result is EINTR, and act on cancellation if
> it's set. Note that an exception needs to be made for close(), where EINTR
> should be treated as EINPROGRESS and thus not permit cancellation to take
> place.

We'll need a big disclaimer about close and a detailed comment. I know some of the details there, specifically that although EINTR has been returned the close will complete.

> BTW, I should justify why the signal handler should be non-interrupting
> (SA_RESTART): if it weren't, you would risk causing spurious EINTR in
> programs not written to handle it, e.g. if the user incorrectly sends signal
> 32/33 to the process or if pthread_cancel were called while cancellation is
> disabled in the target thread. The kernel folks have spent a great deal of
> effort getting rid of spurious EINTRs (which cause all sorts of ugly bugs)
> and it would be a shame to reintroduce them. Also it doesn't buy you
> anything moving the cancellation action to the EINTR check after the syscall
> returns; the same check in the signal handler that handles case 2 above also
> handles the case of restartable syscalls correctly, for free.

That makes sense.
Comment 19 Steven Stewart-Gallus 2014-07-19 18:43:59 UTC
I am confused, but if the proposed fix for this bug is implemented, then that means my bug at https://sourceware.org/bugzilla/show_bug.cgi?id=17168, where I can't cancel FUTEX_WAITs, would be automatically fixed, right? I would have to do no extra effort to make my blocking system call cancellable? So this proposed fix would have the side benefit of giving me cancellability for free? Or would this be a bug, or at least a breaking change?
Comment 20 Rich Felker 2014-07-19 18:54:33 UTC
Steven, I don't think this bug is related to your issue. If bug 9712
(of which your 17168 seems to be a duplicate) is resolved by adding
futex and glibc makes futex a cancellation point, THEN the resolution
of this bug (12683) would make it safe to use cancellation with the
futex function in a way that's race-free.

I think this is a strong argument for resolving 9712 by adding futex:
unless it's part of libc, there's no safe way to make it cancellable,
because wrapping syscall() with async cancellation will introduce an
application-level bug comparable to this bug.
Comment 21 Steven Stewart-Gallus 2014-07-20 18:15:16 UTC
Okay Rich Felker, I was confused because the implementation in musl looks like it would turn blocking uses of syscall() into cancellation points. So, if glibc does something similar to your solution, it would have to explicitly block cancellation in the syscall() function to preserve compatibility (and in the future glibc might possibly consider making syscall() automatically cancellable for blocking system calls, but that would be a separate issue)?
Comment 22 Rich Felker 2014-07-20 18:41:08 UTC
Steven, I'm not sure I understand what you're saying. This issue report is not about changing which syscalls/functions are cancellable. For the standard functions that is specified by POSIX, and for extensions, the natural choices were already made and changing them would be problematic to their users. The topic at hand is just fixing the mechanism by which cancellation is performed so that there are no race conditions.

If your question is about the syscall() function that applications can use to make syscalls directly, there is no open issue for making it cancellable, and as above, changing this would be problematic. One could envision a request for a separate version of the syscall() function which is cancellable, but as far as I know nobody has requested this and I think it's a bad idea to be adding features that encourage applications to make syscalls directly (since this is usually non-portable between archs due to subtle differences in the calling conventions and other issues like whether the libc-level structs match the syscall-level ones for a given arch).
Comment 23 Adhemerval Zanella Netto 2014-08-28 14:03:20 UTC
I am currently working on a fix based on the musl implementation, which from comments #16 and #17 seems like a good approach.  My initial idea is to use PC markers instead of DWARF2, since I see it as a cleaner approach. However, this requires a lot of cleanup.

I plan to push implementations for powerpc64 and x86_64 and ask arch maintainers for the remaining arch-specific work. I also plan to write a wiki page describing the work done and summarizing the discussion on this bug report.
Comment 24 Carlos O'Donell 2014-08-28 15:02:33 UTC
(In reply to Adhemerval Zanella Netto from comment #23)
> I am currently working on a fix based on the musl implementation, which from
> comments #16 and #17 seems like a good approach.  My initial idea is to use
> PC markers instead of DWARF2, since I see it as a cleaner approach. However,
> this requires a lot of cleanup.
> 
> I plan to push implementations for powerpc64 and x86_64 and ask arch
> maintainers for the remaining arch-specific work. I also plan to write a wiki
> page describing the work done and summarizing the discussion on this bug
> report.

The biggest problem with DWARF2 is the parser, and making it accessible from the signal handler. I strongly suggest using PC, and a list of exception regions generated from markers in the assembly (similar to kernel exception regions).
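As an editor's illustration of the marker-generated region table described here (hypothetical; __cancel_regions and __cancel_region_count are assumed to be produced from the asm markers at build/link time):

/* Editor's sketch: each cancellable syscall stub emits begin/end markers in
 * its assembly, collected into a sorted table that the SIGCANCEL handler
 * searches with the saved PC to decide whether cancellation may act now. */
#include <stddef.h>
#include <stdint.h>

struct cancel_region { uintptr_t begin, end; };   /* [begin, end) around the syscall */

extern const struct cancel_region __cancel_regions[];  /* assumed: sorted, non-overlapping */
extern const size_t __cancel_region_count;

static int pc_in_cancel_region(uintptr_t pc)
{
    size_t lo = 0, hi = __cancel_region_count;
    while (lo < hi) {                     /* binary search over the sorted table */
        size_t mid = lo + (hi - lo) / 2;
        if (pc < __cancel_regions[mid].begin)
            hi = mid;
        else if (pc >= __cancel_regions[mid].end)
            lo = mid + 1;
        else
            return 1;                     /* PC inside a cancellable region */
    }
    return 0;
}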
Comment 25 Dan Searle 2015-01-15 13:20:17 UTC
I think we have stumbled upon this bug, or something related to it. Can someone please confirm I'm on the right track here?

We have a multithreaded server application which calls recv() and poll() from async-cancellable threads; each thread handles a single connection, with a master thread accepting new connections and adding them to a job queue.

More and more often now we are seeing the server lock up; on inspection, two or more threads seem deadlocked in some race condition inside libc recv() and/or poll().

One example here shows two backtraces from gdb, from the two threads that seemed deadlocked, chewing 100% CPU:

Thread 1 bt:
#0  __pthread_disable_asynccancel () at ../nptl/sysdeps/unix/sysv/linux/x86_64/cancellation.S:98
#1  0x00007f895ba987fd in __libc_recv (fd=0, fd@entry=33, buf=buf@entry=0x7cada02b, n=n@entry=1024, flags=1537837035,
    flags@entry=16384) at ../sysdeps/unix/sysv/linux/x86_64/recv.c:35
#2  0x000000000040ec54 in recv (__flags=16384, __n=1024, __buf=0x7cada02b, __fd=33)
    at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]

Thread 2 bt:
#0  0x00007f895ba987eb in __libc_recv (fd=fd@entry=31, buf=buf@entry=0x7ca5e02b, n=n@entry=1024, flags=-1, flags@entry=16384)
    at ../sysdeps/unix/sysv/linux/x86_64/recv.c:33
#1  0x000000000040ec54 in recv (__flags=16384, __n=1024, __buf=0x7ca5e02b, __fd=31)
    at /usr/include/x86_64-linux-gnu/bits/socket2.h:44
[snip]

There can be more than two threads involved (I'm unsure whether it can happen with just one thread locked up), but it's always inside recv() or poll(), and sometimes in __pthread_disable_asynccancel() within either of those.

Could I work around this problem by changing the threads to deferred (synchronous) cancellation, or by avoiding the need to cancel the threads at all?
Comment 26 Rich Felker 2015-01-15 13:31:26 UTC
I don't think this is the bug you're seeing. If it were, use of async cancellation would only make it worse. But the symptoms you'd see from this bug would be things like side effects of a function having happened despite it getting cancelled.

If you're seeing 100% cpu load from threads in recv, the most likely explanation is that the socket you're reading from is in EOF status (remote sending end closed), so that recv immediately returns zero. Repeatedly attempting to read in this situation would be an application bug, not anything related to glibc.
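For illustration, that failure mode looks like the following receive loop (an editor's sketch, not the reporter's actual code):

/* recv() returns 0 once the peer has closed the connection; a loop that
 * only checks for n > 0 or n < 0 spins at 100% CPU on EOF. */
#include <sys/types.h>
#include <sys/socket.h>

static void handle_connection(int fd)
{
    char buf[1024];
    for (;;) {
        ssize_t n = recv(fd, buf, sizeof buf, 0);
        if (n > 0) {
            /* process n bytes */
        } else if (n == 0) {
            break;          /* EOF: omitting this case is the infinite loop */
        } else {
            break;          /* error handling elided */
        }
    }
}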
Comment 27 Dan Searle 2015-01-15 14:01:41 UTC
(In reply to Rich Felker from comment #26)
> I don't think this is the bug you're seeing. If it were, use of async
> cancellation would only make it worse. But the symptoms you'd see from this
> bug would be things like side effects of a function having happened despite
> it getting cancelled.
> 
> If you're seeing 100% cpu load from threads in recv, the most likely
> explanation is that the socket you're reading from is in EOF status (remote
> sending end closed), so that recv immediately returns zero. Repeatedly
> attempting to read in this situation would be an application bug, not
> anything related to glibc.

Thanks Rich, your suggestion made me think to look through the code paths again and you are quite right, there was an infinite loop in there, not obvious but I found it.

In light of the current problems with cancellable threads and syscalls, I'm going to disable cancellation during the main job execution (where all the recv() and poll() calls are), just in case this bug is causing problems I'm unaware of.

Many thanks, you saved me a lot of hair pulling :)