Bug 2748

Summary: Cancel from printf not calling the cancel handler
Product: glibc
Reporter: Steven Munroe <sjmunroe>
Component: nptl
Assignee: Ulrich Drepper <drepper.fsp>
Status: RESOLVED WORKSFORME
Severity: critical
CC: amodra, bergner, glibc-bugs
Priority: P1
Flags: fweimer: security-
Version: 2.4
Target Milestone: ---   
Attachments: C++ test case that demonstrates the problem
A C version of the test case.
glibc-bz2748.patch

Description Steven Munroe 2006-06-09 22:09:57 UTC
Found this problem while debugging a failure reported by our cluster team. If a
thread is PTHREAD_CANCEL_ENABLE and PTHREAD_CANCEL_DEFERRED, and pthread_cancel
is called while it is waiting on a pthread_rwlock, the cancellation is deferred
until the thread wakes up (when whoever is holding the lock calls
pthread_rwlock_unlock()).

In this case the thread holding the lock is signaled, runs its cancel handler
and unlocks. One of the waiting threads wakes up and calls printf (which is a
cancellation point). We expect printf (vfprintf) to detect that cancellation is
pending and initiate cancel handling (including calling this thread's cancel
handler).

Instead we see that thread exit prematurely (from start_thread, via
__exit_thread_inline(0)). This leaves the other threads hung waiting on the
pthread_rwlock and the main thread waiting in pthread_join.
Comment 1 Steven Munroe 2006-06-09 22:13:16 UTC
Created attachment 1073 [details]
C++ test case that demonstrates the problem

Compile with:

g++ -g -O0 thct_wrl2.C -lpthread -o thct_wrl2

Run with:

THCT_USE_CANCEL=1 ./thct_wrl2 4
Comment 2 Steven Munroe 2006-06-09 22:18:38 UTC
So far I have verified this failure on recent glibc (CVS from 06/08/2006) for
ia32 (i586), powerpc32 and powerpc64. We don't see the failure on x86_64. I
don't have access to other platforms at this time.

I don't see failures on any platform with glibc-2.3.3 or glibc-2.3.4. I have
not looked at 2.3.5 or 2.3.6.
Comment 3 Steven Munroe 2006-06-09 22:27:27 UTC
It looks like we get into vfprintf, which calls _pthread_cleanup_pop_restore(),
which detects the deferred cancellation, and we fall into CANCELLATION_P (self).
This ends up calling pthread_unwind, which (at least for powerpc) ends up in the
libgcc unwind code. This is where things go badly.
Comment 4 Peter Bergner 2006-06-13 21:45:37 UTC
Created attachment 1085 [details]
A C version of the test case.

Here is a C version of the test case with the problematic source extracted into
its own source file (bug.c).  Compiling bug.c with -fexceptions is all that is
needed to recreate the problem.  This does fail as a 32-bit x86 app as well as
32-bit and 64-bit ppc apps.  With this test case, you no longer need to set the
env var.

  linux% ./thct_wrl2 8

The code we're having problems with from bug.c is:

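void thd_thread_2 (unsigned int ndx)
{
  /* Register thd_cleanup (cast to the void (*)(void *) type that
     pthread_cleanup_push expects) to run if this thread is cancelled. */
  pthread_cleanup_push ((void (*)(void*))thd_cleanup, &ndx);
  thread_body(ndx);
  /* On normal return, pop the handler and run it (argument 1). */
  pthread_cleanup_pop (1);
}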

This test case does seem to work with older glibc versions (e.g. 2.3.4).
Comment 5 Jakub Jelinek 2007-03-28 17:17:55 UTC
The problem is that many functions don't have .eh_frame unwind info generated.
There are two ways to solve this. One is to build the whole libc with
-fasynchronous-unwind-tables (that's e.g. what Fedora 7 is doing, and what e.g.
x86_64 and s390{,x} do by default). The other is to write a patch similar to the
one I'll attach. That patch only handles the functions found in the backtrace
where this thread was cancelled; the real patch would need to find all callers
of cancellable functions and make sure none of them are __THROW and that all are
built with either -fexceptions or -fasynchronous-unwind-tables.
The important difference between the two is that with -fexceptions you don't get
any unwind info if, e.g., all callees are __THROW, whereas with
-fasynchronous-unwind-tables you get it anyway.
FYI, the testcase is buggy: passing the address of an automatic variable as the
last pthread_create argument and dereferencing it in the thread body has
undefined behavior.
Comment 6 Jakub Jelinek 2007-03-28 17:19:29 UTC
Created attachment 1654 [details]
glibc-bz2748.patch
Comment 7 Steven Munroe 2007-03-29 15:52:48 UTC
Alan, can you look at this issue for PPC32/64?

Specifically for missing/incomplete CFI impacting cancel or making
-fasynchronous-unwind-tables the default for powerpc.
Comment 8 Ulrich Drepper 2008-04-08 01:17:38 UTC
I don't see this problem anymore.  Please retest and report.
Comment 9 Petr Baudis 2010-06-01 02:26:48 UTC
no response