Bug 19951 - Use after free in pthread_detach
Summary: Use after free in pthread_detach
Status: NEW
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl
Version: 2.25
Importance: P2 critical
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2016-04-15 09:43 UTC by Pavel Labath
Modified: 2017-01-17 07:08 UTC
CC List: 4 users

See Also:
Last reconfirmed:

Source file reproducing the bug. (201 bytes, text/x-csrc)
2016-04-15 09:43 UTC, Pavel Labath

Description Pavel Labath 2016-04-15 09:43:43 UTC
Created attachment 9195
Source file reproducing the bug.

Summary: A race in pthread_detach can trigger a read from deallocated memory (=>SEGV) or can corrupt the state of another thread, if that memory has been reused.

When a detached thread exits, its descriptor and stack are not freed immediately; they are put into a cache. Memory from this cache can later be reused to create new threads, or freed (via munmap) if the cache grows too big. However, this reuse or unmap can happen before the pthread_detach call returns, while pthread_detach is still accessing the memory through the descriptor of the now-exited thread.

I attach a small test file (a.c) which demonstrates the problem. Note that I am running the program under gdb, but only to control the relative timing of the individual threads; I am not modifying the internal state of the library in any way.

$ gdb ./a.out 
(gdb) set non-stop on
(gdb) dir /etc/apt/eglibc-2.19/nptl
Source directories searched: /etc/apt/eglibc-2.19/nptl:$cdir:$cwd
(gdb) b 21
Breakpoint 1 at 0x4008d2: file a.c, line 21.
(gdb) b start
Breakpoint 2 at 0x4007c5: file a.c, line 6.
(gdb) r
Starting program: /tmp/a.out 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/usr/grte/v4/lib64/libthread_db.so.1".
[New Thread 0x7ffff77f6700 (LWP 52987)]
[New Thread 0x7ffff45f5700 (LWP 52988)]

Breakpoint 2, start (arg=0x0) at a.c:6
6           return 0;
Breakpoint 1, main () at a.c:21
21          assert(pthread_detach(handle2) == 0);

Breakpoint 2, start (arg=0x0) at a.c:6
6           return 0;
The main thread has created two new threads (which we have stopped in the start function). Thread 2 has already been detached, and we are now about to detach Thread 3.
pthread_detach (th=140737293276928) at pthread_detach.c:31
31        if (INVALID_NOT_TERMINATED_TD_P (pd))
(gdb) n
38        if (atomic_compare_and_exchange_bool_acq (&pd->joinid, pd, NULL))
50          if ((pd->cancelhandling & EXITING_BITMASK) != 0)
(gdb) p pd
$1 = (struct pthread *) 0x7ffff45f5700
The main thread is now in pthread_detach. It has already marked the thread as "detached", but it will still access the thread's memory: pd->cancelhandling and __free_tcb(pd). Before we let it do that, we will let the other threads complete.
(gdb) info th
  Id   Target Id         Frame 
  3    Thread 0x7ffff45f5700 (LWP 52988) "a.out" start (arg=0x0) at a.c:6
  2    Thread 0x7ffff77f6700 (LWP 52987) "a.out" start (arg=0x0) at a.c:6
* 1    Thread 0x7ffff7fd8740 (LWP 52983) "a.out" pthread_detach (th=140737293276928)
    at pthread_detach.c:50
(gdb) thread 3
[Switching to thread 3 (Thread 0x7ffff45f5700 (LWP 52988))]
#0  start (arg=0x0) at a.c:6
6           return 0;
(gdb) c
[Thread 0x7ffff45f5700 (LWP 52988) exited]
No unwaited-for children left.
Thread 3 has exited, and its memory has been put into the stack_cache (allocatestack.c).
(gdb) thread 2
[Switching to thread 2 (Thread 0x7ffff77f6700 (LWP 52987))]
#0  start (arg=0x0) at a.c:6
6           return 0;
(gdb) c
[Thread 0x7ffff77f6700 (LWP 52987) exited]
No unwaited-for children left.
Thread 2 has exited as well. Its exit has triggered a purge of the cache, which unmapped the memory used by Thread 3. Now we let the main thread finish, which triggers a SIGSEGV. If that memory had been reallocated, or if pthread had reused the cache entry for another thread, the main thread could instead have corrupted unrelated memory.
(gdb) thread 1
[Switching to thread 1 (Thread 0x7ffff7fd8740 (LWP 52983))]
#0  pthread_detach (th=140737293276928) at pthread_detach.c:50
50          if ((pd->cancelhandling & EXITING_BITMASK) != 0)
(gdb) c

Program received signal SIGSEGV, Segmentation fault.
pthread_detach (th=140737293276928) at pthread_detach.c:50
50          if ((pd->cancelhandling & EXITING_BITMASK) != 0)
(gdb) q

Note that even though I have reproduced this bug in gdb, I have observed this happening in the wild (which led me to start investigating).
Comment 1 Carlos O'Donell 2017-01-17 07:08:45 UTC
I can confirm this issue.

Fundamentally this code in pthread_detach is flawed:

 36   /* Mark the thread as detached.  */
 37   if (atomic_compare_and_exchange_bool_acq (&pd->joinid, pd, NULL))
 38     {
 39       /* There are two possibilities here.  First, the thread might
 40          already be detached.  In this case we return EINVAL.
 41          Otherwise there might already be a waiter.  The standard does
 42          not mention what happens in this case.  */
 43       if (IS_DETACHED (pd))
 44         result = EINVAL;
 45     }
 46   else
 47     /* Check whether the thread terminated meanwhile.  In this case we
 48        will just free the TCB.  */
 49     if ((pd->cancelhandling & EXITING_BITMASK) != 0)
 50       /* Note that the code in __free_tcb makes sure each thread
 51          control block is freed only once.  */
 52       __free_tcb (pd);

Once ownership of PD is released on line 37, PD must never be touched again.

There is a resource leak that we can't prevent in the current implementation. On exit, a thread T1 effectively does the following:


(a) Check if I'm detached.
(b) If detached then free resources.
(c) Exit.

Any thread T2 may detach T1 after (a), creating a scenario where T2 doesn't know whether T1 became detached before (a) or after (a), and T2 can't check without risking a segfault if PD has been unmapped.

The detach sequence needs to be rewritten such that (a) is done atomically and is not just a check but writes information back into PD to indicate to T2 that T1 has already shut down far enough that it will not be freeing its own resources. In that case T2 can, in pthread_detach, carry out the free of the resources, knowing PD is still around.
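
To make the proposed handoff concrete, here is a minimal sketch using C11 atomics. It is only a model of the idea, not glibc code: struct fake_pd, the LIFE_* states, thread_exit_step and detach_step are all hypothetical names, and "freeing" is modelled by a counter. The point is that each side performs a single compare-and-swap, and whichever side loses the race is the one that frees, so exactly one free happens and the detacher never reads PD after handing it over.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

/* Hypothetical lifecycle states; these names are not glibc's. */
enum life_state { LIFE_JOINABLE, LIFE_DETACHED, LIFE_EXITED };

/* Stand-in for the thread descriptor (PD). */
struct fake_pd {
    _Atomic int state;
    int freed;              /* how many times resources were released */
};

/* Run by the exiting thread T1.  One CAS both checks for a detach and
   publishes "I have exited and will not free my own resources".  */
void thread_exit_step(struct fake_pd *pd)
{
    int expected = LIFE_JOINABLE;
    if (atomic_compare_exchange_strong(&pd->state, &expected, LIFE_EXITED))
        return;             /* a later join or detach frees pd */

    /* The CAS failed, so T2 detached us first: T1 owns the free. */
    assert(expected == LIFE_DETACHED);
    pd->freed++;
}

/* Run by T2.  Returns 0 on success, EINVAL if already detached. */
int detach_step(struct fake_pd *pd)
{
    int expected = LIFE_JOINABLE;
    if (atomic_compare_exchange_strong(&pd->state, &expected, LIFE_DETACHED))
        return 0;           /* T1 still running: do not touch pd again */

    if (expected == LIFE_DETACHED)
        return EINVAL;

    /* expected == LIFE_EXITED: T1 left pd alive for us, so freeing is safe. */
    pd->freed++;
    return 0;
}
```

In this model, calling detach_step and then thread_exit_step, or the two in the opposite order, leaves freed at 1 in both cases. A real implementation would also have to cover joiners and the stack cache, which this sketch ignores.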

The only immediate workaround I can suggest is to start the thread detached rather than trying to set the detached status at a later point in time.
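
For reference, a minimal sketch of that workaround using the standard POSIX attribute calls. The helper name spawn_detached and the demo_worker function are hypothetical and only here for illustration; the approach assumes the caller never needs to join the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical worker: sets the flag passed in arg and returns. */
void *demo_worker(void *arg)
{
    atomic_store((atomic_int *)arg, 1);
    return NULL;
}

/* Hypothetical helper: create a thread that is detached from the start,
   so pthread_detach is never called on it.  Returns 0 or an error number. */
int spawn_detached(void *(*fn)(void *), void *arg)
{
    pthread_attr_t attr;
    pthread_t handle;
    int rc;

    assert(fn != NULL);

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_create(&handle, &attr, fn, arg);

    /* The attribute object is no longer needed once the thread exists. */
    pthread_attr_destroy(&attr);
    return rc;
}
```

A caller would use spawn_detached(demo_worker, &flag) in place of pthread_create followed by pthread_detach. Since the handle is discarded, any completion signalling has to go through application state such as the flag.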

I do not expect this to get fixed in 2.25 (Feb 1st 2017).