Created attachment 8914 [details]
pthread_test.c sample code
For a non-root user, when the thread priority is changed, pthread_create fails and 8MB of stack memory is leaked.
For the root user, there is no such issue.
2. sample code
gcc pthread_test.c -lpthread
The sample uses SHOW_VMSIZE and ReadProcStatusAndGetFieldAsSizeT to show the VmSize usage at run time.
When run as a non-root user, VmSize keeps increasing
(you can run it several times to check the result, or change the loop count to 10000).
When run as root, VmSize is stable.
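For readers without the attachment, the following is a minimal sketch of the kind of check described above, not the attached pthread_test.c itself. The helper names (read_vmsize_kb, create_fifo_thread_once, vmsize_growth_kb) and the priority value 10 are assumptions made for illustration. It requests SCHED_FIFO explicitly, which an unprivileged user is normally not allowed to do, and reports how much VmSize grew.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Hypothetical helper: VmSize from /proc/self/status in kB, or -1 on failure. */
static long read_vmsize_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmSize: %ld", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

static void *noop(void *arg)
{
    return arg;
}

/* Try to create one SCHED_FIFO thread; returns pthread_create's result.
   Without the needed privilege this is expected to fail (typically EPERM). */
static int create_fifo_thread_once(void)
{
    pthread_attr_t attr;
    struct sched_param sp = { .sched_priority = 10 };  /* assumed value */
    pthread_t t;
    int rc;

    rc = pthread_attr_init(&attr);
    assert(rc == 0);
    rc = pthread_attr_setinheritsched(&attr, PTHREAD_EXPLICIT_SCHED);
    assert(rc == 0);
    rc = pthread_attr_setschedpolicy(&attr, SCHED_FIFO);
    assert(rc == 0);
    rc = pthread_attr_setschedparam(&attr, &sp);
    assert(rc == 0);

    rc = pthread_create(&t, &attr, noop, NULL);
    if (rc == 0)
        pthread_join(t, NULL);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Repeat the attempt and return how much VmSize grew, in kB. */
static long vmsize_growth_kb(int iterations)
{
    long before = read_vmsize_kb();
    int i;

    for (i = 0; i < iterations; i++)
        (void) create_fifo_thread_once();
    return read_vmsize_kb() - before;
}
```

Calling vmsize_growth_kb(1000) from main and printing the result as both a non-root and a root user should reproduce the comparison described in this report on an affected glibc.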
4. tested glibc
glibc 2.21, 2.22
$ uname -a
3.19.0-43-generic #49-Ubuntu SMP Sun Dec 27 19:43:07 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/issue
Created attachment 8915 [details]
Created attachment 8916 [details]
What happens is that the thread is created “stopped” (so that the priorities can be applied before the thread function is started), but we fail to take into account that the thread could be canceled in the stopped phase. In that case the thread is never joined, so its thread stack is never flagged for re-use.
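The effect of a thread that is never joined can be seen from user code as well. The sketch below is only an analogue of the internal failure path, not the glibc code: it starts joinable threads and either joins them or not, then compares VmSize growth. The helper names and the 1 MiB stack size are assumptions chosen to keep the example small.

```c
#include <pthread.h>
#include <stdio.h>

/* Hypothetical helper: VmSize from /proc/self/status in kB, or -1 on failure. */
static long read_vmsize_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "VmSize: %ld", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}

static void *noop(void *arg)
{
    return arg;
}

/* Start `count` joinable threads with 1 MiB stacks (assumed size).
   Each thread is joined only if `join` is nonzero.
   Returns the VmSize growth in kB. */
static long growth_after_threads(int count, int join)
{
    pthread_attr_t attr;
    long before = read_vmsize_kb();
    int i;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, 1 << 20);
    for (i = 0; i < count; i++) {
        pthread_t t;

        if (pthread_create(&t, &attr, noop, NULL) != 0)
            break;
        if (join)
            pthread_join(t, NULL);  /* lets the stack be reused */
        /* Otherwise the exited thread keeps its stack mapped. */
    }
    pthread_attr_destroy(&attr);
    return read_vmsize_kb() - before;
}
```

Comparing growth_after_threads(32, 1) with growth_after_threads(32, 0) should show the unjoined run growing by roughly one stack per thread, which is the same pattern the reporter sees after each failed pthread_create.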
A fix could perhaps look like this, but the existing cancellation handling looks racy: What happens if SIGCANCEL arrives before the handler is set up? Can SIGCANCEL be lost?
diff --git a/nptl/pthread_create.c b/nptl/pthread_create.c
index 5216041..69e5bc6 100644
@@ -308,6 +308,7 @@ START_THREAD_DEFN
unwind_buf.priv.data.cleanup = NULL;
+ bool start_routine_called = false;
not_first_call = setjmp ((struct __jmp_buf_tag *) unwind_buf.cancel_jmp_buf);
if (__glibc_likely (! not_first_call))
@@ -329,6 +330,7 @@ START_THREAD_DEFN
LIBC_PROBE (pthread_start, 3, (pthread_t) pd, pd->start_routine, pd->arg);
/* Run the code the user provided. */
+ start_routine_called = true;
THREAD_SETMEM (pd, result, CALL_THREAD_FCT (pd));
@@ -435,7 +437,7 @@ START_THREAD_DEFN
__madvise (pd->stackblock, freesize - PTHREAD_STACK_MIN, MADV_DONTNEED);
/* If the thread is detached free the TCB. */
- if (IS_DETACHED (pd))
+ if (IS_DETACHED (pd) || !start_routine_called)
/* Free the TCB. */
else if (__glibc_unlikely (pd->cancelhandling & SETXID_BITMASK))
SIGCANCEL cannot be lost. The parent in __pthread_initialize_minimal_internal will set up the handler long before any pthread_create code is run.
I think the simplest solution is actually to make the parent responsible for the cleanup since it is the owner of the PD after create_thread returns an error.
So the parent should run pthread_join on PD to reap the stack.
  if (__glibc_unlikely (retval != 0))
      /* If thread creation "failed", that might mean that the thread got
         created and ran a little--short of running user code--but then
         create_thread cancelled it.  In that case, the thread will do all
         its own cleanup just like a normal thread exit after a successful
         creation would do.  */
      if (thread_ran)
        assert (pd->stopped_start);
If thread_ran and !detached, then stopped_start == true, so the user code cannot have run and detached the thread. We are therefore still the owner of PD and can issue a pthread_join to reap the thread.
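The ownership argument above can be illustrated with a user-level sketch. This is not the glibc patch: struct start_gate, gated_entry, create_with_fifo_priority and mark_ran are hypothetical names, and the gate is only a stand-in for the internal stopped_start state. The new thread waits at the gate, the parent tries to apply the priority, and on failure the parent cancels and joins the thread so that its stack is reaped before user code can run.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stdlib.h>

/* Hypothetical gate standing in for glibc's internal stopped_start state. */
struct start_gate {
    pthread_mutex_t lock;
    pthread_cond_t cond;
    int released;
    void *(*fn)(void *);
    void *arg;
};

static void unlock_gate(void *lock)
{
    pthread_mutex_unlock(lock);
}

/* Thread entry: wait until the parent releases the gate, then run user code. */
static void *gated_entry(void *p)
{
    struct start_gate *g = p;
    void *(*fn)(void *);
    void *arg;

    pthread_mutex_lock(&g->lock);
    pthread_cleanup_push(unlock_gate, &g->lock);
    while (!g->released)
        pthread_cond_wait(&g->cond, &g->lock);  /* cancellation point */
    pthread_cleanup_pop(1);                     /* unlocks the mutex */

    fn = g->fn;
    arg = g->arg;
    pthread_mutex_destroy(&g->lock);
    pthread_cond_destroy(&g->cond);
    free(g);                                    /* thread owns the gate now */
    return fn(arg);
}

/* Create a thread, apply SCHED_FIFO while it is still gated, and reap it
   from the parent if that fails.  Returns 0 or an error number. */
static int create_with_fifo_priority(pthread_t *out, int priority,
                                     void *(*fn)(void *), void *arg)
{
    struct sched_param sp = { .sched_priority = priority };
    struct start_gate *g = malloc(sizeof *g);
    void *res;
    int rc;

    if (g == NULL)
        return ENOMEM;
    pthread_mutex_init(&g->lock, NULL);
    pthread_cond_init(&g->cond, NULL);
    g->released = 0;
    g->fn = fn;
    g->arg = arg;

    rc = pthread_create(out, NULL, gated_entry, g);
    if (rc == 0) {
        rc = pthread_setschedparam(*out, SCHED_FIFO, &sp);
        if (rc == 0) {
            pthread_mutex_lock(&g->lock);
            g->released = 1;
            pthread_cond_signal(&g->cond);
            pthread_mutex_unlock(&g->lock);
            return 0;
        }
        /* User code has not run, so the parent still owns the thread. */
        pthread_cancel(*out);
        pthread_join(*out, &res);
        assert(res == PTHREAD_CANCELED);
    }
    pthread_mutex_destroy(&g->lock);
    pthread_cond_destroy(&g->cond);
    free(g);
    return rc;
}

/* Example start routine: records that user code ran. */
static void *mark_ran(void *arg)
{
    *(int *) arg = 1;
    return NULL;
}
```

On success the caller receives a normal joinable thread. On failure nothing is left to join, and the start routine (mark_ran here) has never been called, which mirrors the invariant that stopped_start implies the user code cannot have detached the thread.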
I discovered this bug, and then this issue while auditing the code for bug 20116.
Fixed in 2.34 (02189e8fb00c3c7f4e67476e21011a22c5dee707).