Created attachment 10403 [details] Makes 1000 batches of 10 threads then ends them. Then prints out memory consumption before exiting. Description: glibc 2.26 now enables a per-thread cache to malloc by default. This cache consumes around 1.23 kB of memory per thread created. This cache however is not released when the thread exits. This cache can't be detected with malloc_heap() causing many/most memory profiling tools (memcheck, massif, heapcheck, etc) unable to either detect a) its existence or b) its source. For applications which create and remove lots of threads, this can quickly consume gigabytes of memory, with the source of the consumption appearing to be unknown. This issue was originally reported in 2010 at https://bugzilla.redhat.com/show_bug.cgi?id=640286 and https://bugzilla.redhat.com/show_bug.cgi?id=598498 I rereported this issue 3 days ago here: https://bugs.archlinux.org/task/55532 Additional info: Happens in: glibc 2.26-1 and glibc 2.26-2 Steps to reproduce: 1) Compile attached file. 2) Run it and wait for ~25 seconds 3) Look at VmRSS of ~127300 kB
I can confirm this issue exists. I believe I see the problem. We are leaking the tcache structure on each thread exit. I'm testing a fix. Thanks for the report.
Note that this issue has nothing to do with the originally referenced issues in comment #0. The issue here is that we don't set tcache_shutting_down early enough, and therefore it causes a reinitialization of the tcache (to a new tcache) and the about-to-be-freed old tcache structure chunk is placed back into the new tcache. After the fix: [carlos@koi swbz22111]$ ./libc_leakNEW /proc/self/status at exit VmPeak: 883040 kB VmSize: 694568 kB VmLck: 0 kB VmPin: 0 kB VmHWM: 1588 kB VmRSS: 1544 kB VmData: 43256 kB VmStk: 132 kB VmExe: 4 kB VmLib: 1944 kB VmPTE: 96 kB VmPMD: 16 kB VmSwap: 0 kB The VmRSS is now back down to 1.5MB which is expected. The fix itself is relatively straight forward: diff --git a/malloc/malloc.c b/malloc/malloc.c index 1c2a0b0..7c5e357 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -2953,8 +2953,12 @@ tcache_thread_freeres (void) if (!tcache) return; + /* Disable the tcache and prevent it from being reinitialized. */ tcache = NULL; + tcache_shutting_down = 1; + /* Free all of the entries and the tcache itself back to the arena + heap for coalescing. */ for (i = 0; i < TCACHE_MAX_BINS; ++i) { while (tcache_tmp->entries[i]) @@ -2967,7 +2971,6 @@ tcache_thread_freeres (void) __libc_free (tcache_tmp); - tcache_shutting_down = 1; } text_set_element (__libc_thread_subfreeres, tcache_thread_freeres); ---
Sorry, the referenced issues gave exactly so I thought that was it. Anyways, I've compiled glibc with those two lines changed and its fixed, YAY! Should I mark this issue as resovled, or should I wait tell it gets merged or something?
(In reply to Hal Gentz from comment #3) > Sorry, the referenced issues gave exactly so I thought that was it. > > Anyways, I've compiled glibc with those two lines changed and its fixed, YAY! > > Should I mark this issue as resovled, or should I wait tell it gets merged > or something? Thanks for testing. The patch will have to be posted to the mailing list, committed, and then we can close the bug. Backporting to the 2.26 release branch is also needed.
(In reply to Florian Weimer from comment #4) > (In reply to Hal Gentz from comment #3) > > Sorry, the referenced issues gave exactly so I thought that was it. > > > > Anyways, I've compiled glibc with those two lines changed and its fixed, YAY! > > > > Should I mark this issue as resovled, or should I wait tell it gets merged > > or something? > > Thanks for testing. The patch will have to be posted to the mailing list, > committed, and then we can close the bug. Backporting to the 2.26 release > branch is also needed. I'm preparing the patch for this with a test that validates the leak.
(In reply to Carlos O'Donell from comment #5) > I'm preparing the patch for this with a test that validates the leak. cat ~/build/glibc-work/malloc/tst-malloc-tcache-leak.out INFO: 688 (bytes) are in use before starting threads. INFO: 10864 (bytes) are in use after all threads joined. OK, I got the test done somewhat portably using mallinfo and some heuristics. If you see greater than 1000x size difference from the start to the end, then you've got a leak.
Posted for review: https://www.sourceware.org/ml/libc-alpha/2017-09/msg01008.html
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via 1e26d35193efbb29239c710a4c46a64708643320 (commit) from d13867625894fda6c6a5034dadfa8ff86983ea12 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1e26d35193efbb29239c710a4c46a64708643320 commit 1e26d35193efbb29239c710a4c46a64708643320 Author: Carlos O'Donell <carlos@systemhalted.org> Date: Thu Sep 28 11:05:18 2017 -0600 malloc: Fix tcache leak after thread destruction [BZ #22111] The malloc tcache added in 2.26 will leak all of the elements remaining in the cache and the cache structure itself when a thread exits. The defect is that we do not set tcache_shutting_down early enough, and the thread simply recreates the tcache and places the elements back onto a new tcache which is subsequently lost as the thread exits (unfreed memory). The fix is relatively simple, move the setting of tcache_shutting_down earlier in tcache_thread_freeres. We add a test case which uses mallinfo and some heuristics to look for unaccounted for memory usage between the start and end of a thread start/join loop. It is very reliable at detecting that there is a leak given the number of iterations. Without the fix the test will consume 122MiB of leaked memory. ----------------------------------------------------------------------- Summary of changes: ChangeLog | 9 +++ malloc/Makefile | 3 + malloc/malloc.c | 8 ++- malloc/tst-malloc-tcache-leak.c | 112 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 129 insertions(+), 3 deletions(-) create mode 100644 malloc/tst-malloc-tcache-leak.c
Fixed for 2.27.
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, release/2.26/master has been updated via 6e1ea21501eac981204c3cc8212d45998f74983c (commit) from dd3a7239fddff81ac31373d69978d7aa1902c65f (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=6e1ea21501eac981204c3cc8212d45998f74983c commit 6e1ea21501eac981204c3cc8212d45998f74983c Author: Carlos O'Donell <carlos@systemhalted.org> Date: Thu Sep 28 11:05:18 2017 -0600 malloc: Fix tcache leak after thread destruction [BZ #22111] The malloc tcache added in 2.26 will leak all of the elements remaining in the cache and the cache structure itself when a thread exits. The defect is that we do not set tcache_shutting_down early enough, and the thread simply recreates the tcache and places the elements back onto a new tcache which is subsequently lost as the thread exits (unfreed memory). The fix is relatively simple, move the setting of tcache_shutting_down earlier in tcache_thread_freeres. We add a test case which uses mallinfo and some heuristics to look for unaccounted for memory usage between the start and end of a thread start/join loop. It is very reliable at detecting that there is a leak given the number of iterations. Without the fix the test will consume 122MiB of leaked memory. (cherry picked from commit 1e26d35193efbb29239c710a4c46a64708643320) ----------------------------------------------------------------------- Summary of changes: ChangeLog | 9 +++ malloc/Makefile | 3 + malloc/malloc.c | 8 ++- malloc/tst-malloc-tcache-leak.c | 112 +++++++++++++++++++++++++++++++++++++++ 4 files changed, 129 insertions(+), 3 deletions(-) create mode 100644 malloc/tst-malloc-tcache-leak.c
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, release/2.26/master has been updated via d5c6dea2d5b4b5c64625c5386f6baec7bf2d89b3 (commit) from 6e1ea21501eac981204c3cc8212d45998f74983c (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=d5c6dea2d5b4b5c64625c5386f6baec7bf2d89b3 commit d5c6dea2d5b4b5c64625c5386f6baec7bf2d89b3 Author: Carlos O'Donell <carlos@systemhalted.org> Date: Fri Oct 6 13:31:05 2017 -0700 Update NEWS for bug 22111. ----------------------------------------------------------------------- Summary of changes: NEWS | 1 + 1 files changed, 1 insertions(+), 0 deletions(-)
Hi guys, It seems this this issue is still exist in some cases, for example i can repro it with this test in glibc 2.29: #include <pthread.h> void *noop() { pthread_detach(pthread_self()); return NULL; } void main() { pthread_t id; while(1) pthread_create(&id, NULL, noop, 0); } There's also another report at: https://sourceware.org/ml/glibc-bugs/2018-01/msg00171.html So it looks like the tcache_shutting_down is still not early enough, when we detach a thread with no tcache allocated(like the above test), the tcache_shutting_down would still be false: static void tcache_thread_shutdown (void) { int i; tcache_perthread_struct *tcache_tmp = tcache; if (!tcache) return; <--- returned here /* Disable the tcache and prevent it from being reinitialized. */ tcache = NULL; tcache_shutting_down = true; <--- unreachable And the tcache might be reinitialized later(breakpoint at later MAYBE_INIT_TCACHE): Thread 2 (Thread 0x7f993c61e0 (LWP 8920)): #0 0x0000007f9d444f7c in free () from /lib/libc.so.6 #1 0x0000007f9d57a3e4 in _dl_deallocate_tls () from /lib/ld-linux-aarch64.so.1 #2 0x0000007f9d541490 in free_stacks () from /lib/libpthread.so.0 #3 0x0000007f9d541a4c in __deallocate_stack () from /lib/libpthread.so.0 #4 0x0000007f9d542984 in start_thread () from /lib/libpthread.so.0 #5 0x0000007f9d49b1fc in thread_start () from /lib/libc.so.6 This works: +++ b/malloc/malloc.c @@ -2958,13 +2958,14 @@ tcache_thread_shutdown (void) { int i; tcache_perthread_struct *tcache_tmp = tcache; + tcache_shutting_down = true; if (!tcache) return; /* Disable the tcache and prevent it from being reinitialized. */ tcache = NULL; - tcache_shutting_down = true;
(In reply to JeffyChen from comment #12) > Hi guys, > > It seems this this issue is still exist in some cases, for example i can > repro it with this test in glibc 2.29: Thanks for the report, this is now bug 28028.