Overview: After filling up the sbrk-allocated space, each call to free() incurs a system call to brk(), thus consuming unnecessary CPU time. Further Details: This thread from libc-hacker: "Malloc not trimming sbrk space once it fills up" http://sourceware.org/ml/libc-hacker/2004-01/msg00012.html describes how malloc doesn't trim sbrk space once it has started using an mmap-ed region. In a program I'm developing that does hundreds of thousands of malloc and free per second, I noticed a sudden increase in CPU usage and system calls once this threshold had been reached. It turns out that every free() operation ends up invoking the brk() system call to find out what the current limit is. When this is compared to the current end-of-memory as known by the malloc library at line 3208 of malloc.c, in function sYSTRIm(), no trimming is performed so the brk() system call (via morecore) is called again at next free(). Relevant lines from malloc.c: /* Only proceed if end of memory is where we last set it. This avoids problems if there were foreign sbrk calls. */ current_brk = (char*)(MORECORE(0)); if (current_brk == (char*)(av->top) + top_size) { // malloc.c:3208 I modified Jakub Jelinek's tst-malloc-trim.c from his original email linked above to demonstrate the problem. The output from the program contains "count %d" showing the number of times morecore was called when freeing blocks. This should ideally be small. As it is, it tends to be close to the number of blocks freed. For example: Actual Result: "free did not trim sbrk area at all: midsum 540672 sum 540672 count 450" Expected Result: "free did not trim sbrk area at all: midsum 540672 sum 540672 count 0" I'm currently doing 400,000 frees per second, which adds a non-trivial overhead from the 400,000 brk () system calls. Workaround: I can workaround this by disabling trimming (since it doesn't work anyway) through mallopt (M_TRIM_THRESHOLD,-1). Systems Tested: Confirmed with glibc-2.3.2 from debian 3.0: GNU C Library stable release version 2.3.2, by Roland McGrath et al. Copyright (C) 2003 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 3.3.5 (Debian 1:3.3.5-12). Compiled on a Linux 2.6.0-test7 system on 2005-05-10. Available extensions: GNU libio by Per Bothner crypt add-on version 2.1 by Michael Glad and others linuxthreads-0.10 by Xavier Leroy BIND-8.2.3-T5B libthread_db work sponsored by Alpha Processor Inc NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk Thread-local storage support included. Also with self-compiled glibc-2.3.5 on debian 3.0: GNU C Library stable release version 2.3.5, by Roland McGrath et al. Copyright (C) 2005 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. Compiled by GNU CC version 3.3.5 (Debian 1:3.3.5-13). Compiled on a Linux 2.6.0-test7 system on 2005-07-27. Available extensions: GNU libio by Per Bothner crypt add-on version 2.1 by Michael Glad and others Native POSIX Threads Library by Ulrich Drepper et al BIND-8.2.3-T5B NIS(YP)/NIS+ NSS modules 0.19 by Thorsten Kukuk Thread-local storage support included. test-malloc-trim2.c ==SNIP== /* Copyright (C) 2004 Free Software Foundation, Inc. This file is part of the GNU C Library. Contributed by Jakub Jelinek <jakub@redhat.com>, 2004. The GNU C Library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version. The GNU C Library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with the GNU C Library; if not, write to the Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA. */ #include <malloc.h> #include <stdlib.h> #include <stdio.h> #include <sys/mman.h> #include <unistd.h> ptrdiff_t sum; int count; __malloc_ptr_t counting_morecore (ptrdiff_t size) { sum += size; count++; return __default_morecore (size); } int main (void) { size_t pagesize = sysconf (_SC_PAGESIZE); char *p = sbrk (0); p += (128 * pagesize) & ~(pagesize - 1); if (mmap (p, pagesize, PROT_NONE, MAP_PRIVATE|MAP_ANON, -1, 0) != p) { puts ("Couldn't test"); return 0; } if (mallopt (M_TRIM_THRESHOLD, 32 * pagesize) == 0) { puts ("Couldn't test"); return 0; } // uncomment to enable work-around //mallopt(M_TRIM_THRESHOLD, (unsigned long) -1); __morecore = counting_morecore; char *array[512]; int i; for (i = 0; i < 512; ++i) array[i] = malloc (pagesize / 4); ptrdiff_t midsum = sum; count = 0; for (i = 511; i >= 0; --i) free (array[i]); __morecore = __default_morecore; if (midsum == sum && midsum > 32 * pagesize) { printf ("free did not trim sbrk area at all: midsum %td sum %td count %td\n", midsum, sum, count); return 1; } return 0; } ==SNIP==
This bug can be addressed with a trivial one liner patch, which is not even in C code proper but a preprocessing directive! ROFL!!! I verified this to work on a MIPS system based on kernel 2.6.17.7 and glibc 2.5. The system boots just fine, everything seems to run, and my malloc test case shows that 1) memory is being returned to the OS by free even after sbrk fails, and 2) sbrk is not repeatedly called after it fails. (The test case simply allocates about 500 megs worth of memory in 500 byte pieces). Malloc nicely switches to an alternate arena, whereby it uses mmap to grab heaps. These heaps are nicely freed when they become deallocated, and the memory originally obtained from sbrk is nicely returned with a negative sbrk. I ran it under gdb, placing breakpoints on mmap, munap, and sbrk, and also watching the virtual memory size of the process using ps aux. --- glibc-2.5.orig/malloc/malloc.c 2006-09-07 09:06:02.000000000 -0700 +++ glibc-2.5/malloc/malloc.c 2008-04-04 01:01:32.862714704 -0800 @@ -3056,15 +3056,24 @@ (*__after_morecore_hook) (); } else { /* - If have mmap, try using it as a backup when MORECORE fails or - cannot be used. This is worth doing on systems that have "holes" in + If we have arenas, then we are basically done with this + main_arena. We can return null, because the higher level malloc + routine like public_mALLOc will try fetching from another + arena, creating one if necessary. This way we won't + cause the main_arena to become non-contiguous. + A non-contiguous main_arena has the bug that it won't return + memory to the OS when it is freed, and that it keeps + hammering away on a nonworking sbrk for each allocation request + + If have mmap, but not arenas, try using it as a backup when MORECORE fails + or cannot be used. This is worth doing on systems that have "holes" in address space, so sbrk cannot extend to give contiguous space, but space is available elsewhere. Note that we ignore mmap max count and threshold limits, since the space will not be used as a segregated mmap region. */ -#if HAVE_MMAP +#if HAVE_MMAP && !USE_ARENAS /* Cannot merge with old top, so add its size back in */ if (contiguous(av)) size = (size + old_size + pagemask) & ~pagemask;
That would fix the test case in question, but only because it prohibits allocation from the main arena altogether in this specific case. I think it would not fix the following (IMHO in practice more likely) situation: 1. lot's of malloc/free from the main arena 2. a single sbrk(+something) call from the app, making the main arena discontiguous after just one additional malloc() 3. lot's of free() from the main arena -- now every free() can still incur an sbrk(0) call because the end of the main arena never can pass the user-allocated space I think the proper fix would be to prevent or alleviate the sbrk(0) in every free(), given that sbrk(0) is very expensive (but see below). Maybe we should give up expecting a discontiguous main_arena to magically become contiguous again, i.e. not call trim at all when the arena is marked noncontiguous. BUT, taking a step back, is this really a serious problem? I consider the statement that "each call to free() incurs a system call" _false_, because libc caches the last brk value, so sbrk(0) does _not_ perform a system call AFAICS. Or are we talking about specific systems where sbrk(0) is just like sbrk(+x)?
(In reply to comment #2) > I think it would not fix the following (IMHO in practice more likely) situation: > 1. lot's of malloc/free from the main arena > 2. a single sbrk(+something) call from the app IMHO, apps that call sbrk deserve less than optimal behavior from malloc, such as memory not being returned when it is freed. > 3. lot's of free() from the main arena > -- now every free() can still incur an > sbrk(0) call because the end of the main arena never can pass the user- allocated > space > I think the proper fix would be to prevent or alleviate the sbrk(0) in every > free(), given that sbrk(0) is very expensive (but see below). But, seeing below, it is successfully argued that sbrk(0) is actually not expensive at all thanks to cashing. The remaining problem is that the freed memory won't be returned to the operating system. But the application arguably deserves that, because it interfered with malloc by calling sbrk (+delta).
I reported this bug about three years ago now, so my memory is a little bit hazy. However, I do remember that calls to free() were definitely causing expensive syscalls. I note the recent comments about sbrk(0) not being expensive due to caching and, reviewing the glibc sources, I have to agree with this. So I suspect there may be some detail missing from my original report that explains more what was going on. Sorry!
Created attachment 4242 [details] test case Is there any real solution for this problem? The "mallopt(M_TRIM_THRESHOLD, (unsigned long) -1)" workaround does not work for me. It just causes every malloc/free to result in a mmap/munmap syscall, which is even slower. I have attached a small test program which can (kind of) work around the problem by placing an explicit 'block buffer' behind the work buffers (set 'avoid_intelligence = true'). This would be hard to achieve in a real-world program, though. system info: glibc 2.10.1-4 $ uname -a Linux myhost 2.6.31-bfs #1 SMP PREEMPT Mon Sep 28 14:26:47 UTC 2009 x86_64 Intel(R) Core(TM)2 Duo CPU P9400 @ 2.40GHz GenuineIntel GNU/Linux The strace output on the test program is: brk(0xcab000) = 0xcab000 brk(0xc2b000) = 0xc2b000 brk(0xc6b000) = 0xc6b000 brk(0xcab000) = 0xcab000 (... continued ...)
oops, misinterpreted the topic of this bug: my previous test case is for the Problem 'Malloc trimming filled-up sbrk space too aggressively causes unnecessary system call overhead' It is still a serious problem (caused about 50% slow down in a real world program. The only possible work around was to introduce much more complicated memory management in my application)
In reply to comment #5: The test program you've posted does potentially cause lots of system calls, but for a different reason than the original report: You're allocating 256k blocks which exceed the default mmap threshold. So, to avoid any mmapping in that kind of app you have to specify _both_ mallopt(M_TRIM_THRESHOLD, (unsigned long) -1); mallopt(M_MMAP_THRESHOLD, (unsigned long) -1);
Does following patch fixes performance for you? I want to be sure that problem was in sbrk but not in something else. diff --git a/malloc/malloc.c b/malloc/malloc.c index 2938234..ab6d46f 100644 --- a/malloc/malloc.c +++ b/malloc/malloc.c @@ -2716,6 +2716,10 @@ static int systrim(size_t pad, mstate av) char* current_brk; /* address returned by pre-check sbrk call */ char* new_brk; /* address returned by post-check sbrk call */ size_t pagesz; + static int give_up; + + if (give_up) + return 0; pagesz = GLRO(dl_pagesize); top_size = chunksize(av->top); @@ -2762,7 +2766,8 @@ static int systrim(size_t pad, mstate av) return 1; } } - } + } else + give_up = 1; } return 0; }
*** Bug 17195 has been marked as a duplicate of this bug. ***