Bug 15321 - malloc/free can't give the memory back to kernel when main_arena is discontinous
Summary: malloc/free can't give the memory back to kernel when main_arena is discontinous
Status: REOPENED
Alias: None
Product: glibc
Classification: Unclassified
Component: malloc (show other bugs)
Version: 2.17
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
: 24422 (view as bug list)
Depends on:
Blocks:
 
Reported: 2013-03-29 09:30 UTC by Zhanxin.xu
Modified: 2019-08-19 20:42 UTC (History)
7 users (show)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
testcase for free can't give the memory back to kernel (548 bytes, application/octet-stream)
2013-03-29 09:30 UTC, Zhanxin.xu
Details
new testcase (582 bytes, text/x-csrc)
2013-03-29 17:24 UTC, OndrejBilka
Details
test_case to show the peak memory usage retain after memory all freed(use write to avoid mallocs in printf) (529 bytes, text/x-csrc)
2017-04-21 07:18 UTC, ma.jiang
Details
fix the discontinuous main arena (3.41 KB, patch)
2018-11-28 09:02 UTC, hehongjun
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Zhanxin.xu 2013-03-29 09:30:36 UTC
Created attachment 6957 [details]
testcase for free can't give the memory back to kernel

Compile and run the attached test-case.

What it does:
1.Call malloc() to allocate 1024*4kB memory 
2.Call mmap()
3.Call malloc() and allocate 4096kB memory again
4.Call system to inspect the MemFree
5.Call free()
6.Call system to inspect the MemFree again
 
So,The will be free 8192kB memory,but it not.
-------------------------------------------------------------  
[xuzhanxin@localhost t-arena]$ gcc -o test-ok testcase.c -g -DUNMAP_OK
[xuzhanxin@localhost t-arena]$ gcc -o test-err testcase.c -g 

[xuzhanxin@localhost t-arena]$ ./test-ok 
	First malloc ok
	mmap ok
	Second malloc ok!
MemFree:      12630976 kB
	After free memory!
MemFree:      12638540 kB

[xuzhanxin@localhost t-arena]$ ./test-err
	First malloc ok
	mmap ok
	Second malloc ok!
MemFree:      12627628 kB
	After free memory!
MemFree:      12627380 kB
Comment 1 OndrejBilka 2013-03-29 17:24:49 UTC
Created attachment 6958 [details]
new testcase

Could you make it to leak memory repeately?

I enclosed your test in loop and memory consumption remains stable.
Comment 2 Zhanxin.xu 2013-04-02 09:53:29 UTC
(In reply to comment #1)
> Created attachment 6958 [details]
> new testcase
> Could you make it to leak memory repeately?
> I enclosed your test in loop and memory consumption remains stable.

The memory(8k) can be used by itself,so memory exhausted not occur in loop.

[@localhost t-arena]$ cat /proc/22652/smaps |grep Rss
Rss:                 88 kB
Rss:                  4 kB
Rss:                  4 kB
Rss:                192 kB
Rss:                  0 kB
Rss:                  8 kB
Rss:                  4 kB
Rss:                 12 kB
Rss:                  4 kB
Rss:                  4 kB
Rss:                  4 kB
Rss:               5028 kB   /* can't free */
Rss:                  0 kB
Rss:               3208 kB   /* can't free */
Rss:                  4 kB
Rss:                  8 kB
Comment 3 Florian Weimer 2013-04-02 11:06:32 UTC
(In reply to comment #0)
> What it does:
> 1.Call malloc() to allocate 1024*4kB memory 
> 2.Call mmap()
> 3.Call malloc() and allocate 4096kB memory again
> 4.Call system to inspect the MemFree
> 5.Call free()
> 6.Call system to inspect the MemFree again

I'm not sure this test case is valid.  Without compaction (which tends to break C semantics), you can always find a sequence of allocations that result in fragmentation.  As Ondrej said, as long as the test reaches a steady state when executed in a loop, this is not a bug.
Comment 4 Siddhesh Poyarekar 2013-04-16 14:39:19 UTC
Closing as NOTABUG based on comment 3.  malloc/free indeed cannot give address space back to the kernel when main arena is discontinuous and that is expected behaviour.
Comment 5 ma.jiang 2017-04-20 07:15:21 UTC
(In reply to Siddhesh Poyarekar from comment #4)
> Closing as NOTABUG based on comment 3.  malloc/free indeed cannot give
> address space back to the kernel when main arena is discontinuous and that
> is expected behaviour.

Hi guys, 
  I'm porting a patch about this problem to newest glibc. I am afraid this bug were probably more serious than discussed here.
  In short, glibc just do not free any memory in the main heap when it become discoutinuous. As a result, a process may keep the peak physical memory usage as long as it is alive. This is really bad as glibc are widely used in server programs...
  This bug is not about fragmentation, but more serious. Once all memory are freed, fragmentation will disappear and all memory could be reused again. But if the bug occurred, no memory could be reused(by other processes) even all of them were freed.
  In fact, fix to this bug is quite straightforward. Just port the mechanism used by sub heaps to the main arena should  be enough. We have made such a patch several years ago, and have applied it on production environments since then. I'll post the patch later after I port it to the new glibc release.
Comment 6 Siddhesh Poyarekar 2017-04-20 12:17:39 UTC
Please reopen if you can share a test case that demonstrates that memory is not reclaimed if it has been fully freed.
Comment 7 ma.jiang 2017-04-21 07:18:33 UTC
Created attachment 10003 [details]
test_case to  show the peak memory usage  retain after memory all freed(use write to avoid mallocs in printf)

Hi all,
  I have post a test to show the most dangerous impact of the bug(please make sure you have more than 4G memory). In short, a process may keep the peak physical memory usage when the bug occurred. 
  Imagine there were three daemon servers on a 8G-memory machine,  the three guys work for on same data pipeline to provide useful information.  Assume each of them use up to 4G memory,  we know they will run happily as they were designed to work serially. However,  if this bug occurred unfortunately , one of the three could not get enough memory, which lead to OOM or unacceptably slow swap usage.
Comment 8 ma.jiang 2017-04-26 05:55:01 UTC
Comment on attachment 10003 [details]
test_case to  show the peak memory usage  retain after memory all freed(use write to avoid mallocs in printf)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <assert.h>
#define MEM_SIZE 1024*512
char print_buf[2048];
const char msg1[1024] = "now we will see the peak physical memory usage\n";
const char msg2[1024] = "I have freed all memory, now I'm using \n";
int main(int argc,char *argv[])
{	

	int i = 0;	
	void **mem = (void**) malloc(sizeof(void*) * MEM_SIZE * 2);; 
	
	for (; i < MEM_SIZE;i++)
	{
	  mem[i]=malloc(4096);
	  if(mem[i] == NULL) 
	  {
	    printf("malloc failed.\n");
	    exit(1);
	  }
	}
	
	void *tt = mmap(mem[i-1]+1024*1024, 1024*213, PROT_READ | PROT_WRITE,MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	for (; i < MEM_SIZE * 2; i++)
  	{
     	  mem[i]=malloc(4096);
          if(mem[i] == NULL)
          {
            printf("malloc failed.\n");
            exit(1);
          } 
 	}

	char cmd[1024];
	write(1, msg1, strlen(msg1));
	snprintf(cmd, sizeof(cmd), "cat /proc/%d/status |grep VmRSS:", getpid());
	system(cmd);
	
	while(i--)
	{
	  free(mem[i]);
	  mem[i]=NULL;	
	}
	free(mem);

	//assert(malloc_trim(0)==1);
        write(1, msg2, strlen(msg2));
	system(cmd);

	return 0;
}
Comment 9 Carlos O'Donell 2017-04-27 13:56:26 UTC
(In reply to ma.jiang from comment #8)
> 	void *tt = mmap(mem[i-1]+1024*1024, 1024*213, PROT_READ |
> PROT_WRITE,MAP_SHARED | MAP_ANONYMOUS, -1, 0);

This corrupts the arena and the chunk metadata by writing zero (if the kernel honours the address hint). You can't place a mapping at a fixed address that may already contain data. You must know that address is empty or you're going to have serious problems, possibly overwriting whatever is already at that address.

Corruption of metadata my indeed result in all memory not being freed.
Comment 10 Florian Weimer 2017-04-27 14:03:10 UTC
(In reply to Carlos O'Donell from comment #9)
> (In reply to ma.jiang from comment #8)
> > 	void *tt = mmap(mem[i-1]+1024*1024, 1024*213, PROT_READ |
> > PROT_WRITE,MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> 
> This corrupts the arena and the chunk metadata by writing zero (if the
> kernel honours the address hint).


If it does, it's a kernel bug.  If the break point overlaps with the mapping, the mmap call must fail because MAP_FIXED hasn't been specified.

Even without the hint, something like that could happen under virtual address space pressure.

I think this needs further analysis.
Comment 11 Carlos O'Donell 2017-04-27 14:40:14 UTC
(In reply to Florian Weimer from comment #10)
> (In reply to Carlos O'Donell from comment #9)
> > (In reply to ma.jiang from comment #8)
> > > 	void *tt = mmap(mem[i-1]+1024*1024, 1024*213, PROT_READ |
> > > PROT_WRITE,MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > 
> > This corrupts the arena and the chunk metadata by writing zero (if the
> > kernel honours the address hint).
> 
> 
> If it does, it's a kernel bug.  If the break point overlaps with the
> mapping, the mmap call must fail because MAP_FIXED hasn't been specified.

You are absolutely right, I missed that MAP_FIXED wasn't in the flags.

> Even without the hint, something like that could happen under virtual
> address space pressure.
> 
> I think this needs further analysis.

We already support foreign sbrk in the main arena, thus we support non-contiguous cases.

This looks like one of those such cases, but instead of a foreign sbrk, we have a foreign mmap.

I'm reopening and looking at it under the debugger to see what really changes if anything.
Comment 12 ma.jiang 2017-04-28 01:53:43 UTC
> 
> We already support foreign sbrk in the main arena, thus we support
> non-contiguous cases.
> 
> This looks like one of those such cases, but instead of a foreign sbrk, we
> have a foreign mmap.
> 
> I'm reopening and looking at it under the debugger to see what really
> changes if anything.
Hi all,
  It's happy to see your replies. The mmap in the test code seems confusing, but we are just trying to make a fence in front of brk(to make the main arena discoutinuous). This action can be replaced by a normal mmap(which is normal in user progarms), providing result address of the mmap is in front of brk.
  I have ported our local patch to glibc-2.24(and it works as expected :) ). I'll post it later if you were intereted.
Comment 13 Carlos O'Donell 2017-04-28 02:38:53 UTC
(In reply to ma.jiang from comment #12)
> > 
> > We already support foreign sbrk in the main arena, thus we support
> > non-contiguous cases.
> > 
> > This looks like one of those such cases, but instead of a foreign sbrk, we
> > have a foreign mmap.
> > 
> > I'm reopening and looking at it under the debugger to see what really
> > changes if anything.
> Hi all,
>   It's happy to see your replies. The mmap in the test code seems confusing,
> but we are just trying to make a fence in front of brk(to make the main
> arena discoutinuous). This action can be replaced by a normal mmap(which is
> normal in user progarms), providing result address of the mmap is in front
> of brk.
>   I have ported our local patch to glibc-2.24(and it works as expected :) ).
> I'll post it later if you were intereted.

If you are going to post patches please follow the contribution checklist, particularly the copyright section:
https://sourceware.org/glibc/wiki/Contribution%20checklist

I can confirm that sbrk will fail when it runs across the mmap'd region, and malloc will fall back to using mmap as MORECORE and set the main_arena as non-contiguous.

The question which remains is to double check the trimming logic to make sure it could be capable of undoing that transition to start freeing back to the original sbrk region.

I don't think it can because this code triggers:

  /*
     Only proceed if end of memory is where we last set it.
     This avoids problems if there were foreign sbrk calls.
   */
  current_brk = (char *) (MORECORE (0));
  if (current_brk == (char *) (av->top) + top_size)

The discontinuity is as-if a foreign sbrk had been applied, and that means we won't free any memory.

If mmap's keep intersecting the heap of the main_arena, we will see continued increases in RSS that we can't free without calling malloc_trim() which trims not just from the top but from all coalesced chunks everywhere.
Comment 14 Carlos O'Donell 2017-04-28 02:41:41 UTC
Thinking more deeply about this the only way we are going to solve this is by having main_arena gain a full heap_info, and when sbrk fails, allocate a second heap with mmap. The heap_info structure is what we need to track and trim top-of-heap properly. So we will have moved main_arena closer to what the other arenas are like with the capability to have multiple heaps. Though in the case of main_arena it would only happen when the traditional heap fails.
Comment 15 Carlos O'Donell 2017-04-28 02:43:00 UTC
(In reply to Carlos O'Donell from comment #14)
> Thinking more deeply about this the only way we are going to solve this is
> by having main_arena gain a full heap_info, and when sbrk fails, allocate a
> second heap with mmap. The heap_info structure is what we need to track and
> trim top-of-heap properly. So we will have moved main_arena closer to what
> the other arenas are like with the capability to have multiple heaps. Though
> in the case of main_arena it would only happen when the traditional heap
> fails.

So to be clear, this is the kind of patch I'd be looking for if someone were going to work on a solution for this.

I'm not going to tackle this right now.
Comment 16 ma.jiang 2017-04-28 03:14:19 UTC
(In reply to Carlos O'Donell from comment #15)
> (In reply to Carlos O'Donell from comment #14)
> > Thinking more deeply about this the only way we are going to solve this is
> > by having main_arena gain a full heap_info, and when sbrk fails, allocate a
> > second heap with mmap. The heap_info structure is what we need to track and
> > trim top-of-heap properly. So we will have moved main_arena closer to what
> > the other arenas are like with the capability to have multiple heaps. Though
> > in the case of main_arena it would only happen when the traditional heap
> > fails.
> 
> So to be clear, this is the kind of patch I'd be looking for if someone were
> going to work on a solution for this.
> 
> I'm not going to tackle this right now.
  Thanks, our solution are the same as you described above. I'll try to post it as soon as possible.
Comment 17 hehongjun 2018-11-28 09:02:33 UTC
Created attachment 11415 [details]
fix the discontinuous main arena
Comment 18 Adhemerval Zanella 2019-08-19 20:42:02 UTC
*** Bug 24422 has been marked as a duplicate of this bug. ***