Bug 24422 - Malloc/free does not give memory back to the OS
Summary: Malloc/free does not give memory back to the OS
Status: RESOLVED DUPLICATE of bug 15321
Alias: None
Product: glibc
Classification: Unclassified
Component: malloc
Version: 2.28
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2019-04-06 23:59 UTC by Søren Holm
Modified: 2019-08-19 20:42 UTC
CC: 4 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Small application printing my expectations for the memory actually mapped to the application. (441 bytes, text/x-csrc)
2019-04-06 23:59 UTC, Søren Holm

Description Søren Holm 2019-04-06 23:59:31 UTC
Created attachment 11723 [details]
Small application printing my expectations for the memory actually mapped to the application.

Hi

In a software project I'm working on, we have seen that after a transient spike in memory usage, the process's actual memory usage never shrinks when the memory is freed.

We have been investigating this and have of course found that M_MMAP_THRESHOLD defaults to 128 kB. But even setting it lower - e.g. to 1024 bytes or even 0 - does not solve the issue of the memory not being given back to the OS.

I suspect the reason is that malloc uses sbrk, which essentially makes the heap work like a stack - an allocation at the top can block the whole heap below it from being freed.

Regarding M_MMAP_THRESHOLD, we also really struggle to understand why memory allocations < 128 kB are done via sbrk. Most applications allocate most of their memory by calling malloc with sizes far smaller than that - even less than 1024 bytes in many cases. I mean, C++ 'new' maps directly to malloc - how many objects are larger than 128 kB?

We have found that a port of the OpenBSD malloc implementation works much better.

https://github.com/andrewg-felinemenace/Linux-OpenBSD-malloc

A less important point is that the glibc malloc implementation makes it impossible to assess the *actual* memory usage of an application purely from the memory usage reported by the OS.
Comment 1 Søren Holm 2019-08-16 05:27:51 UTC
Are there really no comments on this issue after 4 months?
Comment 2 Adhemerval Zanella 2019-08-16 14:12:06 UTC
(In reply to Søren Holm from comment #0)
> Created attachment 11723 [details]
> Small application printing my expectations for the memory actually mapped
> to the application.
> 
> Hi
> 
> In a software project I'm working on, we have seen that after a transient
> spike in memory usage, the process's actual memory usage never shrinks
> when the memory is freed.
> 
> We have been investigating this and have of course found that
> M_MMAP_THRESHOLD defaults to 128 kB. But even setting it lower - e.g. to
> 1024 bytes or even 0 - does not solve the issue of the memory not being
> given back to the OS.

I am trying to understand what your expectations are and how you are obtaining
the process memory usage with the program you provided.  Instrumenting it with
a simple routine that reads /proc/self/smaps_rollup:

--
static void
read_smaps (void)
{
  /* Requires <fcntl.h>, <unistd.h>, <assert.h>, and <stdio.h>.  */
  int fd = open ("/proc/self/smaps_rollup", O_RDONLY);
  assert (fd != -1);
  char buffer[512];
  /* Read at most sizeof (buffer) - 1 to leave room for the NUL.  */
  ssize_t ret = read (fd, buffer, sizeof (buffer) - 1);
  assert (ret >= 0);
  buffer[ret] = '\0';
  printf ("%s\n", buffer);
  close (fd);
}
--

I am seeing on a Linux 4.14 ppc64le the output:

---
sizeof(buffers)=8000000
10000000-7fffe8470000 ---p 00000000 00:00 0                              [rollup]
Rss:                1920 kB
Pss:                 594 kB
Shared_Clean:       1344 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:       576 kB
Referenced:         1920 kB
Anonymous:           512 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:   
=============================================
1. Allocating memory.
2. Now I would expect actual memory usage to essensialy zero.
10000000-7fffe8470000 ---p 00000000 00:00 0                              [rollup]
Rss:             5153152 kB
Pss:             5151826 kB
Shared_Clean:       1344 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:   5151808 kB
Referenced:      5153152 kB
Anonymous:       5151744 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:   
=============================================
3. Zeroing memory.	
4. Now memory should actually be used.
10000000-7fffe8470000 ---p 00000000 00:00 0                              [rollup]
Rss:             5153152 kB
Pss:             5151826 kB
Shared_Clean:       1344 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:   5151808 kB
Referenced:      5153152 kB
Anonymous:       5151744 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:   
=============================================
5. Freeing memory.
6. Now memory usage should be back the level in step 2.
10000000-7fffe8470000 ---p 00000000 00:00 0                              [rollup]
Rss:                9920 kB
Pss:                8594 kB
Shared_Clean:       1344 kB
Shared_Dirty:          0 kB
Private_Clean:         0 kB
Private_Dirty:      8576 kB
Referenced:         9920 kB
Anonymous:          8512 kB
LazyFree:              0 kB
AnonHugePages:         0 kB
ShmemPmdMapped:        0 kB
Shared_Hugetlb:        0 kB
Private_Hugetlb:       0 kB
Swap:                  0 kB
SwapPss:               0 kB
Locked:   
=============================================
---

The remaining 'Private_Dirty' is the 'buffers' array itself, which on a 64-bit
machine is 7812 kB in size.  Adding the initial Private_Dirty (from DSO loading
and startup code) gives 8388 kB.  My guess is the remaining 188 kB are due to
stack usage.

> 
> I suspect the reason is that malloc uses sbrk, which essentially makes the
> heap work like a stack - an allocation at the top can block the whole heap
> below it from being freed.

Calling 'free' will eventually trigger internal trim operations that return
memory to the OS (although there are some pathological cases where you may
want to call malloc_trim yourself).

> 
> Regarding M_MMAP_THRESHOLD, we also really struggle to understand why
> memory allocations < 128 kB are done via sbrk. Most applications allocate
> most of their memory by calling malloc with sizes far smaller than that -
> even less than 1024 bytes in many cases. I mean, C++ 'new' maps directly
> to malloc - how many objects are larger than 128 kB?

sbrk is used because mmap usage is limited to avoid exhaustion: the kernel may
limit the total number of memory mappings.  You can tune this with
mallopt (M_MMAP_MAX, ...), but it will incur more system calls and slightly
higher memory usage, because each mmap'd allocation is rounded up to a multiple
of the page size.

> 
> We have found that a port of the OpenBSD malloc implementation works much
> better.
> 
> https://github.com/andrewg-felinemenace/Linux-OpenBSD-malloc
> 
> A less important point is that the glibc malloc implementation makes it
> impossible to assess the *actual* memory usage of an application purely
> from the memory usage reported by the OS.

Could you clarify that assertion? What do you mean by the 'actual' usage of
the application? Do you mean the memory requested from malloc, without taking
internal alignment or fragmentation into consideration?

So far I see no indication that glibc malloc is not returning memory to
the OS.
Comment 3 Adhemerval Zanella 2019-08-16 14:22:49 UTC
In fact I ran this against glibc 2.17; on newer versions, tcache seems to generate a lot of dirty pages as well. I will investigate further.
Comment 4 Adhemerval Zanella 2019-08-16 19:55:00 UTC
OK, the issue is not related to tcache (it was sheer luck that I observed it on more recent glibcs).

The issue is indeed that after M_MMAP_MAX mmap calls, the system allocator falls back to brk calls to grow the main arena. You can check this with the fields of a 'struct mallinfo', which in some cases show the arena holding a lot of memory.

You can try to call malloc_trim in such cases, but there is no guarantee all the memory will be returned.

The issue is that when the brk call starts to fail (malloc/malloc.c:2493), sysmalloc falls back to mmap again and marks the arena as non-contiguous (malloc/malloc.c:2517). And since the heap is fragmented, systrim can't actually call brk with a negative increment to give memory back to the system.

I need to check whether there is something we can do to improve the logic to release memory back in such cases.
Comment 5 Carlos O'Donell 2019-08-16 20:23:55 UTC
(In reply to Adhemerval Zanella from comment #4)
> OK, the issue is not related to tcache (it was sheer luck that I observed
> it on more recent glibcs).
> 
> The issue is indeed that after M_MMAP_MAX mmap calls, the system allocator
> falls back to brk calls to grow the main arena. You can check this with
> the fields of a 'struct mallinfo', which in some cases show the arena
> holding a lot of memory.
> 
> You can try to call malloc_trim in such cases, but there is no guarantee
> all the memory will be returned.
> 
> The issue is that when the brk call starts to fail (malloc/malloc.c:2493),
> sysmalloc falls back to mmap again and marks the arena as non-contiguous
> (malloc/malloc.c:2517). And since the heap is fragmented, systrim can't
> actually call brk with a negative increment to give memory back to the
> system.
> 
> I need to check whether there is something we can do to improve the logic
> to release memory back in such cases.

Is this a duplicate of this bug?
https://sourceware.org/bugzilla/show_bug.cgi?id=15321
Comment 6 Adhemerval Zanella 2019-08-19 20:42:02 UTC
(In reply to Carlos O'Donell from comment #5)
> Is this a duplicate of this bug?
> https://sourceware.org/bugzilla/show_bug.cgi?id=15321

Yes, it is - the issue description and the provided test case show the same underlying problem.

*** This bug has been marked as a duplicate of bug 15321 ***