Excessive memory consumption when using malloc()

Carlos O'Donell carlos@redhat.com
Thu Nov 25 18:20:18 GMT 2021


On 11/25/21 12:20, Christian Hoff via Libc-help wrote:
> Hello all,
>
> we are facing a problem with the memory allocator in glibc 2.17 on
> RHEL 7.9. Our application allocates about 10 GB of memory (split into
> chunks that are each around 512 KB large). This memory is used for some
> computations and released afterwards. After a while, the application is
> running the same computations again, but this time in different threads.
> The first issue we are seeing is that - after the computations are done
> - the 10 GB of memory is not released back to the operating system. Only
> after calling malloc_trim() manually with GDB, the size of the process
> shrinks dramatically from ~10GB to 400 MB. So, at this point, the unused
> memory from the computations is finally returned to the OS.

How many cpus does the system have?

How many threads do you create?

Is this 10GiB of RSS or VSS?

For very large systems glibc malloc will create up to 8 arenas per CPU.

Each arena starts with a default 64MiB VMA reservation.

On a 128 core system this appears as a ~65GiB VSS reservation.
 
> Our wish would be that the memory is returned to the OS without us
> having to call malloc_trim(). And I understand that glibc also trims the
> heap when there is sufficient free space in top of it (the
> M_TRIM_THRESHOLD in mallopt() controls when this should happen). What
> could be the reason why this is not working in our case? Could it be
> related to heap fragmentation? But assuming that is the reason, why is
> malloc_trim() nevertheless able to free this memory?

The normal trimming strategy is to trim from the top of the heap down.

Chunks at the top of the heap are coalesced, and when the resulting free chunk
is large enough the heap is trimmed back down.

This coalescing and freeing is prevented if there are in-use chunks in the heap.

Consider this scenario:
- Make many large allocations that have a short lifetime.
- Make one small allocation that has a very long lifetime.
- Free all the large allocations.

The heap cannot be freed downwards because of the small, long-lifetime allocation.

The call to malloc_trim() walks the heap chunks and frees page-sized chunks or
larger without the requirement that they come from the top of the heap.

In glibc's allocator, mixing lifetimes for allocations will cause heap growth.

I have an important question to ask now:

Do you use aligned allocations?

We currently have an outstanding defect where aligned allocations create small
residual free chunks; when those are freed and the memory is allocated again as
an aligned chunk, we are forced to split chunks again, which can lead to
ratcheting effects with certain aligned allocation patterns.

We had a prototype patch for this in Fedora in 2019:
https://lists.fedoraproject.org/archives/list/glibc@lists.fedoraproject.org/thread/2PCHP5UWONIOAEUG34YBAQQYD7JL5JJ4/
 
> And then we also have one other problem. The first run of the
> computations is always fine: we allocate 10 GB of memory and the
> application grows to 10 GB. Afterwards, we release those 10 GB of memory
> since the computations are now done and at this point the freed memory
> is returned back to the allocator (however, the size of the process
> remains 10 GB unless we call malloc_trim()). But if we now re-run the
> same computations again a second time (this time using different
> threads), a problem occurs. In this case, the size of the application
> grows well beyond 10 GB. It can get 20 GB or larger and the process is
> eventually killed because the system runs out of memory.

You need to determine what is going on under the hood here.

You may want to just use malloc_info() to get a routine dump of the heap state.

This will give us a starting point to see what is growing.

We have a malloc allocation tracer that you can use to capture a workload and
share a snapshot of the workload with upstream:
https://pagure.io/glibc-malloc-trace-utils

Sharing the workload might be hard because this is a full API trace and it gets
difficult to share.

> Do you have any idea why this happens? To me it seems like the threads
> are assigned to different arenas and therefore the previously freed 10
> GB of memory can not be re-used as they are in different arenas. Is that
> possible?

I don't know why this happens.

Threads, once bound to an arena, normally never move unless an allocation fails.
 
> A workaround I have found is to set M_MMAP_THRESHOLD to 128 KB - then
> the memory for the computations is always allocated using mmap() and
> returned back to the system immediately when it is free()'ed. This
> solves both of the issues. But I am afraid that this workaround could
> degrade the performance of our application. So, we are grateful for any
> better solution to this problem.

It will degrade performance because every such allocation and free becomes a
syscall (mmap() on allocation, munmap() on free). You can try raising the value
to find a better trade-off.

-- 
Cheers,
Carlos.


