Abnormal memory usage with glibc 2.31 related to thread cache and trimming strategy?
Xavier Roche
xavier.roche@algolia.com
Wed Sep 16 12:14:40 GMT 2020
Dear glibc enthusiasts,
We at Algolia have been experiencing severe memory usage
regressions with glibc 2.31 (we are currently using the older glibc
2.23) in some use cases involving a high workload on medium-size
systems (128 GB of RAM, 12 cores).
By severe regression, we mean a factor of between 2 and 10 in
memory usage compared to the 2.23 release.
Looking at NEWS
(https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=NEWS;hb=HEAD),
the only plausible major change seems to be the per-thread cache
introduced in 2.26, but this is pure guesswork, not backed by any
proof.
Our investigation shows that calling malloc_trim(0) "solves" the
memory consumption issue, which tends to hint at a trimming-strategy
issue in the existing heap pools.
In the example below, we could reduce the resident size from ~120 GB
to ~9 GB by calling malloc_trim(). We use neither any specific
mallopt() settings nor any GLIBC_TUNABLES environment tuning.
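For reference, a minimal sketch of the workaround (the wrapper
function and where we call it from are illustrative assumptions, not
our actual code):

#include <malloc.h>  /* malloc_trim() is glibc-specific */
#include <stdio.h>

/* Hypothetical maintenance hook (name and call site are ours, for
   illustration): malloc_trim(0) asks glibc to give free heap memory
   back to the kernel; it returns 1 if memory was actually released. */
static void release_free_heap_memory(void)
{
    if (malloc_trim(0) == 1)
        fprintf(stderr, "malloc_trim(0): memory released to the system\n");
}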
We have nearly 100 heaps, and some of them show really high
free-block usage (15 GB). Here are the interesting parts extracted
from malloc_info():
<heap nr="87">
<sizes>
... (less interesting entries skipped)
<size from="542081" to="67108801" total="15462549676" count="444"/>
<unsorted from="113" to="113" total="113" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="901" size="15518065028"/>
<system type="current" size="15828295680"/>
<system type="max" size="16474275840"/>
<aspace type="total" size="15828295680"/>
<aspace type="mprotect" size="15828295680"/>
<aspace type="subheaps" size="241"/>
</heap>
The global stats seem to hint at ~137 GB (137157559274 bytes) of free
memory not reclaimed by the system (assuming "rest" entries are
really free blocks, which is only a guess on my part):
<total type="fast" count="551" size="35024"/>
<total type="rest" count="511290" size="137157559274"/>
<total type="mmap" count="12" size="963153920"/>
<system type="current" size="139098812416"/>
<system type="max" size="197709660160"/>
<aspace type="total" size="139098812416"/>
<aspace type="mprotect" size="140098441216"/>
We tried playing with glibc.malloc.trim_threshold (with values as low
as 1048576) and with glibc.malloc.mmap_threshold, but it did not
really help.
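For completeness, the in-process equivalent of that tuning via
mallopt(); a minimal sketch using the trim threshold value we
actually tested (the mmap threshold value shown is merely the
documented default, for illustration):

#include <malloc.h>  /* mallopt(), M_TRIM_THRESHOLD, M_MMAP_THRESHOLD */

/* Roughly equivalent to launching with
     GLIBC_TUNABLES=glibc.malloc.trim_threshold=1048576
   Note that setting these via mallopt() also disables glibc's
   dynamic adjustment of the thresholds. */
static void tune_malloc(void)
{
    mallopt(M_TRIM_THRESHOLD, 1048576);  /* trim free memory above 1 MiB */
    mallopt(M_MMAP_THRESHOLD, 131072);   /* documented default (128 KiB) */
}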
Is this behavior expected?
I'm ready to apply any suggested tuning or to extract any relevant
data if needed! (notably, the full malloc_info() XML dump; the
snippets above were produced along the lines of the sketch below)
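A minimal sketch of how we capture the dump (the function name and
output path handling are illustrative assumptions; malloc_info() is
glibc-specific):

#include <malloc.h>  /* malloc_info() is glibc-specific */
#include <stdio.h>

/* Dump the allocator state as XML; the <heap> and <total> fragments
   quoted above come from output like this. */
static void dump_malloc_info(const char *path)
{
    FILE *out = fopen(path, "w");
    if (out != NULL) {
        malloc_info(0, out);  /* options must be 0 in current glibc */
        fclose(out);
    }
}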
Thanks for any hint,
--
Xavier Roche -
xavier.roche@algolia.com