Abnormal memory usage with glibc 2.31 related to thread cache and trimming strategy?

Xavier Roche xavier.roche@algolia.com
Wed Sep 16 12:14:40 GMT 2020


Dear glibc enthusiasts,

We at Algolia have been experiencing severe memory usage regressions
with glibc 2.31 (we are currently running the older glibc 2.23) in
some use cases involving high workloads on medium-size systems
(128 GB of RAM, 12 cores).

By severe regression, we mean a factor of 2 to 10 in memory usage
compared to the 2.23 release.

Looking at NEWS
(https://sourceware.org/git/?p=glibc.git;a=blob_plain;f=NEWS;hb=HEAD),
the most plausible major change seems to be the per-thread cache
introduced in 2.26, but this is pure guesswork, not backed by any
evidence.
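
If the per-thread cache is indeed involved, one test we could run
(please correct me if this is the wrong knob) would be to disable it
entirely through the tunables, along the lines of (the binary name is
a placeholder):

GLIBC_TUNABLES=glibc.malloc.tcache_count=0 ./our-server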

Investigations show that calling malloc_trim(0) "solves" the memory
consumption issue, which hints at a trimming strategy problem in the
existing heap pools.

In the example below, we could reduce the resident size from ~120 GB
to ~9 GB by calling malloc_trim(). We use neither any specific
mallopt() settings nor any GLIBC_TUNABLES environment tuning.
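
For reference, the workaround we are experimenting with is nothing
more than a background thread calling malloc_trim(0) periodically; a
minimal sketch (the 10-second period is an arbitrary value, not a
tuned one):

#include <malloc.h>
#include <pthread.h>
#include <unistd.h>

/* Periodically ask glibc to release free memory back to the system. */
static void *trim_thread(void *arg)
{
    (void)arg;
    for (;;) {
        sleep(10);
        malloc_trim(0);
    }
    return NULL;
}

static void start_trim_thread(void)
{
    pthread_t tid;
    if (pthread_create(&tid, NULL, trim_thread, NULL) == 0)
        pthread_detach(tid);
}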

We have nearly 100 heaps (consistent, if I am not mistaken, with the
default limit of 8 arenas per core on 64-bit systems, i.e. 96 on our
12-core machines), and some of them have really high free block
usage (15 GB):

Interesting parts extracted from malloc_info():

<heap nr="87">
<sizes>
... (less interesting entries skipped)
  <size from="542081" to="67108801" total="15462549676" count="444"/>
  <unsorted from="113" to="113" total="113" count="1"/>
</sizes>
<total type="fast" count="0" size="0"/>
<total type="rest" count="901" size="15518065028"/>
<system type="current" size="15828295680"/>
<system type="max" size="16474275840"/>
<aspace type="total" size="15828295680"/>
<aspace type="mprotect" size="15828295680"/>
<aspace type="subheaps" size="241"/>
</heap>
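
For completeness, here is roughly how we capture these dumps (a
minimal sketch with error handling omitted; the helper name is ours):

#include <malloc.h>
#include <stdio.h>

/* Dump the allocator state as XML, as quoted in this mail. */
static void dump_malloc_info(const char *path)
{
    FILE *fp = fopen(path, "w");
    if (fp != NULL) {
        /* The options argument must be 0 as of glibc 2.31. */
        malloc_info(0, fp);
        fclose(fp);
    }
}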

The global stats seem to show ~137 GB of free memory not reclaimed by
the system, out of ~139 GB of total heap space (assuming "rest"
entries really are free blocks, which is only a guess on my part):

<total type="fast" count="551" size="35024"/>
<total type="rest" count="511290" size="137157559274"/>
<total type="mmap" count="12" size="963153920"/>
<system type="current" size="139098812416"/>
<system type="max" size="197709660160"/>
<aspace type="total" size="139098812416"/>
<aspace type="mprotect" size="140098441216"/>

We tried playing with glibc.malloc.trim_threshold (with values as low
as 1048576) and with glibc.malloc.mmap_threshold, but neither really
helped.
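
For reference, we set these through the GLIBC_TUNABLES environment
variable; as far as I understand, the programmatic equivalent would
be something like this sketch (the mmap threshold value is
illustrative):

#include <malloc.h>

/* mallopt() equivalent of the tunables we experimented with. */
static void apply_thresholds(void)
{
    /* Trim the heap once more than 1 MiB is free. */
    mallopt(M_TRIM_THRESHOLD, 1048576);
    /* Serve allocations above 1 MiB via mmap() so they are
       returned to the system on free(). Illustrative value. */
    mallopt(M_MMAP_THRESHOLD, 1048576);
}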

Is this behavior expected?

I'm ready to apply any suggested tuning or extract any relevant data
if needed (notably, I can provide the full malloc_info() XML dump)!

Thanks for any hint,

-- 
Xavier Roche -
xavier.roche@algolia.com

