Bug 21731 - Bad malloc heuristic results in 30-40x memory consumption
Summary: Bad malloc heuristic results in 30-40x memory consumption
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: malloc
Version: 2.25
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2017-07-07 09:24 UTC by Yichao Yu
Modified: 2017-07-11 23:51 UTC (History)
3 users

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
Compressed source code (2.52 MB, application/x-bzip)
2017-07-07 09:24 UTC, Yichao Yu

Description Yichao Yu 2017-07-07 09:24:36 UTC
Created attachment 10251 [details]
Compressed source code

The attached program is created from a log of a program with a loop that allocates and frees large chunks of memory (the `posix_memalign`s) while at the same time "leaking" small objects. The memory leak per loop is ~1.6kB. However, running this (it may need to be compiled with `-O0` to make the compilation finish in a reasonable amount of time and to avoid the compiler removing the allocations) with the glibc malloc implementation results in a `malloc_stats` output that ends up like

```
Arena 0:
system bytes     = 2498392064
in use bytes     =   68811392
Total (incl. mmap):
system bytes     = 2512826368
in use bytes     =   83245696
max mmap regions =         10
max mmap bytes   =   67239936
```

after ~100 iterations, which I believe indicates that the allocator is not reusing memory. The `in use bytes` value reflects the actual allocations in the loop, but `system bytes` (and total memory consumption) grows at roughly the same rate as the large allocation in the loop, even though that allocation is freed in the next iteration, as `in use bytes` shows.
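
For illustration, a minimal sketch of the allocation pattern (hypothetical sizes and counts, not the attached reproducer) would be:

```
/* Hypothetical illustration of the pattern (not the attached reproducer):
   a large aligned buffer is allocated and freed every iteration while a
   small amount of memory is kept alive ("leaked") each time. */
#include <stdlib.h>
#include <malloc.h>

int main (void)
{
  for (int i = 0; i < 100; i++)
    {
      void *big;
      /* large temporary buffer, freed before the next iteration */
      if (posix_memalign (&big, 64, 16 * 1024 * 1024) != 0)
        return 1;

      /* ~1.6kB of small allocations that stay live ("leak") per loop */
      for (int j = 0; j < 16; j++)
        (void) malloc (100);

      free (big);
      malloc_stats ();
    }
  return 0;
}
```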

Ref https://github.com/JuliaIO/HDF5.jl/issues/349#issuecomment-313454744
Comment 1 Adhemerval Zanella 2017-07-11 17:53:55 UTC
The 'system bytes' field from malloc_stats reports the main arena size, which is the size requested with 'sbrk'.  Since you are explicitly 'leaking' memory, the allocator most likely has to keep growing the main arena's virtual memory due to fragmentation.  Keep in mind that although this increases the maximum virtual memory of the process, it does not actually increase the total RSS usage.

Using the same example, if you intercept malloc_stats to print the total RSS usage with this naive approach:

---
$ cat intercept.c 
#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

void malloc_stats (void)
{
  //system ("cat /proc/self/status | grep -E '^Vm(RSS|Data)'");
  system ("cat /proc/self/status | grep -E '^VmRSS'");
}
$ gcc -Wall intercept.c -shared -o libintercept.so -fpic
$ LD_PRELOAD=./libintercept.so ./malloc
---

You will see that the total RSS usage does not really change significantly.  For instance, in one run I am seeing:

---
$ LD_PRELOAD=./libintercept.so ./malloc
VmRSS:       708 kB
VmRSS:       744 kB
[...]
VmRSS:       860 kB
VmRSS:       696 kB
[...]
VmRSS:       708 kB
VmRSS:       796 kB
---

You can also change the sbrk behavior and force mmap allocations for large sizes on the main arena by setting MALLOC_TOP_PAD_=0:

---
$ MALLOC_TOP_PAD_=0 ./malloc
Arena 0:
system bytes     =   16076800
in use bytes     =   15697952
Total (incl. mmap):
system bytes     =   57049088
in use bytes     =   56670240
max mmap regions =         10
max mmap bytes   =   41500672
[...]
Arena 0:
system bytes     =   16076800
in use bytes     =   15879792
Total (incl. mmap):
system bytes     =   83451904
in use bytes     =   83254896
max mmap regions =         12
max mmap bytes   =   67903488
---

Now, what do you mean by 'bad heuristic' and by 'memory consumption' in this specific case?
Comment 2 Yichao Yu 2017-07-11 18:19:29 UTC
> memory consumption

I'm seeing (at least in the original program) that the system memory usage is going up and the max rss of the process too.
Comment 3 Adhemerval Zanella 2017-07-11 18:23:43 UTC
(In reply to Yichao Yu from comment #2)
> > memory consumption
> 
> I'm seeing (at least in the original program) that the system memory usage
> is going up and the max rss of the process too.

And this does not seem to be a GLIBC issue based on the example provided. Also, if you run the example under valgrind you will see that it does have a lot of actual memory leaks (I am not sure whether they are intentional in the example itself).
Comment 4 Yichao Yu 2017-07-11 18:27:42 UTC
> And this does not seem to be a GLIBC issue based on the example provided

It is, since the memory consumption (measured by overall system memory usage and also by the max RSS of the process) grows much faster than the actual accumulated allocations do (by more than 100x).

> Also, if you run the example under valgrind you will see that it does have a lot of actual memory leaks (I am not sure whether they are intentional in the example itself).

Yes, it will obviously detect leaks, since the program does not free all the memory it allocates. As mentioned in the original report and above, though, that small amount of accumulated allocation causes a much larger increase in memory consumption (measured by overall system memory usage and also by the max RSS of the process).
Comment 5 Yichao Yu 2017-07-11 18:32:27 UTC
```
  system ("cat /proc/self/status | grep -E '^VmRSS'");
```

FWIW, I think this is reading the status of the `cat`.
Comment 6 Adhemerval Zanella 2017-07-11 18:40:38 UTC
(In reply to Yichao Yu from comment #5)
> ```
>   system ("cat /proc/self/status | grep -E '^VmRSS'");
> ```
> 
> FWIW, I think this is reading the status of the `cat`.


Right, my mistake here. However, fixing this:

---
$ cat intercept.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

void malloc_stats (void)
{
  pid_t pid = getpid ();
  char line[128];
  snprintf (line, 128, "cat /proc/%d/status | grep -E '^VmRSS'", (int) pid);
  system (line);
}
---

does show an RSS increase:

$ LD_PRELOAD=./libintercept.so ./malloc
VmRSS:     33220 kB
VmRSS:     33420 kB
VmRSS:     35468 kB
[...]
VmRSS:     61108 kB
VmRSS:     61184 kB
VmRSS:     61184 kB

However, it is roughly what valgrind shows as leaked memory:

==18794== LEAK SUMMARY:
==18794==    definitely lost: 54,274,581 bytes in 75,230 blocks
==18794==    indirectly lost: 0 bytes in 0 blocks
==18794==      possibly lost: 65,536 bytes in 1 blocks
==18794==    still reachable: 0 bytes in 0 blocks
==18794==         suppressed: 0 bytes in 0 blocks
Comment 7 Yichao Yu 2017-07-11 19:33:29 UTC
I'm seeing the same with the simplified program now, so I guess there are other libc calls that I'm not recording. I'm still fairly certain this is glibc related, since `/proc/self/maps` of the real process shows that the only growing mapping is the heap, no other part of the program touches the heap directly, and the actual accumulation of allocations is much slower than the growth rate.

For the simplified case, I'm seeing a heap size that's larger than the RSS, so I assume glibc calls something to discard the pages without releasing the virtual memory (madvise?), and it seems that this could be what didn't work in the real program.
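
For what it's worth, a small (hypothetical) helper I can call next to `malloc_stats` to watch the heap mapping directly would be something like:

```
/* Hypothetical helper: print the [heap] line of /proc/self/maps so its
   growth can be compared with the malloc_stats numbers. */
#include <stdio.h>
#include <string.h>

static void print_heap_mapping (void)
{
  FILE *maps = fopen ("/proc/self/maps", "r");
  char line[256];
  if (maps == NULL)
    return;
  while (fgets (line, sizeof line, maps) != NULL)
    if (strstr (line, "[heap]") != NULL)
      fputs (line, stdout);
  fclose (maps);
}
```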
Comment 8 Adhemerval Zanella 2017-07-11 19:37:10 UTC
The main problem is that the example you provided is *leaking* memory, and using it as a driver to claim that glibc causes increased memory consumption does not make sense. You either need to provide an example that does not leak memory, and thus shows that this is indeed an internal glibc issue, or fix it in your program.
Comment 9 Adhemerval Zanella 2017-07-11 19:39:55 UTC
So, how representative is this testcase of the program in question? Does the program also show the same leak behavior as the testcase? Did you run your program under a memory profiler to make sure it is indeed not leaking memory (valgrind, lsan, etc.)?
Comment 10 Yichao Yu 2017-07-11 19:49:45 UTC
As I said in the very first post, the program does have a small "leak". However, I think it is reasonable to expect that if the program allocates <100kB per loop, the peak memory consumption (measured by overall system memory usage and also by the max RSS of the process) should not grow 10x or 100x faster than this "leak". It is perfectly valid for a program to accumulate small allocations over time until they are freed later. Being able to return to the same memory consumption (measured in whatever way) is not the only measure of a bug-free memory manager; the ratio between the peak memory usage and the memory actually needed matters too.
Comment 11 Carlos O'Donell 2017-07-11 20:09:35 UTC
(In reply to Adhemerval Zanella from comment #8)
> The main problem is that the example you provided is *leaking* memory, and
> using it as a driver to claim that glibc causes increased memory consumption
> does not make sense. You either need to provide an example that does not
> leak memory, and thus shows that this is indeed an internal glibc issue, or
> fix it in your program.

You can have an application that allocates a cache and does not free the cache. So from the glibc allocator perspective it is not leaking memory, it's just not being returned because it could be in use. So asking for a perfect program that doesn't leak just results in a slightly altered test case that has roughly the same semantics anyway.

The question that remains is:

Where is the memory going?

* external/internal malloc fragmentation?
* program requirements growing with time?
* an application leak?

You have to answer one of these three questions.

For example see:
https://sourceware.org/glibc/wiki/389-ds-malloc

It is not a trivial analysis.

A leak of ~1.6kB/loop shows up, when you graph the results, as a slope in your RSS consumption. You need to be able to analyze that graph and take into account the various changes in your application's usage.

DJ Delorie and I have done several of these analyses for larger applications and we have used the malloc simulator and tooling to produce graphs to help users understand the application memory usage.

I have my own tooling here:
https://pagure.io/glibc-malloc-trace-utils

Then on dj's dj/malloc branch, we have the trace and simulator code (in the malloc/ directory) that lets you capture a malloc workload and then re-run it outside of your application environment over and over again to look at the behaviour against various tunables.

I would start by graphing the exact memory usage your application requests at each API call and seeing what that looks like, e.g. a running sum after every call.

Then graph the program growth in VmRSS and VmSize and see how the two lines relate. The VmRSS growth should be linearly dependent upon the requests, and if it's not, then you have to be able to answer the question about where the memory is going.
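
A minimal sketch of such an interposer (hypothetical file and symbol names; it only tracks malloc/free and relies on glibc's __libc_malloc/__libc_free) can be preloaded the same way as the intercept.c above:

---
$ cat sum.c
#include <stdio.h>
#include <malloc.h>

extern void *__libc_malloc (size_t);
extern void __libc_free (void *);

static size_t live_bytes;   /* bytes currently handed out by malloc */
static size_t calls;

void *malloc (size_t size)
{
  void *p = __libc_malloc (size);
  if (p != NULL)
    {
      live_bytes += malloc_usable_size (p);
      if (++calls % 10000 == 0)
        fprintf (stderr, "live malloc bytes: %zu\n", live_bytes);
    }
  return p;
}

void free (void *p)
{
  if (p != NULL)
    live_bytes -= malloc_usable_size (p);
  __libc_free (p);
}
$ gcc -Wall sum.c -shared -o libsum.so -fpic
$ LD_PRELOAD=./libsum.so ./malloc
---

A complete version would also have to wrap calloc, realloc and posix_memalign; graphing live_bytes against VmRSS/VmSize is the comparison I mean above.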
Comment 12 Adhemerval Zanella 2017-07-11 20:10:51 UTC
(In reply to Yichao Yu from comment #10)
> As I said in the very first post, the program does have a small "leak".
> However, I think it is reasonable to expect that if the program allocates
> <100kB per loop, the peak memory consumption (measured by overall system
> memory usage and also by the max RSS of the process) should not grow 10x or
> 100x faster than this "leak". It is perfectly valid for a program to
> accumulate small allocations over time until they are freed later. Being
> able to return to the same memory consumption (measured in whatever way) is
> not the only measure of a bug-free memory manager; the ratio between the
> peak memory usage and the memory actually needed matters too.

Again, your example is a bad one to try to show any point. As per my comment #6, the RSS at the end of execution is roughly 61MB and valgrind shows a memory leak of 54MB. This is hardly a '10x' increase, and it points more to an issue with the workload/example than with GLIBC.

I think you might not be clear here, and are using 'memory consumption' to mean the virtual memory increase. If you check VmSize from /proc/<pid>/status it does increase over time:

VmSize:    81920 kB
VmSize:   107704 kB
[...]
VmSize:  1164740 kB
VmSize:  1190520 kB

But this does not count as actual memory consumption (RSS), and it is indeed an issue in your program (since it does seem to generate a lot of memory fragmentation due to the leaks). One option is to use MALLOC_TOP_PAD_ (environment variable) / M_TOP_PAD (mallopt).  This avoids the arena fragmentation by forcing mmap allocations instead of sbrk:

# this is grepping VmSize instead
$ LD_PRELOAD=./libintercept.so MALLOC_TOP_PAD_=0 ./malloc
VmSize:    81376 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
VmSize:   107160 kB
[...]
VmSize:   107160 kB
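
The same knob can also be set programmatically with mallopt; a minimal sketch (it has to run before the allocations in question):

---
#include <malloc.h>

int main (void)
{
  /* Equivalent to running with MALLOC_TOP_PAD_=0: do not pad sbrk
     requests, so the large requests are served with mmap instead of
     growing the main arena. */
  mallopt (M_TOP_PAD, 0);

  /* ... rest of the program ... */
  return 0;
}
---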

GLIBC's memory allocator will try to coalesce free memory for newer allocations, but fragmentation may prevent this, which is why the virtual memory allocation increases (some information on the algorithm used can be found in the 'Malloc Algorithm' section of [1]).

For the workload in question, are these 'leaks' intentional? If that is the case, a better approach could be a different allocation strategy, for instance an external memory pool.

[1] https://sourceware.org/glibc/wiki/MallocInternals
Comment 13 Adhemerval Zanella 2017-07-11 20:15:23 UTC
(In reply to Carlos O'Donell from comment #11)
> (In reply to Adhemerval Zanella from comment #8)
> > The main problem is that the example you provided is *leaking* memory, and
> > using it as a driver to claim that glibc causes increased memory
> > consumption does not make sense. You either need to provide an example that
> > does not leak memory, and thus shows that this is indeed an internal glibc
> > issue, or fix it in your program.
> 
> You can have an application that allocates a cache and does not free the
> cache. So from the glibc allocator perspective it is not leaking memory,
> it's just not being returned because it could be in use. So asking for a
> perfect program that doesn't leak just results in a slightly altered test
> case that has roughly the same semantics anyway.

And that was clear neither from the initial bug report nor from the subsequent comments. Nor could I get, from the original project's bug report, any insight into where these initial '~1.6kB' leaks came from or what the memory is used for. So I assumed that this was indeed a faulty memory leak.

> 
> The question that remains is:
> 
> Where is the memory going?
> 
> * external/internal malloc fragmentation?
> * program requirements growing with time?
> * an application leak?
> 
> You have to answer one of these three questions.

Thanks for the summary, Carlos; I think this is pretty much the missing information we need to move forward.
Comment 14 Yichao Yu 2017-07-11 20:26:37 UTC
> Where is the memory going?

Where is what memory going?

The small actual allocations can be treated as "program requirements growing with time" for all purposes in this report. The large growth in overall consumption (measured by overall system memory usage and also by the max RSS of the process) is exactly what this bug report is about.

> It is not a trivial analysis.

I agree, and thanks for the pointer to the branches/tools, I will try those.

> Again, your example is a bad one to try to show any point. As per my comment #6, the RSS at the end of execution is roughly 61MB and valgrind shows a memory leak of 54MB. This is hardly a '10x' increase, and it points more to an issue with the workload/example than with GLIBC.

I agree, and I've already acknowledged that the code I attached does not seem to reproduce the original issue, which does have an effect on actual memory consumption measured in multiple ways.

> I think you might not be clear here, and are using 'memory consumption' to mean the virtual memory increase.

No. I think I might not have been clear enough when I switched from talking about the behavior of the simplified program, which doesn't show an increase in RSS, to talking about the actual program, which does. I'm now basically asking for help to figure out what's missing to reproduce it.

> GLIBC's memory allocator will try to coalesce free memory for newer allocations, but fragmentation may prevent this, which is why the virtual memory allocation increases (some information on the algorithm used can be found in the 'Malloc Algorithm' section of [1]).

So I guess I might be able to find this out by reading the link in the previous comment, but is there a simple way to dump the heap map? Last time I checked, it was consistent with glibc leaving the big hole unfilled, but it would be nice if I could verify that.
Comment 15 Carlos O'Donell 2017-07-11 23:51:32 UTC
(In reply to Yichao Yu from comment #14)
> So I guess I might be able to find this out by reading the link in the
> previous comment, but is there a simple way to dump the heap map? Last time
> I checked, it was consistent with glibc leaving the big hole unfilled, but
> it would be nice if I could verify that.

Unfortunately we don't have a good heap dumper. DJ had a heap dump visualizer, but we haven't developed it beyond a prototype. We are open to any suggestions about how best to handle this, but so much application-specific information is required that you need to write custom tooling to help you. The graphing Octave code I have is the most generic example I could come up with, and I use it every time I have to analyze something like this for a user.
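
If it helps, malloc_info can at least dump per-arena allocator statistics as XML (it is not a real heap map); a minimal sketch:

---
#include <malloc.h>
#include <stdio.h>

int main (void)
{
  /* Writes an XML summary of each arena (free chunks per size class,
     total system and in-use bytes) to stdout. */
  malloc_info (0, stdout);
  return 0;
}
---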