Bug 6952 - malloc is not thread-safe
Summary: malloc is not thread-safe
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.4
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
Depends on:
Reported: 2008-10-08 09:18 UTC by axel.philipp
Modified: 2014-07-02 06:57 UTC (History)
2 users

See Also:
Last reconfirmed:
fweimer: security-

Test case (1.83 KB, text/plain)
2008-10-09 14:42 UTC, axel.philipp
Test case with multithreaded malloc and free in master thread only (1.93 KB, text/plain)
2008-10-14 16:26 UTC, axel.philipp
improved test case which makes it easier to reproduce the effect (1.99 KB, text/plain)
2009-02-06 13:03 UTC, axel.philipp

Description axel.philipp 2008-10-08 09:18:18 UTC
When trying to use Calculix (www.calculix.de) with the multithreaded version of 
Spooles I ran into problems. It turned out that the malloc delivered with 
Novell's SLES10 (glibc 2.4) isn't thread-safe. After linking in Wolfram 
Gloger's ptmalloc3 the program runs properly.
Newer glibc versions up to 2.7 are still based on ptmalloc2, so I tried using
ptmalloc2 instead of ptmalloc3 and got the same problems as with glibc 2.4.

Test conditions: 8 threads, either 4 dual-core Opterons or 2 quad-core Xeons, an
average of 10,000 mallocs/s.
Comment 1 Ulrich Drepper 2008-10-08 14:06:25 UTC
Worthless report.  Using a different implementation and saying it works does not
at all prove anything.  Unless you provide a small reproducer nothing will happen.
Comment 2 axel.philipp 2008-10-09 14:42:20 UTC
Created attachment 2986 [details]
Test case

The test case shows signs of a memory leak with additional memory usage in
excess of 1000 MB when linked against glibc. When linked against
libptmalloc3.so the peak memory usage stays constant during the run.
The tests were run on a 4 socket Opteron 8356 system running 64bit SLES9SP3.
Comment 3 axel.philipp 2008-10-10 09:42:10 UTC
My original observation was a severe memory leak, which is easily reproducible, 
and 2 instances of deviating results. The memory leak disappeared when I 
switched to ptmalloc3 (vs. ptmalloc2, which is in glibc), and I could not 
reproduce the deviating results.

However, I can no longer reproduce the deviating results when using standard
malloc/free; the memory leak is still present. I no longer consider the
severity "critical"; I see it as "normal".
Comment 4 axel.philipp 2008-10-10 10:55:02 UTC
I ran some more tests. The memory leak seems to be related to free being called 
inside threads. When I moved the free calls into the master thread, the memory 
leak disappeared. 
Comment 5 axel.philipp 2008-10-14 16:26:16 UTC
Created attachment 3002 [details]
Test case with multithreaded malloc and free in master thread only

This is the source for the test with all free() calls moved into the master
thread. The usleep and sleep calls are intended to slow the execution down, so
that it can be observed with top.
Comment 6 axel.philipp 2009-02-06 13:03:18 UTC
Created attachment 3722 [details]
improved test case which makes it easier to reproduce the effect

With our real world example there is a striking dependence on RLIMIT_STACK. If
the stack limit is either low (the SuSE default of 8192 kB) or unlimited, the
memory leak is very pronounced; if the limit is 512 MB (ulimit -s 524288), it
takes several attempts to reproduce the problem.
The stack limit has almost no influence on my small example, but if I omit the
usleeps (second parameter 0), it may take dozens of runs to reproduce the
problem.
I have modified my example so that two or three runs should be sufficient to
reproduce the problem. You will need a system with at least two quad-core or
four dual-core processors.
The example is compiled with:
gcc -l pthread -o malloc_thread_test_pa malloc_thread_test_pa.c
If you want to watch the program you should run top -d 1 in a second window.
Another way is to run it in a loop, my program now outputs the peak RSS:

while true; do
  malloc_thread_test_pa 2>/dev/null
done

If I run malloc_thread_test_pa 8 0 instead, it is nearly impossible to reproduce
the problem.

This is how the output should look:
loop 0:  VmHWM:  3080068 kB
loop 1:  VmHWM:  3080188 kB
loop 2:  VmHWM:  3080188 kB
loop 3:  VmHWM:  3080188 kB
loop 4:  VmHWM:  3080188 kB
loop 5:  VmHWM:  3080188 kB
loop 6:  VmHWM:  3080188 kB
loop 7:  VmHWM:  3080188 kB
loop 8:  VmHWM:  3080188 kB
loop 9:  VmHWM:  3080188 kB

and that's what I typically get:
loop 0:  VmHWM:  3079520 kB
loop 1:  VmHWM:  3464160 kB
loop 2:  VmHWM:  3464160 kB
loop 3:  VmHWM:  3464280 kB
loop 4:  VmHWM:  3464280 kB
loop 5:  VmHWM:  3464280 kB
loop 6:  VmHWM:  3849292 kB
loop 7:  VmHWM:  3849292 kB
loop 8:  VmHWM:  3849292 kB
loop 9:  VmHWM:  3849292 kB
Comment 7 axel.philipp 2009-02-13 17:21:39 UTC
The problem demonstrated by the test program apparently depends on the
architecture. On a dual quad-core Intel Xeon box, it is sufficient to run the
test with 6 threads; however, on a quad quad-core AMD Opteron system the test
program does not show any excessive memory consumption even if run with 24