Bug 4349 - _int_malloc extremely slow with ordblks free chunks
Summary: _int_malloc extremely slow with ordblks free chunks
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.4
Importance: P2 normal
Target Milestone: ---
Assignee: Ulrich Drepper
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-04-12 06:38 UTC by Mingzhou Sun
Modified: 2014-07-14 09:24 UTC
CC List: 1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Attachments
malloc (using new) test program (2.17 KB, text/plain)
2007-04-12 06:41 UTC, Mingzhou Sun

Description Mingzhou Sun 2007-04-12 06:38:52 UTC
I am experiencing what may be a worst-case scenario for malloc when there is a very
large number (about a million) of free chunks (ordblks in mallinfo).

It's a long-running C++ program that, over the course of a long task,
accumulates a large number of objects of various sorts, including STL container
elements and other custom class types. Eventually it uses close to 1 GB of
memory. When the task is done, most but not all of these objects are freed,
resulting in a very large number of ordblks. After that, subsequent malloc
calls become extremely slow. oprofile shows that most of the time is spent in
_int_malloc, in two while loops in particular.

I wrote a simple test program (which will be attached) to simulate this
scenario. After the test program reaches this state (mallinfo output):
  int arena;    /* non-mmapped space allocated from system */   1000378368
  int ordblks;  /* number of free chunks */                     1000002
  int smblks;   /* number of fastbin blocks */                  0
  int hblks;    /* number of mmapped regions */                 0
  int hblkhd;   /* space in mmapped regions */                  0
  int usmblks;  /* maximum total allocated space */             0
  int fsmblks;  /* space available in freed fastbin blocks */   0
  int uordblks; /* total allocated space */                     360360048
  int fordblks; /* total free space */                          640018320
  int keepcost;                                                 88624
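As a quick check on the figures above (plain arithmetic on the reported numbers, not part of the original report): the in-use and free totals add up exactly to the arena size, and the roughly 640 MB of free space is split across about a million chunks, averaging only about 640 bytes each.

```latex
\begin{aligned}
\mathit{uordblks} + \mathit{fordblks} &= 360\,360\,048 + 640\,018\,320 = 1\,000\,378\,368 = \mathit{arena} \\
\mathit{fordblks} / \mathit{ordblks} &= 640\,018\,320 / 1\,000\,002 \approx 640.02\ \text{bytes per free chunk}
\end{aligned}
```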

The subsequent 300 malloc calls take 20 seconds on a 2.66 GHz Xeon running Linux.

oprofile with a debug build of glibc 2.4-11 indicates that the bulk (99%) of the
time is spent in the while loop in _int_malloc:
 sample  %
               :        /* maintain large bins in sorted order */
   132  0.0133 :        if (fwd != bck) {
               :          /* Or with inuse bit to speed comparisons */
     1 1.0e-04 :          size |= PREV_INUSE;
               :          /* if smaller than smallest, bypass loop below */
               :          assert((bck->bk->size & NON_MAIN_ARENA) == 0);
   546  0.0549 :          if ((unsigned long)(size) <= (unsigned long)(bck->bk->size)) {
               :            fwd = bck;
               :            bck = bck->bk;
               :          }
               :          else {
               :            assert((fwd->size & NON_MAIN_ARENA) == 0);
984997 99.0818 :            while ((unsigned long)(size) < (unsigned long)(fwd->size)) {
  2935  0.2952 :              fwd = fwd->fd;
               :              assert((fwd->size & NON_MAIN_ARENA) == 0);
               :            }
    29  0.0029 :            bck = fwd->bk;
               :          }
               :        }

I am using the x86_64 glibc-2.4-11 library from Fedora Core 5 update. 

If this is a known limitation, is there a good workaround (other than adding
another layer of memory management between the application and the malloc library)?
A web search suggests this problem might be related to the issue Tomash Brechko
had a patch for in Dec 2004
(http://sourceware.org/ml/libc-alpha/2004-12/msg00041.html). His patch is
apparently not included in the glibc 2.4 tree. Will this patch eventually be
considered?
Comment 1 Mingzhou Sun 2007-04-12 06:41:38 UTC
Created attachment 1675
malloc (using new) test program

sizetest.cpp, a simple program to reproduce a worst-case malloc scenario when
there are a large number of free chunks
Comment 2 Tomash Brechko 2007-04-19 10:13:30 UTC
(In reply to comment #0)
> If this is a known limitation, is there a good workaround?

Based on your description it seems you hit the problem addressed by the named patch.

You may download ptmalloc from malloc.de and link with it. Basically, malloc in
glibc 2.4 (and in 2.5 :-/) is based on ptmalloc2. You could use
ptmalloc2 plus the patch, but it is better to just use ptmalloc3. While the patch
implements naive skip lists to speed up the search for a chunk of the right size,
ptmalloc3 comes with bitwise digital trees (aka tries), which are a much better
solution to the same problem.

> A web search suggests this problem might be related to the issue Tomash Brechko
> had a patch for in Dec 2004
> (http://sourceware.org/ml/libc-alpha/2004-12/msg00041.html). His patch is
> apparently not included in the glibc 2.4 tree. Will this patch eventually be
> considered?

Probably not. Since ptmalloc3 is the better solution, we can only wait until it
is adopted into glibc. I guess binary incompatibility is the main obstacle
to that. Until then, link with ptmalloc3 yourself.

  Tomash
Comment 3 Ulrich Drepper 2007-04-30 23:34:17 UTC
The code in CVS has some changes for this.