This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



RFC: replace ptmalloc2


I have recently been forced to look at the internals of ptmalloc2.
There is some low-hanging fruit to fix, but overall I find it more
worthwhile to replace the allocator with one of the alternatives,
jemalloc being my favorite.

Problems encountered the hard way:
- Using per-thread arenas causes horrible memory bloat.  While it is
  theoretically possible for arenas to shrink and return memory to the
  kernel, that rarely happens in practice.  Effectively, every arena
  retains the largest size it has ever reached (or close to it).
  With many threads whose individual behaviour changes over time, a
  significant fraction of memory can be wasted this way.
- mmap() failing once a process uses 65530 VMAs (the default
  vm.max_map_count).  A kernel bug plays into this, but ptmalloc2
  would hit this limit even without it.  On a large system, one can
  go OOM (malloc returning NULL) with hundreds of gigabytes free.
- mmap_sem causing high latency for multithreaded processes.  Yes,
  this is a kernel-internal lock, but ptmalloc2 is the main reason for
  hammering the lock.
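To see how close a process is to the VMA limit mentioned above, one can compare the number of lines in /proc/self/maps (one line per VMA) with /proc/sys/vm/max_map_count.  A minimal sketch, assuming Linux procfs; the helper names count_lines(), count_vmas() and read_max_map_count() are hypothetical, not part of glibc:

```c
#include <assert.h>
#include <stdio.h>

/* Count newline-terminated lines in a text file; returns -1 on error. */
long count_lines(const char *path)
{
    assert(path != NULL);
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    long lines = 0;
    int c;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            lines++;
    fclose(fp);
    return lines;
}

/* Each line of /proc/self/maps describes one VMA of the calling process. */
long count_vmas(void)
{
    return count_lines("/proc/self/maps");
}

/* Per-process VMA limit; once it is reached, mmap() and VMA-splitting
 * mprotect() calls fail with ENOMEM.  Returns -1 if it cannot be read. */
long read_max_map_count(void)
{
    FILE *fp = fopen("/proc/sys/vm/max_map_count", "r");
    if (fp == NULL)
        return -1;
    long max = -1;
    if (fscanf(fp, "%ld", &max) != 1)
        max = -1;
    fclose(fp);
    return max;
}
```

Sampling count_vmas() periodically in a long-running, heavily threaded process shows whether the VMA count is creeping toward read_max_map_count().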

Possible improvements found by source code inspection and via
testcases:
- Everything mentioned in
  https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
- Arenas are a bad choice for per-thread caches.
- mprotect_size seems to be responsible for silly behaviour.  When
  extending the main arena with sbrk(), one could immediately
  mprotect() the entire extension and be done.  Instead, mprotect() is
  often called at 4k granularity.  Each call takes mmap_sem for
  writing and potentially splits off new VMAs, which is far too
  expensive to do in such small steps.
  It gets better when looking at the other arenas.  Their memory is
  allocated via mmap(PROT_NONE), so every mprotect() will split off
  new VMAs.  Some of them may get merged later on, but current Linux
  kernels contain at least one bug that prevents this in some cases.
  If someone argues in favor of PROT_NONE as a debugging or security
  measure, I wonder why we don't have the equivalent for the main
  arena.  Do we really want the worst of both worlds?
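The cost difference between many small mprotect() calls and one large one can be sketched with a hypothetical reserve-then-commit heap: address space is reserved with mmap(PROT_NONE) and made writable with mprotect() as it grows.  This is not ptmalloc2's code; struct reserved_heap, heap_init(), heap_grow() and the granule parameter are illustrative assumptions.  Rounding each growth request up to a coarser granule turns many small mprotect() calls into a few large ones.

```c
#define _DEFAULT_SOURCE            /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical heap: a PROT_NONE reservation whose prefix is made
 * readable and writable on demand with mprotect(). */
struct reserved_heap {
    char *base;                   /* start of the reservation */
    size_t reserved;              /* bytes reserved with PROT_NONE */
    size_t committed;             /* bytes currently PROT_READ | PROT_WRITE */
    size_t granule;               /* commit granularity, a multiple of the page size */
    unsigned long mprotect_calls; /* mprotect() calls issued so far */
};

int heap_init(struct reserved_heap *h, size_t reserved, size_t granule)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    assert(granule > 0 && granule % page == 0);
    assert(reserved > 0 && reserved % page == 0);

    void *p = mmap(NULL, reserved, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)          /* mmap() signals failure with MAP_FAILED */
        return -1;
    h->base = p;
    h->reserved = reserved;
    h->committed = 0;
    h->granule = granule;
    h->mprotect_calls = 0;
    return 0;
}

/* Make at least `need` bytes usable.  The request is rounded up to the
 * granule, so small growth steps do not each cost an mprotect() call
 * (each of which takes mmap_sem for writing in the kernel). */
int heap_grow(struct reserved_heap *h, size_t need)
{
    if (need <= h->committed)
        return 0;                 /* already writable, no syscall */
    if (need > h->reserved)
        return -1;                /* reservation exhausted */

    size_t target = (need + h->granule - 1) / h->granule * h->granule;
    if (target > h->reserved)
        target = h->reserved;

    if (mprotect(h->base + h->committed, target - h->committed,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    h->mprotect_calls++;
    h->committed = target;
    return 0;
}

void heap_destroy(struct reserved_heap *h)
{
    munmap(h->base, h->reserved);
}
```

Growing one page at a time up to 256 pages costs 256 mprotect() calls with a one-page granule, but a single call with a 256-page granule.  Making pages writable does not populate them (they are faulted in on first touch), so a coarser granule mainly changes how often mmap_sem is taken, not how much memory is resident.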

All of the above has convinced me to abandon ptmalloc2 and use a
different allocator for my work project.  But look at the Facebook
post again and note the 2x performance improvement for their web
server load.  That is not a micro-benchmark for allocators; it
translates to significant hardware savings in the real world.  It
would be nice to get those savings out of the box.

Jörn

--
With a PC, I always felt limited by the software available. On Unix,
I am limited only by my knowledge.
-- Peter J. Schoenster

