This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


RFC: replace ptmalloc2

From: Jörn Engel <joern at purestorage dot com>
To: "GNU C. Library" <libc-alpha at sourceware dot org>
Date: Thu, 9 Oct 2014 14:54:47 -0700
Subject: RFC: replace ptmalloc2
Authentication-results: sourceware.org; auth=none

I have recently been forced to look at the internals of ptmalloc2.
There is some low-hanging fruit that could be fixed, but overall I
find it more worthwhile to replace the allocator with one of the
alternatives, jemalloc being my favorite.

Problems encountered the hard way:
- Using per-thread arenas causes horrible memory bloat. While it is
theoretically possible for arenas to shrink and return memory to the
kernel, that rarely happens in practice. Effectively every arena
retains the largest size it has ever had (or close to it).
Given many threads and dynamic behaviour of individual threads, a
significant fraction of memory can be wasted here.
- mmap() failing once 65530 vmas are used by a process. There
is a kernel bug that plays into this, but ptmalloc2 would hit this
limit even without the kernel bug. Given a large system, one can go
OOM (malloc returning NULL) with hundreds of gigabytes free on the
system.
- mmap_sem causing high latency for multithreaded processes. Yes,
this is a kernel-internal lock, but ptmalloc2 is the main reason the
lock gets hammered.
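
For readers who want to see how close a process is to the vma limit
described above, here is a minimal sketch that is not part of the
original message. It assumes Linux with /proc mounted. The function
names read_max_map_count() and count_own_vmas() are hypothetical
helpers introduced only for this illustration. The limit is read from
/proc/sys/vm/max_map_count (65530 is the usual default), and each line
of /proc/self/maps is treated as roughly one vma.

```c
#include <stdio.h>

/* Hypothetical helper: returns the per-process vma limit configured in
 * the kernel, or -1 if the sysctl file cannot be read or parsed. */
long read_max_map_count(void)
{
    FILE *f = fopen("/proc/sys/vm/max_map_count", "r");
    long limit = -1;

    if (!f)
        return -1;
    if (fscanf(f, "%ld", &limit) != 1)
        limit = -1;
    fclose(f);
    return limit;
}

/* Hypothetical helper: counts the lines of /proc/self/maps, which is a
 * close approximation of the number of vmas of the calling process.
 * Returns -1 if the file cannot be opened. */
long count_own_vmas(void)
{
    FILE *f = fopen("/proc/self/maps", "r");
    long lines = 0;
    int c;

    if (!f)
        return -1;
    while ((c = fgetc(f)) != EOF)
        if (c == '\n')
            lines++;
    fclose(f);
    return lines;
}
```

Comparing the two values periodically in a long-running multithreaded
process gives an early warning before mmap() starts to fail.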

Possible improvements found by source code inspection and via
testcases:
- Everything mentioned in
https://www.facebook.com/notes/facebook-engineering/scalable-memory-allocation-using-jemalloc/480222803919
- Arenas are a bad choice for per-thread caches.
- mprotect_size seems to be responsible for silly behaviour. When
extending the main arena with sbrk(), one could immediately
mprotect() the entire extension and be done. Instead mprotect() is
often called at 4k granularity. Each call takes mmap_sem for
writing and potentially splits off new vmas. Way too expensive to
do in small granularities.
It gets better when looking at the other arenas. Memory is
allocated via mmap(PROT_NONE), so every mprotect() will split off
new vmas. Potentially some of them can get merged later on. But
current Linux kernels contain at least one bug, so this doesn't
always happen.
If someone is arguing in favor of PROT_NONE as a debug or
security measure, I wonder why we don't have the equivalent for the
main arena. Do we really want the worst of both worlds?
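
The vma splitting described above can be observed from user space.
The following is a minimal sketch, not taken from ptmalloc2 and not
part of the original message. It assumes Linux with /proc mounted;
vmas_in_range() and demo_split() are hypothetical names chosen for
this illustration. It reserves a region with mmap(PROT_NONE), makes
only the first page read/write, and then the whole region, counting
the vmas that overlap the region after each step.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: number of entries in /proc/self/maps that
 * overlap [start, start + len). Returns -1 if the file cannot be
 * opened. */
int vmas_in_range(const void *start, size_t len)
{
    unsigned long lo = (unsigned long)start, hi = lo + len;
    unsigned long a, b;
    char line[512];
    int n = 0;
    FILE *f = fopen("/proc/self/maps", "r");

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f))
        if (sscanf(line, "%lx-%lx", &a, &b) == 2 && a < hi && b > lo)
            n++;
    fclose(f);
    return n;
}

/* Hypothetical demo: counts[0] is taken after the PROT_NONE
 * reservation, counts[1] after mprotect() of the first page only,
 * counts[2] after mprotect() of the whole region.
 * Returns 0 on success, -1 if mmap() or mprotect() fails. */
int demo_split(int counts[3])
{
    long page = sysconf(_SC_PAGESIZE);
    size_t len;
    char *base;
    int rc = -1;

    assert(page > 0);
    len = 16 * (size_t)page;
    base = mmap(NULL, len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return -1;

    counts[0] = vmas_in_range(base, len);
    /* Partial mprotect(): the kernel has to split the vma in two. */
    if (mprotect(base, (size_t)page, PROT_READ | PROT_WRITE) != 0)
        goto out;
    counts[1] = vmas_in_range(base, len);
    /* Whole-region mprotect(): the pieces can be merged again. */
    if (mprotect(base, len, PROT_READ | PROT_WRITE) != 0)
        goto out;
    counts[2] = vmas_in_range(base, len);
    rc = 0;
out:
    munmap(base, len);
    return rc;
}
```

On a typical Linux system the count goes from one vma to two after the
partial mprotect(). Whether the pieces merge back into one after the
final call depends on the kernel, which is the merging behaviour the
paragraph above refers to.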

All of the above has convinced me to abandon ptmalloc2 and use a
different allocator for my work project. But look at the Facebook
post again and see the 2x performance improvement for their webserver
load. That is not exactly a micro-benchmark for allocators, but it
translates to significant hardware savings in the real world. It
would be nice to get those savings out of the box.

Jörn

--
With a PC, I always felt limited by the software available. On Unix,
I am limited only by my knowledge.
-- Peter J. Schoenster

Follow-Ups:
- Re: RFC: replace ptmalloc2
  - From: Rich Felker
