This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Comparing glibc's malloc to jemalloc for 389 Directory Server.

From: Carlos O'Donell <carlos at redhat dot com>
To: GNU C Library <libc-alpha at sourceware dot org>
Cc: Siddhesh Poyarekar <sid at reserved-bit dot com>
Date: Wed, 18 May 2016 15:45:29 -0400
Subject: Comparing glibc's malloc to jemalloc for 389 Directory Server.

Community,

The 389 Directory Server community has been seeing problems
with glibc's malloc and they have found that switching to
jemalloc resolves some of these issues.

I do not doubt that jemalloc alleviates the problems they are
seeing with RSS usage, OOM killing, and so on. Jemalloc is a
state-of-the-art allocator.

However, when changing from one allocator to another,
particularly for a given workload, you want to make an informed
decision. I have seen specific examples where developers
switched to tcmalloc, only to find other workloads regress.
It's a difficult problem, and it is almost impossible to be the
*best* allocator for all workloads. You have to be a good
allocator across a wide range of workloads.

I've written up my notes regarding how 389-ds could improve
and redo their analysis here:
https://sourceware.org/glibc/wiki/389-ds-malloc

As it stands, jemalloc looks only slightly better on average,
e.g. 14% less RSS usage. I expect that it is actually much
better than that, and that we have a lot of room to improve
glibc's allocator.

In the long term what I'm going to push for is as follows:

- Ability to collect trace data from user workload.
- Ability to model the workload locally.
- Ability to make changes to implementation and estimate,
  given the model, how it will impact the user's workload.
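
As a rough illustration of the trace / model / estimate idea in the
list above, here is a minimal sketch in Python. It is not the glibc
tooling. The event format, the function names, and the two
"footprint" policies are all assumptions made up for this example.
It replays a recorded sequence of malloc/free events and reports the
peak number of bytes a hypothetical allocator would hold.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical trace event: (op, block_id, size). Size is ignored for "free".
Event = Tuple[str, int, int]


def replay_peak(trace: List[Event], footprint: Callable[[int], int]) -> int:
    """Replay a trace and return the peak bytes held.

    footprint(size) models how many bytes an allocator reserves for a
    request of `size` bytes (an assumption standing in for a real policy).
    """
    live: Dict[int, int] = {}  # block_id -> bytes reserved for that block
    current = 0
    peak = 0
    for op, block_id, size in trace:
        if op == "malloc":
            cost = footprint(size)
            live[block_id] = cost
            current += cost
            peak = max(peak, current)
        elif op == "free":
            # Freeing an unknown block is treated as a no-op in this sketch.
            current -= live.pop(block_id, 0)
    return peak


def exact(size: int) -> int:
    """Idealized allocator with no per-block overhead."""
    return size


def round_up_16(size: int) -> int:
    """Hypothetical policy: round each request up to a multiple of 16."""
    return (size + 15) // 16 * 16


# A tiny made-up trace; real traces would come from the user's workload.
trace: List[Event] = [
    ("malloc", 1, 24),
    ("malloc", 2, 100),
    ("free", 1, 0),
    ("malloc", 3, 40),
    ("free", 2, 0),
    ("free", 3, 0),
]

if __name__ == "__main__":
    print("exact      :", replay_peak(trace, exact))
    print("round_up_16:", replay_peak(trace, round_up_16))
```

Comparing the two numbers for the same trace is the kind of "how
would this change affect my workload" estimate described above. A
real model would also need to account for fragmentation, threads,
and timing, which this sketch ignores.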

These are things we talked about at Cauldron 2015 in the
whole-system benchmarking BoF. We are taking it very slow right
now and looking at just the malloc subsystem to gain experience.

The framework should tell you, within some statistical bounds,
exactly which of the multitude of allocators available outside
of glibc might be the best match for your workload.

DJ Delorie is leading this work on the dj/malloc branch with
some interesting advancements including:

- Lockless thread local cache.
- Low-overhead malloc API trace framework (circular buffer with
  trace controls).
- Running traces in a model simulator to test malloc changes
  (replays the trace).
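
To make the "circular buffer with trace controls" idea concrete,
here is a small Python sketch. It is only an illustration and does
not reflect the code on the dj/malloc branch, which lives inside
glibc and is written in C. The class name, method names, and event
layout are hypothetical.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical trace record: (op, block_id, size).
Record = Tuple[str, int, int]


class TraceBuffer:
    """Fixed-capacity trace recorder that overwrites the oldest records."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) drops the oldest entry once capacity is reached,
        # which gives circular-buffer behaviour with bounded memory.
        self.records: deque = deque(maxlen=capacity)
        self.enabled = True  # simple trace control: recording on/off

    def record(self, op: str, block_id: int, size: int = 0) -> None:
        """Append one record if tracing is enabled."""
        if self.enabled:
            self.records.append((op, block_id, size))

    def snapshot(self) -> List[Record]:
        """Return the retained records, oldest first."""
        return list(self.records)


if __name__ == "__main__":
    buf = TraceBuffer(capacity=3)
    buf.record("malloc", 1, 24)
    buf.record("malloc", 2, 100)
    buf.record("free", 1)
    buf.record("malloc", 3, 40)  # overwrites the oldest record
    print(buf.snapshot())
```

The bounded buffer keeps tracing overhead and memory use
predictable, at the cost of retaining only the most recent events.
A snapshot of the buffer could then be fed to a replay tool.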

I am going to continue to work with the 389-ds developers to
see exactly what is causing the problems between glibc's malloc
and the allocation patterns they use.

Any comments on the wiki writeup are much appreciated.

-- 
Cheers,
Carlos.

Follow-Ups:
- Re: Comparing glibc's malloc to jemalloc for 389 Directory Server.
  - From: Carlos O'Donell
