This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: RFC: replace ptmalloc2

From: Jörn Engel <joern at purestorage dot com>
To: Rich Felker <dalias at libc dot org>
Cc: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
Date: Tue, 14 Oct 2014 16:32:54 -0700
Subject: Re: RFC: replace ptmalloc2
References: <20141009215447 dot GD8583 at Sligo dot logfs dot org> <CAAHN_R0JDNQkx7oV0HS9Knv7nsPZiARLeFb4zpPa+rj7cNfECg at mail dot gmail dot com> <20141010010743 dot GA15146 at Sligo dot logfs dot org> <20141010012530 dot GX23797 at brightrain dot aerifal dot cx> <20141010013302 dot GC15146 at Sligo dot logfs dot org> <20141010020229 dot GY23797 at brightrain dot aerifal dot cx>

On Thu, Oct 09, 2014 at 10:02:29PM -0400, Rich Felker wrote:
> 
> The sane behavior is to keep the same PROT_NONE/mprotect pattern, but
> expand by exponentially increasing amounts rather than one page each
> time. E.g. force the Nth expansion to be at least 2^N pages.

Or maybe not mprotect at all and do some slow-start algorithm for mmap.
There are many options one can pick from.  The main question is how to
keep the code as simple as possible while achieving the goal.
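
To make the quoted suggestion concrete, here is a minimal sketch in C of
a PROT_NONE reservation whose Nth expansion commits at least 2^N pages
(counting from N = 0).  The names heap_sketch, heap_reserve and heap_grow
are hypothetical and only for illustration; this is not ptmalloc2 code,
and the cap on the growth exponent is an arbitrary assumption.

```c
#define _DEFAULT_SOURCE         /* for MAP_ANONYMOUS under strict -std= modes */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical heap descriptor, not ptmalloc2's real structure. */
struct heap_sketch {
    char    *base;       /* start of the PROT_NONE reservation */
    size_t   reserved;   /* bytes of address space reserved */
    size_t   committed;  /* bytes made readable/writable so far */
    unsigned nexpand;    /* number of expansions performed */
};

/* Reserve address space without making it accessible. */
int heap_reserve(struct heap_sketch *h, size_t bytes)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    bytes = (bytes + page - 1) & ~(page - 1);   /* round up to whole pages */
    void *p = mmap(NULL, bytes, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    h->base = p;
    h->reserved = bytes;
    h->committed = 0;
    h->nexpand = 0;
    return 0;
}

/* Commit at least `need` more bytes.  The Nth call commits at least
 * 2^N pages, so the number of mprotect calls grows only
 * logarithmically with the heap size. */
int heap_grow(struct heap_sketch *h, size_t need)
{
    size_t page  = (size_t)sysconf(_SC_PAGESIZE);
    size_t avail = h->reserved - h->committed;
    if (need == 0 || need > avail)
        return -1;

    /* Assumed cap of 2^10 pages per step to bound over-commit. */
    unsigned shift = h->nexpand < 10 ? h->nexpand : 10;
    size_t min_grow = page << shift;
    size_t grow = need > min_grow ? need : min_grow;
    grow = (grow + page - 1) & ~(page - 1);
    if (grow > avail)
        grow = avail;   /* still >= need because need <= avail */

    if (mprotect(h->base + h->committed, grow,
                 PROT_READ | PROT_WRITE) != 0)
        return -1;
    h->committed += grow;
    h->nexpand++;
    assert(h->committed <= h->reserved);
    return 0;
}
```

With 4k pages, three successive heap_grow(&h, 1) calls would commit 1, 2
and 4 pages, where growing one page at a time would need seven mprotect
calls to reach the same size.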

For the moment I just removed the mprotect completely for some
benchmarks.  That brings ptmalloc2 pretty close to jemalloc.  In some
microbenchmarks the patched ptmalloc2 is 30% slower than jemalloc, in
others it is 30% faster.  Both of them consistently outperform
tcmalloc, which came as a surprise.

And jemalloc seems to have a nasty design flaw.  It is essentially a
buddy allocator once you cross a certain size.  That threshold used to
be 512B in 2006 and is 4k for the binary I tested.  malloc(4097) will
consume 8k, causing up to 2x memory overhead.  Improving this in
jemalloc seems much harder than improving ptmalloc2, so my quest to
replace the default allocator is over.
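
The arithmetic behind the "up to 2x" figure can be checked with a small
sketch in C.  It only models the power-of-two rounding described above;
round_up_pow2 and overhead_ratio are hypothetical helper names, not
jemalloc functions, and real size classes may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round n up to the next power of two, as a buddy-style allocator
 * would.  Models the behaviour described in the text only. */
size_t round_up_pow2(size_t n)
{
    assert(n > 0 && n <= SIZE_MAX / 2 + 1);  /* keep the shift from overflowing */
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Ratio of memory consumed to memory requested. */
double overhead_ratio(size_t request)
{
    return (double)round_up_pow2(request) / (double)request;
}
```

Under this model a request of 4097 bytes is rounded to 8192, a ratio of
8192/4097, just under 2, while a request of exactly 4096 bytes has no
rounding overhead.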

Anyhow, here are some raw numbers for the curious.  The benchmark
allocated 2GB across 8 threads, in sizes between 384B and 12288B, and
memset the memory.
                runtime  VmRSS (kB)  VmData (kB)  maps  syscalls
libc             7.165s     2107908      2590048    67    332955
libc-mprotect    0.768s     2107944      2399808    35      4149
jemalloc         0.962s     2652152      2695332    42      5521
tcmalloc         1.510s     2245760      2278460    47     38766

In this particular benchmark my hacked-up ptmalloc2 (libc-mprotect, i.e.
libc with the mprotect removed) is winning, while a standard ptmalloc2
(libc) is clearly the worst of the bunch.

Jörn

--
It does not matter how slowly you go, so long as you do not stop.
-- Confucius

