This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: RFC: replace ptmalloc2
- From: Jörn Engel <joern at purestorage dot com>
- To: Rich Felker <dalias at libc dot org>
- Cc: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 14 Oct 2014 16:32:54 -0700
- Subject: Re: RFC: replace ptmalloc2
- References: <20141009215447 dot GD8583 at Sligo dot logfs dot org> <CAAHN_R0JDNQkx7oV0HS9Knv7nsPZiARLeFb4zpPa+rj7cNfECg at mail dot gmail dot com> <20141010010743 dot GA15146 at Sligo dot logfs dot org> <20141010012530 dot GX23797 at brightrain dot aerifal dot cx> <20141010013302 dot GC15146 at Sligo dot logfs dot org> <20141010020229 dot GY23797 at brightrain dot aerifal dot cx>
On Thu, Oct 09, 2014 at 10:02:29PM -0400, Rich Felker wrote:
>
> The sane behavior is to keep the same PROT_NONE/mprotect pattern, but
> expand by exponentially increasing amounts rather than one page each
> time. E.g. force the Nth expansion to be at least 2^N pages.
Or maybe skip mprotect entirely and use some slow-start algorithm for mmap.
There are many options to pick from. The main question is how to keep
the code as simple as possible while achieving the goal.
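Rich's suggestion (reserve the heap with PROT_NONE, then mprotect exponentially larger pieces of it) could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not ptmalloc2's actual heap code: `struct grow_heap`, `grow_heap_init()` and `grow_heap_extend()` are hypothetical names, and capping a single expansion at 2^20 pages is an arbitrary choice.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical heap descriptor (not ptmalloc2's struct): a PROT_NONE
 * reservation whose prefix is made accessible on demand. */
struct grow_heap {
    char    *base;        /* start of the reservation */
    size_t   reserved;    /* bytes reserved with PROT_NONE (page multiple) */
    size_t   committed;   /* bytes currently PROT_READ|PROT_WRITE */
    unsigned expansions;  /* successful expansions so far */
};

/* Reserve address space without making it accessible. */
static int grow_heap_init(struct grow_heap *h, size_t reserve)
{
    void *p = mmap(NULL, reserve, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    h->base = p;
    h->reserved = reserve;
    h->committed = 0;
    h->expansions = 0;
    return 0;
}

/* Make at least `need` more bytes accessible. The Nth expansion commits
 * at least 2^N pages (capped at 2^20 pages, an arbitrary assumption). */
static int grow_heap_extend(struct grow_heap *h, size_t need)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t want = (need + page - 1) / page * page;      /* round up to pages */
    unsigned shift = h->expansions < 20 ? h->expansions : 20;
    size_t min_commit = page << shift;                  /* 2^N pages */
    if (want < min_commit)
        want = min_commit;
    if (want > h->reserved - h->committed)
        want = h->reserved - h->committed;              /* clamp to reservation */
    if (want < need)
        return -1;                                      /* reservation exhausted */
    if (mprotect(h->base + h->committed, want, PROT_READ | PROT_WRITE) != 0)
        return -1;
    h->committed += want;
    h->expansions++;
    assert(h->committed <= h->reserved);
    return 0;
}
```

Three one-byte extensions on a fresh heap commit 1, 2 and 4 pages respectively, so growing the heap to S bytes costs O(log S) mprotect calls instead of one call per page.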
For now I simply removed the mprotect calls entirely for some
benchmarks. That brings ptmalloc2 pretty close to jemalloc: in some
microbenchmarks it is 30% slower, in others 30% faster. Both of them
consistently outperform tcmalloc, which came as a surprise.
And jemalloc seems to have a nasty design flaw: once a request crosses a
certain size, it essentially behaves like a buddy allocator. That
threshold used to be 512B in 2006 and is 4KiB in the binary I tested.
malloc(4097) will return 8KiB, causing up to 2x memory overhead. Fixing
this in jemalloc seems much harder than improving ptmalloc2, so my quest
to replace the default allocator is over.
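To make the overhead concrete, here is a hypothetical model of the rounding described above: requests above a threshold are rounded up to the next power-of-two multiple of that threshold. `buddy_round()` and `overhead_ratio()` are illustrative names, and this is not jemalloc's real size-class table, only the buddy-style behavior as described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: requests <= threshold are returned unchanged (small
 * size classes are ignored here); larger ones are rounded up by doubling.
 * Assumes threshold is a power of two, e.g. 4096. */
static size_t buddy_round(size_t request, size_t threshold)
{
    assert(request > 0 && threshold > 0);
    assert(request <= SIZE_MAX / 2);   /* keep the doubling from overflowing */
    if (request <= threshold)
        return request;
    size_t block = threshold;
    while (block < request)
        block <<= 1;                   /* double until the request fits */
    return block;
}

/* Bytes handed out per byte requested under the model above. */
static double overhead_ratio(size_t request, size_t threshold)
{
    return (double)buddy_round(request, threshold) / (double)request;
}
```

Under this model buddy_round(4097, 4096) is 8192, an overhead ratio just below 2; the worst case sits one byte above each power of two, while a request of exactly 8192 bytes wastes nothing.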
Anyhow, here are some raw numbers for the curious. The benchmark
allocated a total of 2GB across 8 threads, in sizes between 384B and
12288B, and memset the memory.
allocator      runtime  VmRSS (kB)  VmData (kB)  maps  syscalls
libc            7.165s     2107908      2590048    67    332955
libc-mprotect   0.768s     2107944      2399808    35      4149
jemalloc        0.962s     2652152      2695332    42      5521
tcmalloc        1.510s     2245760      2278460    47     38766
In this particular benchmark my hacked-up ptmalloc2 (libc-mprotect, i.e.
with the mprotect calls removed) is winning, while the standard
ptmalloc2 (libc) is clearly the worst of the bunch.
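The benchmark program itself is not part of the post. A rough approximation of the described workload might look like this; the uniform size distribution, the use of rand_r(), keeping every allocation live until measurement, and the names `run_benchmark()`, `worker_main()` and `status_kb()` are all assumptions. VmRSS and VmData are read from /proc/self/status, which reports them in kB.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Size range from the post; the uniform distribution is an assumption. */
#define MIN_SZ 384
#define MAX_SZ 12288
#define MAX_THREADS 64

struct worker {
    size_t budget;   /* bytes this thread should allocate */
    unsigned seed;   /* per-thread PRNG state for rand_r() */
    void **ptrs;     /* allocations kept live so they count toward VmRSS */
    size_t count;
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;
    size_t cap = w->budget / MIN_SZ + 1;   /* every allocation is >= MIN_SZ */
    w->ptrs = malloc(cap * sizeof *w->ptrs);
    if (!w->ptrs)
        return NULL;
    for (size_t done = 0; done < w->budget; ) {
        size_t sz = MIN_SZ + (size_t)rand_r(&w->seed) % (MAX_SZ - MIN_SZ + 1);
        char *p = malloc(sz);
        if (!p)
            break;
        memset(p, 0xa5, sz);               /* touch the memory, as in the post */
        assert(w->count < cap);
        w->ptrs[w->count++] = p;
        done += sz;
    }
    return NULL;
}

/* Parse a "Key:   value kB" line from /proc/self/status; -1 if missing. */
static long status_kb(const char *key)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long val = -1;
    size_t klen = strlen(key);
    if (!f)
        return -1;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, key, klen) == 0 && line[klen] == ':') {
            val = strtol(line + klen + 1, NULL, 10);
            break;
        }
    }
    fclose(f);
    return val;
}

/* Allocate `total` bytes across `nthreads` threads, record VmRSS/VmData
 * while everything is still live, then free it all. Returns 0 on success. */
static int run_benchmark(size_t total, int nthreads, long *rss_kb, long *data_kb)
{
    pthread_t tid[MAX_THREADS];
    struct worker w[MAX_THREADS];
    int started = 0;
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    for (; started < nthreads; started++) {
        w[started] = (struct worker){ .budget = total / (size_t)nthreads,
                                      .seed = 1u + (unsigned)started };
        if (pthread_create(&tid[started], NULL, worker_main, &w[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    *rss_kb = status_kb("VmRSS");
    *data_kb = status_kb("VmData");
    for (int i = 0; i < started; i++) {
        for (size_t j = 0; j < w[i].count; j++)
            free(w[i].ptrs[j]);
        free(w[i].ptrs);
    }
    return started == nthreads ? 0 : -1;
}
```

Calling run_benchmark((size_t)2 << 30, 8, &rss, &data) approximates the 2GB/8-thread case. Runtime and syscall counts would have to be measured outside the program, for example with time(1) and strace -c -f.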
Jörn
--
It does not matter how slowly you go, so long as you do not stop.
-- Confucius