This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: RFC: replace ptmalloc2
- From: Rich Felker <dalias at libc dot org>
- To: Jörn Engel <joern at purestorage dot com>
- Cc: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Wed, 15 Oct 2014 00:00:31 -0400
- Subject: Re: RFC: replace ptmalloc2
On Tue, Oct 14, 2014 at 04:32:54PM -0700, Jörn Engel wrote:
> On Thu, Oct 09, 2014 at 10:02:29PM -0400, Rich Felker wrote:
> >
> > The sane behavior is to keep the same PROT_NONE/mprotect pattern, but
> > expand by exponentially increasing amounts rather than one page each
> > time. E.g. force the Nth expansion to be at least 2^N pages.
>
> Or maybe not mprotect at all and do some slow-start algorithm for mmap.
> There are many options one can pick from. Main question is how to keep
> the code as simple as possible while achieving the goal.
The exponential expansion approach I described is just a couple of lines
of code and completely non-invasive. Yes, there are other approaches,
like multiple mmaps (so that you never need PROT_NONE), but they have
worse address-space fragmentation properties.
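
To make that concrete, here is a rough sketch of the shape I have in
mind (illustration only, not glibc code; the names and the reservation
size are made up): reserve a large PROT_NONE region once, then commit
it with mprotect in steps where the Nth expansion is at least 2^N pages.

#include <sys/mman.h>
#include <unistd.h>

static char    *heap_base;      /* start of the PROT_NONE reservation */
static size_t   heap_reserved;  /* total bytes reserved up front */
static size_t   heap_committed; /* bytes already made read/write */
static unsigned heap_grows;     /* number of expansions so far */

/* Reserve address space once; nothing is committed yet. */
static int heap_init(size_t reserve)
{
    heap_base = mmap(NULL, reserve, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
    if (heap_base == MAP_FAILED)
        return -1;
    heap_reserved = reserve;
    return 0;
}

/* Make at least `need` bytes usable, forcing the Nth expansion to be
   at least 2^N pages (the shift is capped only to avoid overflow). */
static int heap_grow(size_t need)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    if (need <= heap_committed)
        return 0;
    if (need > heap_reserved)
        return -1;

    size_t min_step = page << (heap_grows < 24 ? heap_grows : 24);
    size_t target = heap_committed + min_step;
    if (target < need)
        target = need;
    target = (target + page - 1) & ~(page - 1);  /* page-align */
    if (target > heap_reserved)
        target = heap_reserved;

    if (mprotect(heap_base + heap_committed,
                 target - heap_committed, PROT_READ | PROT_WRITE) != 0)
        return -1;
    heap_committed = target;
    heap_grows++;
    return 0;
}

The only extra state is the expansion counter, and the reservation
stays one contiguous mapping, which is why this fragments address
space less than juggling multiple mmaps.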
> For the moment I just removed the mprotect completely for some
> benchmarks. That brings ptmalloc2 pretty close to jemalloc. In some
> microbenchmarks it is 30% slower, in some it is 30% faster. Both of
> them consistently outperform tcmalloc, which came as a surprise.
This is roughly what I expected.
> And jemalloc seems to have a nasty design flaw. It is essentially a
> buddy allocator once you cross a certain size. Size used to be 512B in
> 2006 and is 4k for the binary I tested. malloc(4097) will return 8k,
> causing up to 2x memory overhead. Improving this in jemalloc seems much
> harder than improving ptmalloc2, so my quest to replace the default
> allocator is over.
>
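
Just to illustrate the overhead you describe, a toy model of
power-of-two rounding (not jemalloc's actual size-class code) shows
the worst case:

#include <stdio.h>
#include <stddef.h>

/* Round a request up to the next power of two, as a stand-in for the
   behavior described above for sizes past the cutoff. */
static size_t round_up_pow2(size_t n)
{
    size_t r = 1;
    while (r < n)
        r <<= 1;
    return r;
}

int main(void)
{
    size_t req = 4097;               /* one byte past the 4k class */
    size_t got = round_up_pow2(req); /* 8192 */
    printf("request %zu -> allocated %zu (%.2fx overhead)\n",
           req, got, (double)got / (double)req);
    return 0;
}

Any request just past a class boundary nearly doubles its footprint,
which matches the malloc(4097) -> 8k case above.
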
> Anyhow, here are some raw numbers for the curious. Benchmark allocated
> 2GB in 8 threads in sizes between 384B and 12288B and memset the memory.
>                 runtime    VmRSS   VmData  maps  syscalls
> libc             7.165s  2107908  2590048    67    332955
> libc-mprotect    0.768s  2107944  2399808    35      4149
> jemalloc         0.962s  2652152  2695332    42      5521
> tcmalloc         1.510s  2245760  2278460    47     38766
>
> In this particular benchmark my hacked-up ptmalloc2 is winning, while a
> standard ptmalloc2 is clearly the worst of the bunch.
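
From the description, I would guess the benchmark looks roughly like
the following (a hypothetical reconstruction, not your code): 8
threads, random sizes between 384 and 12288 bytes, memset on each
block, everything kept live until the end so RSS reflects the ~2GB
total.

#define _DEFAULT_SOURCE          /* for rand_r */
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NTHREADS    8
#define TOTAL       (2UL * 1024 * 1024 * 1024)  /* ~2GB overall */
#define PER_THREAD  (TOTAL / NTHREADS)
#define MINSZ       384
#define MAXSZ       12288

static void *worker(void *arg)
{
    unsigned seed = (unsigned)(uintptr_t)arg + 1;
    char **ptrs = malloc(sizeof(char *) * (PER_THREAD / MINSZ + 1));
    size_t allocated = 0, n = 0;

    if (!ptrs)
        return NULL;
    while (allocated < PER_THREAD) {
        size_t sz = MINSZ + rand_r(&seed) % (MAXSZ - MINSZ + 1);
        char *p = malloc(sz);
        if (!p)
            break;
        memset(p, 0, sz);            /* touch every block */
        ptrs[n++] = p;
        allocated += sz;
    }
    for (size_t i = 0; i < n; i++)   /* release everything at the end */
        free(ptrs[i]);
    free(ptrs);
    return NULL;
}

int main(void)
{
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, (void *)(uintptr_t)i);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    return 0;
}

But I would rather test against the real thing than my guess at it.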
What benchmark are you using? I'd like to run it on my malloc.
Rich