This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: malloc: performance improvements and bugfixes
- From: Jörn Engel <joern at purestorage dot com>
- To: Torvald Riegel <triegel at redhat dot com>
- Cc: "GNU C. Library" <libc-alpha at sourceware dot org>, Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, Joern Engel <joern at purestorage dot org>
- Date: Tue, 26 Jan 2016 09:59:40 -0800
- Subject: Re: malloc: performance improvements and bugfixes
- Authentication-results: sourceware.org; auth=none
- References: <1453767872-19161-1-git-send-email-joern at purestorage dot com> <1453810961 dot 4592 dot 100 dot camel at localhost dot localdomain> <20160126171435 dot GG5745 at Sligo dot logfs dot org> <1453828825 dot 4592 dot 108 dot camel at localhost dot localdomain> <20160126172629 dot GH5745 at Sligo dot logfs dot org> <1453829759 dot 4592 dot 116 dot camel at localhost dot localdomain>
On Tue, Jan 26, 2016 at 06:35:59PM +0100, Torvald Riegel wrote:
> On Tue, 2016-01-26 at 09:26 -0800, Jörn Engel wrote:
> > On Tue, Jan 26, 2016 at 06:20:25PM +0100, Torvald Riegel wrote:
> > >
> > > What do the allocation patterns look like? There can be big variations
> > > in allocation frequency and size, lifetime of allocated regions,
> > > relation between allocations and locality, etc. Some programs allocate
> > > mostly up-front, others have lots of alloc/dealloc during the lifetime of
> > > the program.
> >
> > Lots of alloc/dealloc during the lifetime. To give you a rough scale,
> > malloc consumed around 1.7% cputime in the stable state. Now it is down
> > to about 0.7%.
>
> Eventually, I think we'd like to get more detail on this, so that we
> start tracking performance regressions too and so that our model of
> workloads is less hand-wavy than "big application". Given that
> malloc will remain a general-purpose allocator (at least in the default
> config / tuning), we'll have to choose trade-offs so that they represent
> real workloads, for which we'll have to classify workloads in some way.
The workload itself is closed source, so the only ones ever testing with
that workload are likely to be us. Hence my hand-waving. Having some
other application available to the general public for testing would be
nice, though.
I suspect there isn't even a shortage of applications to choose from.
Firefox uses jemalloc. If you can automate some runs in Firefox and
compare jemalloc to libc malloc, you will likely find the same problems
we encountered. And from what I've heard, there is no shortage of open
source applications that have switched over to jemalloc or tcmalloc and
could be used as well.
Cheap NUMA systems are below $5k, so not completely outrageous. QEMU
probably cannot be used to simulate the performance effects, so you need
real hardware.
Jörn
--
A victorious army first wins and then seeks battle.
-- Sun Tzu