This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: The direction of malloc?
- From: Torvald Riegel <triegel at redhat dot com>
- To: Will Newton <will dot newton at linaro dot org>
- Cc: Siddhesh Poyarekar <siddhesh at redhat dot com>, Ondřej Bílka <neleai at seznam dot cz>, Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Mon, 16 Dec 2013 18:03:09 +0100
- Subject: Re: The direction of malloc?
- Authentication-results: sourceware.org; auth=none
- References: <52A6A0DA dot 1080109 at redhat dot com> <CANu=Dmi32gwk-hQ3dDbj0d4_gs3FWqt02+NmveXH1p03Vm+Mfg at mail dot gmail dot com> <20131210121622 dot GA5416 at domone dot podge> <52A75502 dot 6040500 at linux dot vnet dot ibm dot com> <20131210210541 dot GA19161 at domone dot podge> <20131211023150 dot GA20835 at spoyarek dot pnq dot redhat dot com> <CANu=DmiTFu59qTgP=3Ks6_biCGvGqnis0++mmZdqX6-1FDWaUg at mail dot gmail dot com>
On Wed, 2013-12-11 at 09:15 +0000, Will Newton wrote:
> On 11 December 2013 02:31, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> > On Tue, Dec 10, 2013 at 10:05:41PM +0100, Ondřej Bílka wrote:
> >> > * Should we provide thread cache blocks to provide some lockless allocation?
> >>
> >> This is the most low-hanging fruit that I aim for. We already use TLS to
> >> determine the arena, so this should not be an issue.
> >>
> >> We have fastbins that sorta do this but with several problems.
> >> 1. They are not really lockless: malloc still needs a lock, and only
> >> freeing will be lockless once bug 15073 gets fixed.
> >>
> >> 2. Fastbins are per-arena, not per-thread, which forces us to use
> >> atomic operations. These are expensive (typically more than 50 cycles).
> >>
> >> Moving these to per-thread bins mostly just needs refactoring of current
> >> code to one that makes more sense.
> >
> > With arenas-per-thread, you essentially have contention-free access,
> > which is not the same thing as lock-free, but not much worse. You'll
> > have lock contention in per-thread arenas only when there are more
> > threads than arenas, which in the default case means that you have
> > more threads than twice the number of cores, which is too many threads
> > anyway.
>
> Lock contention would be worse, but still the atomic instructions
> required to lock/unlock the arena are the hottest part of the profile
> on many single-threaded malloc workloads.
That is on ARM I suppose?
> If we are going to get a new malloc or update the old one I think the
> fast path being lock-free should be a requirement.
There's "lock-free", which is a forward progress condition, essentially
stating that independently of what all other threads are doing, at least
one of the concurrent threads will finish the operation in a finite
number of steps.
There's "lock-less", used to roughly state that the code doesn't use
locks (IOW, is nonblocking). If it's still concurrent code, it will
have to use atomic instructions though, and in many cases more of them
than if you'd grab a single lock.
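To make the distinction concrete, here is a minimal sketch of a shared
free list that uses no locks and is also lock-free in the progress
sense. It assumes C11 atomics; the names (struct node, list_head, push,
take_all) are hypothetical and this is not how glibc's fastbins are
actually laid out. Note that every operation still costs an atomic
instruction.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free-list node; not glibc's actual chunk layout. */
struct node {
    struct node *next;
};

/* Shared list head, similar in spirit to a per-arena bin. */
static _Atomic(struct node *) list_head = NULL;

/* Push without a lock.  If the CAS fails, some other thread's update
   succeeded in the meantime, so at least one thread always makes
   progress: that is the lock-free guarantee. */
static void push(struct node *n)
{
    struct node *old = atomic_load_explicit(&list_head,
                                            memory_order_relaxed);
    do {
        n->next = old;  /* 'old' is refreshed by a failed CAS */
    } while (!atomic_compare_exchange_weak_explicit(
                 &list_head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole list with a single atomic exchange.  Taking
   everything at once sidesteps the ABA problem that a naive
   single-node pop would have. */
static struct node *take_all(void)
{
    return atomic_exchange_explicit(&list_head, NULL,
                                    memory_order_acquire);
}
```

A mutex-protected version of the same list would need one atomic
operation to lock and typically another to unlock, so for a structure
this simple the lock-less variant does not use more atomics; more
complex structures often do.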
Which one do you mean? Or were you thinking about using nonconcurrent
code by using per-thread data if possible?
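For the nonconcurrent option, a rough sketch of a per-thread cache
could look like the following. It assumes C11 _Thread_local and a
single size class; CACHE_MAX, BLOCK_SIZE, cache_alloc and cache_free
are hypothetical names for illustration, not an existing glibc
interface. The fast path touches only thread-private data, so it needs
neither locks nor atomic instructions.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_MAX  16   /* hypothetical cap on cached blocks per thread */
#define BLOCK_SIZE 64   /* hypothetical single size class */

struct block {
    struct block *next;
};

/* Each thread gets its own list head and counter. */
static _Thread_local struct block *cache_head = NULL;
static _Thread_local size_t cache_count = 0;

static void *cache_alloc(void)
{
    struct block *b = cache_head;
    if (b != NULL) {
        /* Fast path: plain loads and stores on thread-private data. */
        cache_head = b->next;
        cache_count--;
        return b;
    }
    /* Slow path: fall back to the shared allocator. */
    return malloc(BLOCK_SIZE);
}

static void cache_free(void *p)
{
    if (p == NULL)
        return;
    if (cache_count < CACHE_MAX) {
        /* Keep the block in the calling thread's cache. */
        struct block *b = p;
        b->next = cache_head;
        cache_head = b;
        cache_count++;
    } else {
        /* Cache is full: hand the block back to the shared allocator. */
        free(p);
    }
}
```

This sketch ignores several things a real implementation has to
handle, such as multiple size classes and returning cached blocks when
a thread exits.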