This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: The direction of malloc?


On Wed, 2013-12-11 at 09:15 +0000, Will Newton wrote:
> On 11 December 2013 02:31, Siddhesh Poyarekar <siddhesh@redhat.com> wrote:
> > On Tue, Dec 10, 2013 at 10:05:41PM +0100, OndÅej BÃlka wrote:
> >> > * Should we provide thread cache blocks to do provide some lockless allocation?
> >>
> >> This is most low-hanging fruit that I aim for. We already use tls to
> >> determine arena so this should not be a issue.
> >>
> >> We have fastbins that sorta do this but with several problems.
> >> 1. They are not really lockless, for malloc they need a lock, only
> >> freeing will be when bug 15073 gets fixed.
> >>
> >> Second problem is that fastbins are per-arena not per-thread which
> >> forces us to use atomic operations. These are expensive (typicaly more than 50 cycles).
> >>
> >> Moving these to per-thread bins mostly just needs refactoring of current
> >> code to one that makes more sense.
> >
> > With arenas-per-thread, you essentially have contention-free access,
> > which is not the same thing as lock-free, but not much worse.  You'll
> > have lock contention in per-thread arenas only when there are more
> > threads than arenas, which in the default case means that you have
> > more threads than twice the number of cores, which is too many threads
> > anyway.
> 
> Lock contention would be worse, but still the atomic instructions
> required to lock/unlock the arena is the hottest part of the profile
> on many single-threaded malloc workloads.

That is on ARM I suppose?

> If we are going to get a new malloc or update the old one I think the
> fast path being lock-free should be a requirement.

There's "lock-free", which is a forward progress condition, essentially
stating that independently of what all other threads are doing, at least
one of the concurrent threads will finish the operation in a finite
number of steps.

There's "lock-less", used to roughly state that the code doesn't use
locks (IOW, is nonblocking).  If it's still concurrent code, it will
have to use atomic instructions though, and in many cases more of them
than if you'd grab a single lock.

Which one do you mean?  Or were you thinking about using nonconcurrent
code by using per-thread data if possible?



Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]