This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: The direction of malloc?

From: OndÅej BÃlka <neleai at seznam dot cz>
To: Torvald Riegel <triegel at redhat dot com>
Cc: Adhemerval Zanella <azanella at linux dot vnet dot ibm dot com>, libc-alpha at sourceware dot org
Date: Mon, 16 Dec 2013 22:23:34 +0100
Subject: Re: The direction of malloc?
Authentication-results: sourceware.org; auth=none
References: <52A6A0DA dot 1080109 at redhat dot com> <CANu=Dmi32gwk-hQ3dDbj0d4_gs3FWqt02+NmveXH1p03Vm+Mfg at mail dot gmail dot com> <20131210121622 dot GA5416 at domone dot podge> <52A75502 dot 6040500 at linux dot vnet dot ibm dot com> <20131210210541 dot GA19161 at domone dot podge> <1387213140 dot 23049 dot 8010 dot camel at triegel dot csb>

On Mon, Dec 16, 2013 at 05:59:00PM +0100, Torvald Riegel wrote:
> On Tue, 2013-12-10 at 22:05 +0100, OndÅej BÃlka wrote:
> > We have fastbins that sorta do this but with several problems. 
> > 1. They are not really lockless, for malloc they need a lock, only
> > freeing will be when bug 15073 gets fixed.
> 
> Note that *properly tuned* locks can be as fast as lock-less code in
> several cases.  When you use a, say, stock PTHREAD_MUTEX_NORMAL lock,
> this won't be well-suited to short critical sections because it doesn't
> spin.  PTHREAD_MUTEX_ADAPTIVE_NP spins, but the number of spinning
> iterations seems to be an arbitrary choice, and isn't very useful in my
> experiments so far.  Also, there's no back-off whatsoever in the
> spinning.
>
As will said acquiring a lock is hot part in a single thread
applications.

Please explain how spinning could improve performance in single thread
applications.

> 
> > Second problem is that fastbins are per-arena not per-thread which
> > forces us to use atomic operations. These are expensive (typicaly more than 50 cycles).
> 
> Especially on x86, atomic operations that *hit in the cache* have become
> very fast compared to their costs in the past.  I don't have current
> numbers, but I believe the 50 cycle number is too high; I vaguely
> remember 10-20.

A simple benchmark could check a real cost. A problem is that while for core2
and i7 cost of CAS is around 20 cycles for bulldozer its still 50 cycles.

Even with these a malloc+free pair contains 5 atomic instructions on
fastbin path which gives 100 cycle penalty (malloc: lock, get item from bin, unlock,
free: set bit that fastbins are in use, put item to bin)

Follow-Ups:
- benchmark for atomic.
  - From: OndÅej BÃlka
- Re: The direction of malloc?
  - From: Torvald Riegel

References:
- The direction of malloc?
  - From: Carlos O'Donell
- Re: The direction of malloc?
  - From: Will Newton
- Re: The direction of malloc?
  - From: OndÅej BÃlka
- Re: The direction of malloc?
  - From: Adhemerval Zanella
- Re: The direction of malloc?
  - From: OndÅej BÃlka
- Re: The direction of malloc?
  - From: Torvald Riegel

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]