This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Re: improving malloc

From: Ondřej Bílka <neleai at seznam dot cz>
To: Andi Kleen <andi at firstfloor dot org>
Cc: libc-alpha at sourceware dot org
Date: Sun, 6 Jan 2013 10:56:18 +0100
Subject: Re: improving malloc
References: <20130105090242.GA4490@domone.kolej.mff.cuni.cz><m238yedc66.fsf@firstfloor.org>

On Sat, Jan 05, 2013 at 11:05:05PM -0800, Andi Kleen wrote:
> Ondřej Bílka <neleai@seznam.cz> writes:
> >
> > It is a bottleneck on Core 2, where compare-and-swap takes 80 cycles.
> > The trend is that CAS is faster on modern processors; on Sandy Bridge
> > it is only 30 cycles.
> 
> The 30 cycles is for the case when the cache line is in the local L1
> already.
> 
> But interesting is what happens when it is in someone else's cache.
> Under thread contention you will get many orders of magnitude
> longer latencies, especially on larger systems.

Sure, I wanted to say that even in the best case it is somewhat slow.
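
To make the operation under discussion concrete, here is a minimal sketch of a compare-and-swap retry loop, assuming a C11 compiler with <stdatomic.h>. The names free_node, free_head, push_free and take_all are hypothetical and are not taken from glibc or from the allocator being discussed. Every push performs at least one CAS on the cache line holding free_head, which is the cost quoted above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical free-list node; not a glibc structure. */
struct free_node {
    struct free_node *next;
};

/* Shared list head. Every push costs at least one CAS on this cache line. */
static _Atomic(struct free_node *) free_head = NULL;

/* Push a node onto the shared list with a compare-and-swap retry loop. */
static void push_free(struct free_node *n)
{
    assert(n != NULL);
    struct free_node *old =
        atomic_load_explicit(&free_head, memory_order_relaxed);
    do {
        n->next = old;  /* a failed CAS refreshes old with the current head */
    } while (!atomic_compare_exchange_weak_explicit(
                 &free_head, &old, n,
                 memory_order_release, memory_order_relaxed));
}

/* Detach the whole list with one atomic exchange. Taking everything at
   once sidesteps the ABA problem that popping single nodes would have. */
static struct free_node *take_all(void)
{
    return atomic_exchange_explicit(&free_head, NULL, memory_order_acquire);
}
```

Nodes come back from take_all() in reverse push order, since each push links the new node in front of the old head.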
> 
> Writing mallocs in 2013 that are not thread-local is simply not
> acceptable anymore.
> 
> Generally, writing a good threaded malloc is tricky. The case
> of allocating on one thread and freeing on another is also
> important, so you have to have a per-thread data structure that
> is still friendly to other threads.

I wrote it to be more local. I only need to update a variable that is
shared only by allocations on the same page, and those were allocated
by the same thread (unless the page is full/empty). I do not expect this
to be contended.
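
One way to read the per-page variable described above, as a sketch only: each page carries a small header with an atomic count of live objects, and a free from any thread touches only that page's counter. The 4096-byte page size, the header layout, and the names page_hdr, page_new, page_of, note_alloc and note_free are all assumptions made for illustration; they are not the actual implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SIZE 4096u  /* assumed page size */

/* Hypothetical header stored at the start of each page. */
struct page_hdr {
    atomic_uint live;  /* objects currently allocated from this page */
};

/* Obtain one page-aligned page and initialise its header. */
static struct page_hdr *page_new(void)
{
    struct page_hdr *pg = aligned_alloc(PAGE_SIZE, PAGE_SIZE);
    if (pg != NULL)
        atomic_init(&pg->live, 0);
    return pg;
}

/* Find the header from any pointer inside the page by masking low bits. */
static struct page_hdr *page_of(void *p)
{
    return (struct page_hdr *)((uintptr_t)p & ~(uintptr_t)(PAGE_SIZE - 1));
}

/* Record an allocation from the page containing p. */
static void note_alloc(void *p)
{
    atomic_fetch_add_explicit(&page_of(p)->live, 1, memory_order_relaxed);
}

/* Record a free; returns true when this free made the page empty. */
static bool note_free(void *p)
{
    unsigned prev = atomic_fetch_sub_explicit(&page_of(p)->live, 1,
                                              memory_order_acq_rel);
    assert(prev > 0);
    return prev == 1;
}
```

The counter is shared only among objects on one page, so two threads contend on it only when they free objects from the same page at the same time.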
> 
> Other considerations are memory fragmentation, how quickly 
> it can give back unused memory to the OS, etc. etc.
>
As for giving memory back to the OS, once Linux gets volatile ranges
we will finally no longer have to defer returning memory just because
zeroing pages is expensive.
I wanted to suggest on linux-kernel that pages returned to Linux be
kept on a linked list and preferred for allocations, as they do not
have to be zeroed.
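
For reference, a minimal sketch of where the zeroing cost comes from, using the existing madvise(MADV_DONTNEED) call on Linux rather than the volatile-ranges proposal mentioned above. The helper names region_map and region_release are hypothetical. After the pages of a private anonymous mapping are released this way, the next access faults in a fresh zero-filled page.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of private anonymous memory; returns NULL on failure. */
static void *region_map(size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Hand the pages back to the kernel while keeping the range mapped.
   Returns 0 on success, -1 on error (as madvise does). */
static int region_release(void *p, size_t len)
{
    assert(p != NULL);
    return madvise(p, len, MADV_DONTNEED);
}
```

A short usage example: write to the region, release it, and observe that the old contents are gone.

```c
unsigned char *p = region_map(1 << 16);
p[0] = 42;
region_release(p, 1 << 16);
/* p[0] now reads as 0: the kernel supplies a zeroed page on the next touch */
```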

> Writing a good general purpose malloc is extremely hard.

Writing a faster malloc than the current one will suffice.
> 
> I did some experiments with tcmalloc some time ago and it can give
> speedups because it has a much faster fast path for the uncontended
> case. However tcmalloc has some issues that do not really make
> it fully general purpose, i.e. it is unable to ever give memory back
> to the OS.

Also, they tend to optimize for speed even at large sizes, where
optimizing for fragmentation could be better.
> 
> -Andi
> -- 
> ak@linux.intel.com -- Speaking for myself only

Ondra

Follow-Ups:
- Re: improving malloc
  - From: Rich Felker

References:
- improving malloc
  - From: Ondřej Bílka
- Re: improving malloc
  - From: Andi Kleen