This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: improving malloc


On Sat, Jan 05, 2013 at 11:05:05PM -0800, Andi Kleen wrote:
> Ondřej Bílka <neleai@seznam.cz> writes:
> >
> > It bottlenecks on Core 2, where compare-and-swap takes 80 cycles.
> > The trend is that CAS is faster on modern processors; on Sandy Bridge
> > it is only 30 cycles.
> 
> The 30 cycles is for the case when the cache line is in the local L1
> already.
> 
> But the interesting case is what happens when it is in someone else's cache.
> Under thread contention you will get latencies that are many orders of
> magnitude longer, especially on larger systems.

Sure, I meant that even in the best case it is somewhat slow.
> 
> Writing mallocs in 2013 that are not thread-local is simply not
> acceptable anymore.
> 
> Generally, writing a good threaded malloc is tricky. The case
> of allocating on one thread and freeing on another is also
> important, so you need a per-thread data structure that
> is still friendly to other threads.

I wrote it to be more local. I only need to update a variable that is
shared only by allocations on the same page, and those were allocated
by the same thread (unless the page is full/empty). I do not expect this
to be contended.
> 
> Other considerations are memory fragmentation, how quickly 
> it can give back unused memory to the OS, etc. etc.
>
As for giving memory back to the OS: once Linux gets volatile ranges,
we will finally no longer have to defer returning memory just because
zeroing pages is expensive.
I wanted to suggest on linux-kernel keeping pages returned to Linux
on a linked list and preferring them for allocations, as they do not
have to be zeroed.

> Writing a good general purpose malloc is extremely hard.

Writing a malloc that is faster than the current one will suffice.
> 
> I did some experiments with tcmalloc some time ago and it can give
> speedups because it has a much faster fast path for the uncontended
> case. However tcmalloc has some issues that do not really make
> it fully general purpose, i.e. it is unable to ever give memory back
> to the OS.

Also, they tend to optimize for speed even for large sizes, where
optimizing for fragmentation could be better.
> 
> -Andi
> -- 
> ak@linux.intel.com -- Speaking for myself only

Ondra

