This is the mail archive of the
mailing list for the glibc project.
Re: improving malloc
On Sat, Jan 05, 2013 at 11:05:05PM -0800, Andi Kleen wrote:
> Ondřej Bílka <email@example.com> writes:
> > It bottlenecks on Core 2, where compare-and-swap takes 80 cycles.
> > The trend is that CAS is faster on modern processors; on Sandy Bridge
> > it's only 30 cycles.
> The 30 cycles is for the case when the cache line is in the local L1
> cache. What is interesting is what happens when it is in someone
> else's cache. Under thread contention you will get many orders of
> magnitude longer latencies, especially on larger systems.
Sure, I wanted to say that even in the best case it's somewhat slow.
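For context, the CAS the cycle counts above refer to is the kind used on a lock-free free-list fast path. A minimal sketch in C11 (illustrative only, not the allocator code under discussion; the ABA problem is ignored here for brevity):

```c
#include <stdatomic.h>
#include <stddef.h>

/* A free chunk is reused as a list node. */
struct node { struct node *next; };

static _Atomic(struct node *) free_list;

/* Push a freed chunk; each loop iteration pays one CAS
 * (the 30-80 cycle operation discussed above). */
static void push(struct node *n) {
    struct node *head = atomic_load_explicit(&free_list,
                                             memory_order_relaxed);
    do {
        n->next = head;
    } while (!atomic_compare_exchange_weak_explicit(
                 &free_list, &head, n,
                 memory_order_release, memory_order_relaxed));
}

/* Pop a chunk for allocation, or return NULL if the list is empty. */
static struct node *pop(void) {
    struct node *head = atomic_load_explicit(&free_list,
                                             memory_order_acquire);
    while (head &&
           !atomic_compare_exchange_weak_explicit(
               &free_list, &head, head->next,
               memory_order_acquire, memory_order_acquire))
        ;
    return head;
}
```

In the uncontended case the CAS succeeds on the first try; under contention the loop retries and each retry drags the cache line across cores, which is where the "orders of magnitude" latencies come from.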
> Writing a malloc in 2013 that is not thread local is simply not
> acceptable anymore.
> Generally, writing a good threaded malloc is tricky. The case
> of allocating on one thread and freeing on another is also
> important, so you have to have a per-thread data structure that
> is still friendly to other threads.
I wrote it to be more local. I only need to update a variable that is
shared only by allocations on the same page, and those were allocated
by the same thread (unless the page is full/empty). I do not expect
this to be contended.
> Other considerations are memory fragmentation, how quickly
> it can give back unused memory to the OS, etc. etc.
As for giving memory back to the OS: once Linux gets volatile ranges,
we will finally not have to defer returning memory just because
zeroing pages is expensive.
I wanted to suggest on linux-kernel keeping pages returned to the
kernel on a linked list and preferring them for allocations, since
they do not have to be zeroed.
> Writing a good general purpose malloc is extremely hard.
Writing a malloc that is faster than the current one will suffice.
> I did some experiments with tcmalloc some time ago and it can give
> speedups because it has a much faster fast path for the uncontended
> case. However, tcmalloc has some issues that do not really make
> it fully general purpose, i.e. it is unable to ever give memory back
> to the OS.
Also, they tend to optimize for speed even at large sizes, where
optimizing for fragmentation would be better.
> firstname.lastname@example.org -- Speaking for myself only