This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
RE: [RFC PATCH]: Align large allocations to cacheline
- From: "Wilco Dijkstra" <wdijkstr at arm dot com>
- To: "'Rich Felker'" <dalias at libc dot org>
- Cc: <libc-alpha at sourceware dot org>
- Date: Wed, 29 Oct 2014 15:31:37 -0000
- References: <002301cff37e$61c12330$25436990$ at com> <20141029144446 dot GS22465 at brightrain dot aerifal dot cx>
> Rich Felker wrote:
> On Wed, Oct 29, 2014 at 01:43:54PM -0000, Wilco Dijkstra wrote:
> > This patch aligns allocations of large blocks to a cacheline on ARM and AArch64. The main goal is to
> > reduce performance variations due to random alignment choices, however it improves performance on
> > several benchmarks as well. SPECFP2000 improves by ~1.5%.
> >
> > Any comments?
>
> It seems like this would pathologically increase fragmentation: when a
> tiny block is split off the beginning to align a large allocation,
> isn't it likely that this tiny chunk (rather than some other tiny
> chunk already in the appropriate-sized free bin prior to the large
> allocation) will be used to satisfy a tiny request, which may end up
> being long-lived?
A good implementation of memalign would try to avoid that, for example
by growing the previously allocated block or by not adding tiny split-off
chunks to the free list.
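
To make that concrete, here is a rough sketch of the decision for the leading
gap. It is not glibc code: TINY_LIMIT, the enum and plan_aligned_split() are
hypothetical names made up for illustration, and the 64-byte limit is an
assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit: a leading gap smaller than this is kept off the free list. */
#define TINY_LIMIT 64u

enum lead_policy {
    LEAD_NONE,    /* chunk already aligned, nothing to split off */
    LEAD_ABSORB,  /* tiny gap: grow the previous allocated block over it */
    LEAD_FREE     /* gap is large enough to be a useful free chunk */
};

struct split_plan {
    uintptr_t user;           /* aligned address handed to the caller */
    size_t lead;              /* bytes between chunk start and user */
    enum lead_policy policy;  /* what to do with those bytes */
};

/* Plan how to carve an aligned block out of a free chunk at chunk_start.
   align must be a nonzero power of two. */
static struct split_plan plan_aligned_split(uintptr_t chunk_start, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    struct split_plan p;
    p.user = (chunk_start + align - 1) & ~(uintptr_t)(align - 1);
    p.lead = (size_t)(p.user - chunk_start);

    if (p.lead == 0)
        p.policy = LEAD_NONE;
    else if (p.lead < TINY_LIMIT)
        p.policy = LEAD_ABSORB;
    else
        p.policy = LEAD_FREE;
    return p;
}
```

With a 64-byte alignment the gap is always below 64 bytes, so under these
assumptions it would never reach a small free bin.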
> memalign-type operations are known to be a major problem for
> fragmentation. See this bug report:
>
> https://sourceware.org/bugzilla/show_bug.cgi?id=14581
That report seems to claim that freeing a block doesn't immediately merge it
with adjacent free blocks. If that is the case then that is a very
serious bug indeed! The fact that memalign is not fully integrated into
the allocation code doesn't help either - it never tries to reuse a
correctly aligned free block.
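
As an illustration of what reusing an aligned free block could look like, here
is a minimal sketch over a simplified singly linked free list. The struct and
find_aligned_fit() are hypothetical and do not reflect glibc's actual bin
layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified free-list node, assumed for illustration only. */
struct free_block {
    uintptr_t addr;           /* start address of the free block */
    size_t size;              /* usable size in bytes */
    struct free_block *next;  /* next free block, or NULL */
};

/* Return the first free block that already starts on an align boundary and
   is big enough, or NULL if there is none. align must be a nonzero power
   of two. */
static struct free_block *find_aligned_fit(struct free_block *head,
                                           size_t align, size_t size)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    for (struct free_block *b = head; b != NULL; b = b->next) {
        if ((b->addr & (align - 1)) == 0 && b->size >= size)
            return b;
    }
    return NULL;
}
```

Only if this search fails would the allocator need to split a block and create
a leading gap at all.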
> A smarter approach might be to round up the _sizes_ of large
> allocations so that the next available address after the allocated
> block is nicely aligned. This will tend to make _future_ allocations
> aligned, even if the first one isn't, and won't leave any free gaps
> that could contribute to fragmentation.
That might be possible too, I'd have to give it a go. I was trying to avoid
reworking the guts of the malloc code - I think it would be better to create
a new modern allocator from scratch.
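
For reference, the size-rounding idea could look roughly like this. It is only
a sketch: the 64-byte cacheline, the LARGE_THRESHOLD cutoff and the function
name are assumptions, not values from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHELINE 64u          /* assumed cacheline size */
#define LARGE_THRESHOLD 1024u  /* hypothetical cutoff for "large" requests */

/* Pad a large request so that start + size ends on a cacheline boundary.
   The block itself may be unaligned, but the next address handed out after
   it will be aligned. Small requests are returned unchanged. */
static size_t round_size_to_align_end(uintptr_t start, size_t size)
{
    assert((CACHELINE & (CACHELINE - 1)) == 0);

    if (size < LARGE_THRESHOLD)
        return size;

    uintptr_t end = start + size;
    uintptr_t aligned_end = (end + CACHELINE - 1) & ~(uintptr_t)(CACHELINE - 1);
    return size + (size_t)(aligned_end - end);
}
```

The padding is at most CACHELINE - 1 bytes per large block and stays inside the
allocation, so no separate free gap is created.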
Wilco