This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2] Add malloc micro benchmark
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>
- Cc: Florian Weimer <fweimer at redhat dot com>, Carlos O'Donell <carlos at redhat dot com>, Joseph Myers <joseph at codesourcery dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Wed, 28 Feb 2018 20:56:45 +0100
- Subject: Re: [PATCH v2] Add malloc micro benchmark
On Wed, Feb 28, 2018 at 05:01:57PM +0000, Wilco Dijkstra wrote:
> Ondřej Bílka wrote:
> >> I think a heap-style allocator which does not segregate allocations
> >> of different sizes still has its place, and why not provide one in
> >> glibc?
> > That isn't the case for any allocator and it is asking for trouble. You want
> > to avoid the situation where two big chunks cannot be merged because of a
> > tiny chunk between them.
> Agreed, you always want to special case small blocks. I don't believe there is
> any advantage in using a single big heap.
> > For larger sizes this representation is still problematic and you could
> > improve performance with another representation that also avoids the
> > alignment problem by placing metadata elsewhere (for mine only 4 bytes are needed).
> Larger sizes would be helped a lot once small blocks are dealt with separately.
> So I don't think we need complicated balanced binary trees when dealing with a
> small number of large blocks. You won't need an unsorted list either, large blocks
> can be merged in O(1) time.
> There may be an advantage to placing metadata elsewhere; for example, it could make
> adding/removing/walking free lists much faster (spatial locality) or make heap
> overflow attacks almost impossible.
I will now describe what I plan for larger blocks.
I have the new data structure mostly in my head, so I won't give a concrete example.
First, for small sizes, allocation would just pop an element from a
thread-local singly linked list, or call a function to refill the lists
with enough elements when empty. I plan to add an inline version to make the
performance of constant-size small allocations the same as a memory pool. By
using a pointer to a list, the refill could do best fit by making multiple
buckets point to the same list.
This is a pretty generic interface; the question is for which sizes it should be used.
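The small-size fast path described above could be sketched as follows. This is a minimal illustration, not actual glibc code; all names (small_alloc, small_list, refill_and_pop) are invented for the example, and the refill just grabs chunks with malloc where a real allocator would carve them out of an arena region:

```c
/* Sketch of the small-size fast path: allocation pops from a
   thread-local singly linked free list; an empty list is refilled in
   bulk.  All names are illustrative.  */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct free_chunk
{
  struct free_chunk *next;
};

/* One list per 16-byte size class for sizes below 256.  */
static __thread struct free_chunk *small_list[17];

static void *refill_and_pop (int bucket);

/* The hot path is a bucket computation and a list pop.  */
static void *
small_alloc (size_t size)
{
  int bucket = (size + 15) / 16;
  struct free_chunk *c = small_list[bucket];
  if (c == NULL)
    return refill_and_pop (bucket);
  small_list[bucket] = c->next;
  return c;
}

static void
small_free (void *p, size_t size)
{
  int bucket = (size + 15) / 16;
  struct free_chunk *c = p;
  c->next = small_list[bucket];
  small_list[bucket] = c;
}

/* Stand-in refill: grab chunks with malloc; a real allocator would
   carve them from an arena region in one go.  */
static void *
refill_and_pop (int bucket)
{
  for (int i = 0; i < 8; i++)
    small_free (malloc (16 * bucket), 16 * bucket);
  return small_alloc (16 * bucket);
}
```

When the list is nonempty, the pop is a couple of instructions, which is why an inline version could match a memory pool for constant sizes.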
For larger sizes I could do best fit in O(1) with merging on free.
It needs a condition such as rounding the size/alignment up to a
multiple of 32 for the 256-2048 range and to a multiple of 256 for
2048-16384, as an example.
The data structure would be 64 doubly linked lists and a 64-bit integer where
the i-th bit says whether the i-th list is nonempty. The last bucket could be
special and hold larger elements.
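The bitmap lookup can be sketched like this; best_fit_bucket is a hypothetical name, and the empty-mask case is made explicit since __builtin_ctzll is undefined for zero:

```c
/* Bit i of 'bitmap' is set when the i-th doubly linked free list is
   nonempty, so best fit is one mask plus one count-trailing-zeros.  */
#include <assert.h>
#include <stdint.h>

static uint64_t bitmap;

/* Smallest nonempty bucket with index >= want (want must be < 64);
   returns -1 when no list can satisfy the request.  */
static int
best_fit_bucket (int want)
{
  uint64_t t = bitmap & (~(uint64_t) 0 << want);
  if (t == 0)
    return -1;                  /* __builtin_ctzll (0) is undefined */
  return __builtin_ctzll (t);
}
```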
Finally, for the largest allocations I would use page-based logic, as
mmapping/remapping/unmapping is about the only way to actually decrease
the memory footprint; I haven't experimented with that much yet.
Code for allocation would be something like this (braces and a 64-bit
mask added; allocate here means popping from the chosen list and
splitting off the remainder):

  if (size < 256)
    {
      bucket = (size + 15) / 16;
      return small_list_pop (small_list[bucket]);
    }
  else if (size < 32 * 64)
    {
      bucket = (size + 31) / 32;
      /* Select the smallest nonempty list with index >= bucket.  */
      uint64_t t = bitmap & (~(uint64_t) 0 << bucket);
      bucket = __builtin_ctzll (t);
      return allocate (bucket);
    }
  else if (size < 256 * 64)
    {
      bucket = (size + 255) / 256;
      /* ditto with bigger buckets */
    }
  else
    {
      /* mmap */
    }
As for free, for small sizes I haven't decided yet how to reclaim chunks to the cache.
For inlining it could be something simple like creating a singly linked list of 32 elements,
then calling a mass free for that list.
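The batched free could look roughly like this; mass_free is stubbed out to a counter so the batching itself is visible, and all names are invented for the sketch:

```c
/* Batched free: each free pushes onto a local singly linked list and
   every 32nd free hands the whole list back in one call.  */
#include <assert.h>
#include <stddef.h>

struct free_chunk
{
  struct free_chunk *next;
};

static struct free_chunk *batch;
static int batch_count;
static int mass_free_calls;

static void
mass_free (struct free_chunk *list)
{
  (void) list;                  /* a real version returns these chunks */
  mass_free_calls++;
}

static void
batched_free (void *p)
{
  struct free_chunk *c = p;
  c->next = batch;
  batch = c;
  if (++batch_count == 32)
    {
      mass_free (batch);
      batch = NULL;
      batch_count = 0;
    }
}
```

The point of the batch is that the expensive reclaim path runs once per 32 frees instead of on every call.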
For medium elements, free would first determine the free areas before and after
the freed chunk, remove them from their doubly linked lists, and unset the
bitmap bits if the lists become empty. Then it would sum these sizes and put
the merged chunk into the appropriate bucket.
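That merge-on-free bookkeeping might look like the sketch below: neighbours are unlinked from their doubly linked bucket lists, a bucket's bitmap bit is cleared when its list empties, and the summed chunk is reinserted. Finding the neighbours from chunk metadata and capping the bucket index at 63 are omitted, and all names are invented:

```c
/* Merge-on-free for medium chunks, using 32-byte-wide buckets as in
   the 256-2048 range above.  */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chunk
{
  struct chunk *prev, *next;
  size_t size;
};

static struct chunk *buckets[64];
static uint64_t bitmap;

static int
bucket_of (size_t size)
{
  return (int) ((size + 31) / 32);      /* capped at 63 in a real version */
}

static void
bucket_insert (struct chunk *c)
{
  int b = bucket_of (c->size);
  c->prev = NULL;
  c->next = buckets[b];
  if (c->next)
    c->next->prev = c;
  buckets[b] = c;
  bitmap |= (uint64_t) 1 << b;
}

static void
bucket_remove (struct chunk *c)
{
  int b = bucket_of (c->size);
  if (c->prev)
    c->prev->next = c->next;
  else
    buckets[b] = c->next;
  if (c->next)
    c->next->prev = c->prev;
  if (buckets[b] == NULL)
    bitmap &= ~((uint64_t) 1 << b);     /* list emptied: clear its bit */
}

/* Free 'c'; 'before'/'after' are its already-free neighbours, or NULL
   when the neighbour is in use.  */
static void
merge_free (struct chunk *c, struct chunk *before, struct chunk *after)
{
  size_t total = c->size;
  if (before)
    {
      bucket_remove (before);
      total += before->size;
      c = before;               /* merged chunk starts at 'before' */
    }
  if (after)
    {
      bucket_remove (after);
      total += after->size;
    }
  c->size = total;
  bucket_insert (c);
}
```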