This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v2] Add malloc micro benchmark
- From: Florian Weimer <fweimer at redhat dot com>
- To: Carlos O'Donell <carlos at redhat dot com>, Joseph Myers <joseph at codesourcery dot com>
- Cc: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>, nd <nd at arm dot com>
- Date: Wed, 28 Feb 2018 13:40:28 +0100
- Subject: Re: [PATCH v2] Add malloc micro benchmark
On 01/05/2018 06:26 PM, Carlos O'Donell wrote:
> It would still be a win if we did not have co-located metadata (something
> Florian whispered into my ear years ago now) for small constant sized blocks.
> We would go from this:
>
>   N * 1-byte allocations => N * (32-byte header
>                                  + 1-byte allocation
>                                  + 15-bytes alignment)
>   [97% constant waste]
Actually, we have an 8-byte header, a 16-byte alignment requirement, and
a 24-byte minimum allocation size. (i386: 4-byte header, 16-byte
alignment, 12-byte minimum size.)
The non-main arenas actually need only a four-byte chunk header (or even
fewer bits) because the chunk size there is very limited.
The alignment is non-negotiable, considering our current stance
regarding fundamental alignment, even for smaller allocations.
If we introduce the heap layout for all arenas (which needs some way to
compensate for the variable sbrk offset, preferably without wasting a
gap there), then we should bring the minimum allocation size down to 12
bytes on 64-bit as well, because for the smallest bin, we can use 4-byte
forward/backward links within a heap, and keep the 8-byte pointers
separate for each heap.
None of this needs algorithm changes, but it's still difficult to
identify all the places which need changing, due to the code duplication
and some other issues with the code.
But it only helps with oddly-sized allocations (such as 12 bytes or 28
bytes), so it is unclear whether this work is worthwhile.
>   N * 1-byte allocations => N * (1-byte allocation
>                                  + 15-bytes alignment)
>                             + (N/8)-bytes in-use-bit + 16-bytes header
>
>   [96% waste for 1-byte]
>   [94% waste for 100*1-byte]
>   ... towards a 93.75% constant waste (limit of the alignment e.g. 15/16)
Another significant win is for allocation sizes such as 32, where we
currently need to allocate 48 bytes. In fact, powers-of-two are quite
bad for the current allocator.
However, it is difficult to combine this with the existing allocator
because you need to perform some table lookup during free to discover
whether the pointer has adjacent metadata or not, or waste some address
space and use bits inside the address for that.
If we want to bring the existing allocator further along, we should
perhaps try to port dlmalloc changes which get rid of the unsorted bins,
using balanced binary trees. I have a feeling that this would allow us
to consolidate far more often, and this should help us to avoid some of
the anomalies we have seen.