This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH] Simple malloc benchtest.
- From: Siddhesh Poyarekar <siddhesh at redhat dot com>
- To: OndÅej BÃlka <neleai at seznam dot cz>
- Cc: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>, GNU C Library <libc-alpha at sourceware dot org>
- Date: Tue, 31 Dec 2013 11:33:03 +0530
- Subject: Re: [PATCH] Simple malloc benchtest.
- Authentication-results: sourceware.org; auth=none
- References: <20131221153303 dot GA8420 at domone dot podge> <20131223090627 dot GF4979 at spoyarek dot pnq dot redhat dot com> <20131223095034 dot GA20816 at domone> <20131223110912 dot GG4979 at spoyarek dot pnq dot redhat dot com> <20131223133157 dot GB20816 at domone> <CAAHN_R2+N432brpObQ9RK4cJgjcqvpf+gtF7J4b1wwMShFQu7A at mail dot gmail dot com> <20131228011928 dot GB15270 at domone>
On Sat, Dec 28, 2013 at 02:19:28AM +0100, OndÅej BÃlka wrote:
> You are reinventing the wheel. What you described is measurement 1:
I didn't reinvent the wheel. I know I described measurement 1 - I
started my statement with 'IMO measurement 1, i.e. ...'.
> You need to know limits of measurements, and that they could fail badly
> in certain circumstances. In papers a 4. is common as authors know that
> other measurements are more misleading.
I did not say that measurement 4 is altogether wrong either. It may
have uses in other purposes, like usage as one metric when comparing
fragmentation of algorithms. It is however a horrible measurement
when your aim is to decide if fragmentation as a problem is relevant
at all since it will obviously give you the answer you're looking for.
> 1. Not really measuring fragmentation, when you have program that
> previously needed twice of live memory depending on allocator free
> memory could form either continuous chunk or be fragmented on many
> islands. With that amount of resources even bad allocator could likely
> satisfy all requests and when average consists mostly of these
> contributions then results will turn inconclusive.
> 2. A number you get could be chaotic depending on tiny detail. Imagine a
> program that in start does lot of small allocations, then frees everything
> but last allocation. Then it does long computation that is not memory intensive.
> In that program amount of memory used is depends only on address of last
> allocating which will vary wildly even for most innocent change of how
> chunks are ordered.
> A variant of this problem is that you have program that is mostly stable
> except of periodic peaks.
> A good allocator could fit 80% of peak usage and need to place only 20%
> into new area. A bad allocator would place 80% of peak data into new
> Bad allocator would come as winner as when freeing peak data it crosses
> a trim treshold but good allocator does not.
For both these points you talk about a 'bad allocator' without
actually defining what it means. Does it mean a slow allocator or
does it mean an allocator that is good at freeing resources in a
timely manner? It doesn't seem to be the latter if it can be tweaked
into managing resources better, so I'm assuming you mean a slow
allocator? Do you still not see why simply allocation speed is not a
good enough measurement?
> 3. Problems with stability. Unless you are measuring a long running process
> with mostly flat profile then numbers you get could vary wildly
> depending on input. Allocation pattern could stay same but if majority
> of time is spend in routine that does not allocate memory but time
> varies by input you get different percentages.
> 4. That by averaging a live memory versus used memory does not measure
> fragmentation but how effectively we return memory to system.
> You could get better numbers by simply choosing a smaller trim treshold
> and triggering trim more often.
Yes, at the cost of performance. You make an interesting point by
trying to differentiate between the act of just blocking memory and
not using it and blocking memory, not using it and asking for
additional memory from the system, the latter behaviour being the
textbook definition of fragmentation. I believe this ties in with
your point 3 as well, since a program with a flat profile where it has
blocked a big chunk of memory because of an allocated chunk on top
will show up as performing badly, when it shouldn't.
One way to counter this could be to somehow factor in events like
growth of memory in the presence of holes in memory, so what you'll be
looking at is the slope of memory usage, where a flatter profile is
better. But that's just off the top of my head, so I'm sure there
would be more factors in play.
> We currently do not return memory to system well, see bug 11261 and it
> is common that users like this one complain that we use 50% more memory
> than alternatives.
I wonder how relevant the comparison in memory usage is in that blog
post, given that the bloated 'memory usage' may actually just be
address space usage due to the per-thread arenas, where most of each
arena is unused and doesn't even have any pages backing the address
space. In other words, I believe more information is needed from that
That said, I don't disagree that we need to look at and monitor how
good or bad our fragmentation is. In fact, that is why I mentioned
the need for any malloc benchmark to keep a tab on fragmentation as
> You would get more realistic estimate by actually unmapping unused pages
> and dealing with holes, but that would change how long program spends on
> various parts which would change average.
I'm not sure I understand. I assume you're suggesting that we unmap
or mprotect(PROT_NONE) (or use some other mechanism to give back the
address space) for chunks that are larger than a threshold. That's
not a bad idea and it will reduce memory usage, but the cost would be
additional syscalls. More work needs to be done to show that the net
result is beneficial and you'll need a benchmark that does both,
measure fragmentation and speed, to prove that :)