This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Simple malloc benchtest.
- From: Will Newton <will dot newton at linaro dot org>
- To: "Frank Ch. Eigler" <fche at redhat dot com>
- Cc: Ondřej Bílka <neleai at seznam dot cz>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Sun, 22 Dec 2013 14:06:57 +0000
- Subject: Re: [PATCH] Simple malloc benchtest.
- References: <20131221153303 dot GA8420 at domone dot podge> <CANu=DmiknJbnr0donGEEyG__o1Unpd7iEjeQ1LSEpv_vJNO1TA at mail dot gmail dot com> <y0mvbyhdgg7 dot fsf at fche dot csb>
On 21 December 2013 21:47, Frank Ch. Eigler <email@example.com> wrote:
> Will Newton <firstname.lastname@example.org> writes:
>> [...] It looks like you are using uniformly distributed random
>> numbers for allocation sizes. This doesn't necessarily bear any
>> relation to what actual allocation sizes are used in a real
>> application [...] This means we run through doing a long stream of
>> malloc and then a long stream of free. That is again not close to
>> application behaviour so I would recommend we interleave malloc and
>> free calls in order to introduce some stress on the allocator.
> Instead of making up ad-hoc microbenchmarks, how about tracing the
> malloc/free traffic of a real application or a dozen, and using an
> amalgam of such large traces to drive the measurements? (Insert
> cache-invalidation between operations as indicated by e.g. cachegrind
> or timestamps.)
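(To make the suggestion concrete, a trace-replay harness along those lines might look something like the following hypothetical sketch - the record format is made up purely for illustration and is not anything that exists in glibc today.)

```c
#include <stdlib.h>

/* One record of a captured allocation trace (hypothetical format):
   size > 0 means "malloc(size) into this slot";
   size == 0 means "free whatever is in this slot". */
struct trace_op {
    int slot;
    size_t size;
};

/* Replay a trace captured from a real application against the
   allocator under test; returns the number of operations replayed. */
static size_t replay(const struct trace_op *ops, size_t n, void **slots)
{
    size_t done = 0;
    for (size_t i = 0; i < n; i++) {
        if (ops[i].size > 0) {
            slots[ops[i].slot] = malloc(ops[i].size);
            if (slots[ops[i].slot] == NULL)
                break;
        } else {
            free(slots[ops[i].slot]);
            slots[ops[i].slot] = NULL;
        }
        done++;
    }
    return done;
}
```

A real harness would also need the tracing side (an LD_PRELOAD interposer or similar) plus timing around the loop.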
That is certainly something we could do. However, the problem with that
approach is that you need to ensure you have up-to-date traces for
modern applications (e.g. many benchmarks used in papers even today
use quite obsolete software), and that the traces are sufficiently
diverse (i.e. they do not show bias towards particular types of
applications or specific applications - "you benchmark against MySQL
but not Postgres, no fair"). Even then you are discounting locality
and some cache effects, because you run only the allocator and not the
application, so it's not actually a complete panacea (you may have
made the allocations run fast but actually slowed down the
application).

On top of that, some complex benchmarks make it difficult to see the
effect of a change: e.g. if you accidentally slow down large
allocations, the overall benchmark will get slower but it will not be
obvious why.
So in summary, there is a lot of work and bikeshedding still to be had
in that discussion, and we badly need some kind of reasonably simple
benchmark today that people can use.
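(For the record, the kind of simple interleaved benchmark I am arguing for could be as small as the sketch below - the size distribution and working-set size are made-up illustrative values, not a proposal. It keeps a fixed set of live blocks and replaces a pseudo-randomly chosen one on each iteration, so malloc and free calls are mixed rather than batched, and sizes are skewed towards small blocks rather than drawn uniformly.)

```c
#include <stdlib.h>

#define WORKING_SET 256  /* number of live blocks kept at any time */

/* Simple deterministic LCG so runs are repeatable. */
static unsigned int state = 12345;
static unsigned int next_rand(void)
{
    state = state * 1103515245u + 12345u;
    return state >> 16;
}

/* Skew sizes towards small blocks: mostly under 64 bytes,
   occasionally up to ~4 KiB (illustrative distribution only). */
static size_t next_size(void)
{
    unsigned int r = next_rand();
    if (r % 16 != 0)
        return 8 + r % 56;    /* common case: small block */
    return 64 + r % 4032;     /* rare case: larger block */
}

/* Run `iters` interleaved free-then-malloc operations;
   returns 0 on success, 1 on allocation failure. */
int bench_interleaved(size_t iters)
{
    void *slots[WORKING_SET] = { 0 };
    for (size_t i = 0; i < iters; i++) {
        size_t slot = next_rand() % WORKING_SET;
        free(slots[slot]);            /* free(NULL) is a no-op */
        slots[slot] = malloc(next_size());
        if (slots[slot] == NULL)
            return 1;
    }
    for (size_t i = 0; i < WORKING_SET; i++)
        free(slots[i]);
    return 0;
}
```

Timing that loop (and nothing else) gives a number people can compare across allocator changes without needing a trace corpus.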
Toolchain Working Group, Linaro