This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH v3] benchtests: Add malloc microbenchmark
- From: Will Newton <will dot newton at linaro dot org>
- To: Ondřej Bílka <neleai at seznam dot cz>
- Cc: libc-alpha <libc-alpha at sourceware dot org>
- Date: Wed, 25 Jun 2014 10:39:24 +0100
- Subject: Re: [PATCH v3] benchtests: Add malloc microbenchmark
- Authentication-results: sourceware.org; auth=none
- References: <1403196368-26785-1-git-send-email-will dot newton at linaro dot org> <20140625092926 dot GA28367 at domone dot podge>
On 25 June 2014 10:29, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Thu, Jun 19, 2014 at 05:46:08PM +0100, Will Newton wrote:
>> Add a microbenchmark for measuring malloc and free performance with
>> varying numbers of threads. The benchmark allocates and frees buffers
>> of random sizes in a random order and measures the overall execution
>> time and RSS. Variants of the benchmark are run with 1, 4, 8 and
>> 16 threads.
>>
>> The random block sizes used follow an inverse square distribution
>> which is intended to mimic the behaviour of real applications which
>> tend to allocate many more small blocks than large ones.
>>
>> ChangeLog:
>>
>> 2014-06-19 Will Newton <will.newton@linaro.org>
>>
>> * benchtests/Makefile (bench-malloc): Add malloc thread
>> scalability benchmark.
>> * benchtests/bench-malloc-thread.c: New file.
>> ---
>> benchtests/Makefile | 20 ++-
>> benchtests/bench-malloc-thread.c | 299 +++++++++++++++++++++++++++++++++++++++
>> 2 files changed, 316 insertions(+), 3 deletions(-)
>> create mode 100644 benchtests/bench-malloc-thread.c
>>
>> Changes in v3:
>> - Single executable that takes a parameter for thread count
>> - Run for a fixed duration rather than a fixed number of loops
>> - Other fixes in response to review suggestions
>>
>> Example of a plot of the results versus tcmalloc and jemalloc on
>> a 4 core i5:
>>
>> http://people.linaro.org/~will.newton/bench-malloc-threads.png
>>
> That graph looks interesting. It is a little weird that with glibc the
> two- and three-thread cases take nearly the same time, but the
> four-thread case does not.
>
> For the other allocators the dependency is linear. How would you explain that?
I expected to see up to two inflection points in the curve. One is due
to the single-thread optimization in glibc, which makes the
single-threaded case disproportionately faster. I also expected to see
some indication of the point where I ran out of free CPU cores (and
thus context-switch overhead increases). I ran the test on a 4 core i5
(hyper-threaded), and I believe that's what is visible here:
1. Single-threaded case disproportionately faster
2. Curve gradient is lower from 1 thread up to the number of cores (and
this seems to be visible in at least tcmalloc as well)
3. Curve gradient increases and remains roughly constant above the number of cores
--
Will Newton
Toolchain Working Group, Linaro