This is the mail archive of the mailing list for the glibc project.

Re: [PATCH v3] benchtests: Add malloc microbenchmark

On 25 June 2014 10:29, Ondřej Bílka <> wrote:
> On Thu, Jun 19, 2014 at 05:46:08PM +0100, Will Newton wrote:
>> Add a microbenchmark for measuring malloc and free performance with
>> varying numbers of threads. The benchmark allocates and frees buffers
>> of random sizes in a random order and measures the overall execution
>> time and RSS. Variants of the benchmark are run with 1, 4, 8 and
>> 16 threads.
>> The random block sizes used follow an inverse square distribution
>> which is intended to mimic the behaviour of real applications which
>> tend to allocate many more small blocks than large ones.
>> ChangeLog:
>> 2014-06-19  Will Newton  <>
>>       * benchtests/Makefile (bench-malloc): Add malloc thread
>>       scalability benchmark.
>>       * benchtests/bench-malloc-thread.c: New file.
>> ---
>>  benchtests/Makefile              |  20 ++-
>>  benchtests/bench-malloc-thread.c | 299 +++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 316 insertions(+), 3 deletions(-)
>>  create mode 100644 benchtests/bench-malloc-thread.c
>> Changes in v3:
>>  - Single executable that takes a parameter for thread count
>>  - Run for a fixed duration rather than a fixed number of loops
>>  - Other fixes in response to review suggestions
>> Example of a plot of the results versus tcmalloc and jemalloc on
>> a 4 core i5:
> That graph looks interesting. It is a little weird that in libc the two-
> and three-thread runs take nearly the same time, but the four-thread one
> does not. For the other allocators the dependency is linear. How would
> you explain that?

I expected to see up to two inflection points in the curve. The first
is due to the single-thread optimization in glibc, which makes the
single-threaded case disproportionately faster. The second I expected
to appear once I had run out of free CPU cores, at which point context
switch overhead increases. I ran the test on a 4 core i5
(hyper-threaded). I believe that's what is visible here:

1. The single-threaded case is disproportionately faster
2. The curve gradient is lower from 1 thread up to the number of cores
(and this seems to be visible in at least tcmalloc as well)
3. The curve gradient increases and remains roughly constant above the
number of cores
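
For anyone wanting to reproduce the workload shape outside the
benchmark, here is a minimal sketch of how block sizes following an
inverse square distribution could be drawn, as mentioned in the patch
description. It is not the code from the patch: the function names and
the 4 and 1024 byte bounds are assumptions picked for illustration, and
it uses plain inverse transform sampling on a density proportional to
1/x^2.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical bounds, chosen for illustration only.  */
#define SKETCH_MIN_SIZE 4.0
#define SKETCH_MAX_SIZE 1024.0

/* Map a uniform value U in [0, 1] to a block size whose density is
   proportional to 1/x^2 on [SKETCH_MIN_SIZE, SKETCH_MAX_SIZE].
   The CDF is F(x) = (1/min - 1/x) / (1/min - 1/max); solving
   F(x) = U for x gives the expression below.  */
unsigned int
sketch_block_size (double u)
{
  assert (u >= 0.0 && u <= 1.0);

  double inv_min = 1.0 / SKETCH_MIN_SIZE;
  double inv_max = 1.0 / SKETCH_MAX_SIZE;

  /* Truncate towards zero to get a whole number of bytes.  */
  return (unsigned int) (1.0 / (inv_min - u * (inv_min - inv_max)));
}

/* Draw one size, using rand () as the uniform source.  */
unsigned int
sketch_random_block_size (void)
{
  return sketch_block_size ((double) rand () / RAND_MAX);
}
```

With these assumed bounds the median of the distribution sits just
under 8 bytes, so roughly half of the draws are very small blocks and
large blocks are rare, which is the bias towards small allocations the
patch description talks about.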

Will Newton
Toolchain Working Group, Linaro
