This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH v3] benchtests: Add malloc microbenchmark
- From: Ondřej Bílka <neleai at seznam dot cz>
- To: Siddhesh Poyarekar <siddhesh dot poyarekar at gmail dot com>
- Cc: Will Newton <will dot newton at linaro dot org>, libc-alpha <libc-alpha at sourceware dot org>
- Date: Wed, 25 Jun 2014 15:15:19 +0200
- Subject: Re: [PATCH v3] benchtests: Add malloc microbenchmark
- Authentication-results: sourceware.org; auth=none
- References: <1403196368-26785-1-git-send-email-will dot newton at linaro dot org> <20140625092926 dot GA28367 at domone dot podge> <CANu=DmgY1ZZODXSMhnM4ajNQzv3YJSOH_6EgCbcXtnoymPRt7g at mail dot gmail dot com> <CAAHN_R3=pxjo8Hx5YGN=QYAxF8p9KRrXUqUNWG7x1KVAuq6hKA at mail dot gmail dot com>
On Wed, Jun 25, 2014 at 03:21:51PM +0530, Siddhesh Poyarekar wrote:
> On 25 June 2014 15:09, Will Newton <firstname.lastname@example.org> wrote:
> > I expected to potentially see two inflection points in the curve. One
> > due to the single thread optimization in glibc that will make the
> > single threaded case disproportionally faster. I also expected to see
> > some kind of indication that I had run out of free CPU cores (and thus
> > context switch overhead increases). I ran the test on a 4 core i5
> > (hyper-threaded). I believe that's what is visible here:
> There should be a third inflection point for glibc malloc at 8 *
> number of cores, where it stops allocating arenas per thread and you
> have contention for locks in addition to contention for CPU. That's
> not visible in this graph because on a 4 core machine glibc malloc can
> go up to 32 threads without sharing arenas.
No, this is a simplistic benchmark; it does not measure thread contention,
as it does nothing that could trigger contention.
Because it uses long-running threads with static dependencies, any
conflicts quickly converge to a state where each core runs one thread
until its timeslice expires. vmstat shows around 2600 context switches
per second regardless of whether the benchmark is running or not.
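That observation is easy to reproduce; the sampling interval and count below are arbitrary, and the "cs" column of vmstat is the context-switch rate:

```shell
# Sample system statistics once per second, five times; compare the
# "cs" (context switches per second) column with the benchmark
# running and idle.
vmstat 1 5
```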
To measure multithreaded performance you would need to create a
multithreaded workload. For example, try spawning short-lived threads
that each do a hundred allocations and frees and then quit; that could
hit some problems. Then focus on multithreaded bottlenecks; one is
freeing memory on a different thread than the one that allocated it.