This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]
Other format:	[Raw text]

Re: [PATCH v2] add concurrent malloc benchmark

From: Leonhard Holz <leonhard dot holz at web dot de>
To: Siddhesh Poyarekar <siddhesh at redhat dot com>
Cc: libc-alpha at sourceware dot org
Date: Thu, 26 Mar 2015 21:17:19 +0100
Subject: Re: [PATCH v2] add concurrent malloc benchmark
Authentication-results: sourceware.org; auth=none
References: <54FD6FAE dot 8050204 at web dot de> <20150325070034 dot GD5023 at spoyarek dot pnq dot redhat dot com> <55140CD5 dot 2090606 at web dot de> <20150326161815 dot GB1736 at spoyarek dot pnq dot redhat dot com>

Am 26.03.2015 um 17:18 schrieb Siddhesh Poyarekar:

On Thu, Mar 26, 2015 at 02:42:45PM +0100, Leonhard Holz wrote:

Hmm.. the malloc/free code has different pathes depending on requested block
size including different locking procedures, so I would like to keep some
kind of size variance.


Yes, the primary different paths are:

1. fastbin allocation
2. regular allocation on the heap
3. mmap

We're only interested in the first two since mmap doesn't need any
locks.  In fact, the allocation sizes chosen will never even result in
an mmap, so we don't even have to think about it.

Also the current implementation is not that different
to your proposed schema, actually all threads act as producers and consumers
that are coupled via the round-robin block array which acts as some sort of
queue. And a queue to connect the threads is needed anyway (?), either with
extra data structures (which implies malloc which implies locking) or
explicit locking as implemented.


It is similar, but the crucial difference is that you don't know for
sure how much the threads contend for arenas.  You partition the
working set to give each thread a separate set of indices and sizes.
The indices may overlap (and hence cause the contention that we want
to see) but how much they overlap is not clear from just reading the test.

So I would'nt say that the proposed version is not improvable but please be
a bit more specific about what should be achieved and how the proposed way
does this.


My suggestion was to make threads either producers or consumers, not
both.  It was important for the earlier benchmark to mix up allocation
sizes as well as locations because it tested performance of the
algorithm in general and not lock contention.  In this case, your
primary goal would be to measure the performance impact of contention
for the arena lock, so mixing up allocations and sizes don't add a lot
value, it only complicates the test.

An easier way would be to spawn a producer consumer pair where one
allocates and the other frees.  The next case would be one producer
and multiple consumers and the final case would be multiple producers
and one consumer.  Repeat for larger sets of such threads, i.e. 4
producers and consumers, 2 sets of 1 producer and 4 consumers, and so
on.  In all cases, take timing values only around a single malloc or
free and nothing else.

Siddhesh


Thank you.

References:
- [PATCH v2] add concurrent malloc benchmark
  - From: Leonhard Holz
- Re: [PATCH v2] add concurrent malloc benchmark
  - From: Siddhesh Poyarekar
- Re: [PATCH v2] add concurrent malloc benchmark
  - From: Leonhard Holz
- Re: [PATCH v2] add concurrent malloc benchmark
  - From: Siddhesh Poyarekar

Index Nav:	[Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav:	[Date Prev] [Date Next]	[Thread Prev] [Thread Next]