This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |
Other format: | [Raw text] |
On Thu, Mar 26, 2015 at 02:42:45PM +0100, Leonhard Holz wrote: > Hmm.. the malloc/free code has different pathes depending on requested block > size including different locking procedures, so I would like to keep some > kind of size variance. Yes, the primary different paths are: 1. fastbin allocation 2. regular allocation on the heap 3. mmap We're only interested in the first two since mmap doesn't need any locks. In fact, the allocation sizes chosen will never even result in an mmap, so we don't even have to think about it. > Also the current implementation is not that different > to your proposed schema, actually all threads act as producers and consumers > that are coupled via the round-robin block array which acts as some sort of > queue. And a queue to connect the threads is needed anyway (?), either with > extra data structures (which implies malloc which implies locking) or > explicit locking as implemented. It is similar, but the crucial difference is that you don't know for sure how much the threads contend for arenas. You partition the working set to give each thread a separate set of indices and sizes. The indices may overlap (and hence cause the contention that we want to see) but how much they overlap is not clear from just reading the test. > So I would'nt say that the proposed version is not improvable but please be > a bit more specific about what should be achieved and how the proposed way > does this. My suggestion was to make threads either producers or consumers, not both. It was important for the earlier benchmark to mix up allocation sizes as well as locations because it tested performance of the algorithm in general and not lock contention. In this case, your primary goal would be to measure the performance impact of contention for the arena lock, so mixing up allocations and sizes don't add a lot value, it only complicates the test. An easier way would be to spawn a producer consumer pair where one allocates and the other frees. The next case would be one producer and multiple consumers and the final case would be multiple producers and one consumer. Repeat for larger sets of such threads, i.e. 4 producers and consumers, 2 sets of 1 producer and 4 consumers, and so on. In all cases, take timing values only around a single malloc or free and nothing else. Siddhesh
Attachment:
pgprQVPJty77t.pgp
Description: PGP signature
Index Nav: | [Date Index] [Subject Index] [Author Index] [Thread Index] | |
---|---|---|
Message Nav: | [Date Prev] [Date Next] | [Thread Prev] [Thread Next] |