This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
On Mon, Jun 09, 2014 at 10:33:26PM +0200, Ondřej Bílka wrote:
> Problem is that this benchmark does not measure a multithread
> performance well. Just spawning many threads does not say much, my guess
> is that locking will quickly cause convergence to state where at each
> core a thread with separate arena is running.

How is that a bad thing?

> Also it does not measure hard case when you allocate memory in one
> thread.

It does that in bench-malloc.  Or maybe I don't understand what you
mean.

> I looked on multithread benchmark and it has additional flaws:
>
> Big variance, running time varies around by 10% across iterations,
> depending on how kernel schedules these.

Kernel scheduling may not be the most important factor in the variance.
The major factor would be the points at which the arena has to be
extended, and then the performance of those syscalls.

> Running threads and measuring time after you join them measures a
> slowest thread so at end some cores are idle.

How does that matter?

> Bad units, when I run a benchmark then with one benchmark a mean is:
> "mean": 91.605,
> However when we run 32 threads then it looks that it speeds malloc
> around three times:
> "mean": 28.5883,

Why do you think the units are bad?  Mean time for allocation of a
single block in a single thread being slower than that of multiple
threads may have something to do with the difference between
performance on the main arena vs non-main arenas.  Some possible
factors are the performance difference between mprotect and brk, how
often each is called, the difference in logic to extend heaps, or the
main arena defaulting to mmap when extension fails.

That said, it may be useful to see how each thread performs separately.
For all we know, the pattern of allocation may somehow be favouring the
multithreaded scenario.

> No, that was a benchmark that I posted which measured exactly what
> happens at given sizes.

Post it again and we can discuss it?  IIRC it was similar to this
benchmark with random sizes, but maybe I misremember.

> > However if you do want to show resource usage, then address space
> > usage (VSZ) might show scary numbers due to the per-thread arenas, but
> > they would be much more representative. Also, it might be useful to
> > see how address space usage scales with threads, especially for
> > 32-bit.
> >
> Still this would be worse than useless as it would vary wildly from real
> behaviour (for example it is typical that when there are allocations in
> quick succession then they will likely also deallocated in quick
> succession.) and that would cause us implement something that actually
> increases memory usage.

It would be a concern if we were measuring memory usage over time.
Looking at just maximum usage does not have that problem.

Siddhesh
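
To make the "maximum usage" point concrete, here is a minimal sketch of how peak address space usage could be sampled on Linux. It is not part of the glibc benchmark; the function names peak_vsz_kb and peak_growth_after_burst are hypothetical, and it assumes the VmPeak field of /proc/self/status is available. Because VmPeak is a high-water mark, freeing the blocks in quick succession does not change the reported figure.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return the peak virtual address space size of this process in kB,
   read from the VmPeak field of /proc/self/status (Linux-specific).
   Returns -1 if the field cannot be read.  */
long
peak_vsz_kb (void)
{
  FILE *f = fopen ("/proc/self/status", "r");
  char line[256];
  long kb = -1;

  if (f == NULL)
    return -1;

  while (fgets (line, sizeof line, f) != NULL)
    if (strncmp (line, "VmPeak:", 7) == 0)
      {
        if (sscanf (line + 7, "%ld", &kb) != 1)
          kb = -1;
        break;
      }

  fclose (f);
  return kb;
}

/* Hypothetical workload: allocate NBLOCKS blocks of SIZE bytes, touch
   each one, then free them all.  Returns how much the peak grew in kB,
   or -1 on failure.  */
long
peak_growth_after_burst (size_t nblocks, size_t size)
{
  assert (nblocks > 0 && size > 0);

  long before = peak_vsz_kb ();
  char **blocks = malloc (nblocks * sizeof *blocks);

  if (before < 0 || blocks == NULL)
    {
      free (blocks);
      return -1;
    }

  for (size_t i = 0; i < nblocks; i++)
    {
      blocks[i] = malloc (size);
      if (blocks[i] != NULL)
        blocks[i][0] = 1;	/* Touch the block.  */
    }

  for (size_t i = 0; i < nblocks; i++)
    free (blocks[i]);
  free (blocks);

  long after = peak_vsz_kb ();
  return after < 0 ? -1 : after - before;
}
```

Calling peak_growth_after_burst (64, 1 << 20) from each of N threads and reading peak_vsz_kb () after joining them would give one data point for how address space usage scales with thread count.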