On Wed, Feb 11, 2015 at 12:01:08PM +0100, Torvald Riegel wrote:
> If your machine has just two cores, then at the very least you should
> measure for just two threads too; a bigger number of threads is not
> putting more contention on any of the synchronization bits, there's just
> a higher likelihood of having to wait for a thread that isn't running.
> Also, to really assess performance, this has to be benchmarked on a
> machine with more cores. Additionally, you could argue why it should
> not make a difference, and if that's a compelling argument, we could
> follow it instead of the benchmark (which, as Will mentions, is hard to
> make representative of real-world workloads).

The default malloc implementation creates up to 8 * n arenas on a
64-bit system with n cores, so for anything up to 8 * n threads, each
thread works on its own arena and you're just measuring contention
between threads for the CPU, not for any malloc lock.
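
To put a number on that threshold, here is a rough sketch of the
arithmetic. The function names are made up for illustration, and the
multiplier is my assumption about the default cap (8 per core where long
is 8 bytes, 2 per core where it is 4). Arenas are created on demand, and
the cap can be overridden (for example with MALLOC_ARENA_MAX), so treat
the result as an upper bound rather than what a given run actually uses.

```c
#include <stddef.h>

/* Assumed default arena cap: 8 arenas per core when long is 8 bytes,
   2 per core when long is 4 bytes.  Hypothetical helper, not a glibc API. */
size_t arena_limit(size_t ncores)
{
    return ncores * (sizeof(long) == 4 ? 2 : 8);
}

/* Smallest thread count at which at least two threads must share an
   arena, assuming every thread below the cap gets an arena of its own. */
size_t min_threads_for_sharing(size_t ncores)
{
    return arena_limit(ncores) + 1;
}
```

On a 64-bit machine with two cores this gives a cap of 16 arenas, so a
benchmark would need at least 17 threads before two of them are forced
onto the same arena.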
Maybe one way to guarantee such contention is a test with one thread
that allocates on an arena and another thread that frees from the same
arena. I don't think the current benchmark does that.
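
Something along these lines is what I have in mind. It is only a sketch
under a few assumptions, not a patch for the benchmark: all names
(run_cross_thread_free, BATCH, BLOCK_SIZE) are made up, the block size
is picked on the assumption that it is too large for any per-thread
cache or fastbin and well below the mmap threshold, and batches are
handed over with a barrier so that the hand-off itself does not add a
lock per block. One thread allocates batch r+1 while the other frees
batch r, so both should be operating on the allocating thread's arena
at the same time.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

enum { BATCH = 256, BLOCK_SIZE = 2048 };  /* illustrative values only */

struct shared {
    void *slots[2][BATCH];      /* double buffer: one being filled, one drained */
    size_t rounds;
    size_t freed;               /* written only by the freeing thread */
    pthread_barrier_t barrier;
};

/* Allocates one batch per round, then hands it over at the barrier. */
static void *allocator_thread(void *arg)
{
    struct shared *s = arg;
    for (size_t r = 0; r < s->rounds; r++) {
        void **batch = s->slots[r % 2];
        for (size_t i = 0; i < BATCH; i++)
            batch[i] = malloc(BLOCK_SIZE);
        pthread_barrier_wait(&s->barrier);
    }
    return NULL;
}

/* Waits for a batch, then frees it while the allocator fills the other one. */
static void *freer_thread(void *arg)
{
    struct shared *s = arg;
    for (size_t r = 0; r < s->rounds; r++) {
        pthread_barrier_wait(&s->barrier);
        void **batch = s->slots[r % 2];
        for (size_t i = 0; i < BATCH; i++) {
            if (batch[i] != NULL)
                s->freed++;
            free(batch[i]);
        }
    }
    return NULL;
}

/* Runs the two threads for the given number of rounds and returns the
   number of blocks that were freed by the non-allocating thread. */
size_t run_cross_thread_free(size_t rounds)
{
    struct shared s = { .rounds = rounds };
    pthread_t alloc_tid, free_tid;

    if (pthread_barrier_init(&s.barrier, NULL, 2) != 0)
        abort();
    if (pthread_create(&alloc_tid, NULL, allocator_thread, &s) != 0)
        abort();
    if (pthread_create(&free_tid, NULL, freer_thread, &s) != 0)
        abort();
    pthread_join(alloc_tid, NULL);
    pthread_join(free_tid, NULL);
    pthread_barrier_destroy(&s.barrier);
    return s.freed;
}
```

Timing run_cross_thread_free(rounds) against a variant where a single
thread does both the malloc and the free would give a first idea of
what the cross-thread free costs. The barrier still serializes the two
threads once per round, so a larger BATCH makes that overhead matter
less.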