This is the mail archive of the mailing list for the glibc project.


Re: [PATCH v2] benchtests: Add malloc microbenchmark

On Mon, Jun 09, 2014 at 10:33:26PM +0200, Ondřej Bílka wrote:
> Problem is that this benchmark does not measure multithreaded
> performance well. Just spawning many threads does not say much; my guess
> is that locking will quickly cause convergence to a state where each
> core runs a thread with a separate arena.

How is that a bad thing?
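If one does want to exercise arena lock contention instead of letting threads settle onto separate arenas, one option (an assumption on my part, not something bench-malloc does) is to cap the arena count with glibc's mallopt(M_ARENA_MAX, ...) before spawning the threads. A minimal sketch; run_contended and churn are hypothetical names:

```c
#include <malloc.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 64

/* Each worker runs a tight malloc/free loop; returns non-NULL on failure. */
static void *churn(void *arg)
{
    size_t iters = *(const size_t *) arg;
    for (size_t i = 0; i < iters; i++) {
        char *p = malloc(64);
        if (p == NULL)
            return (void *) 1;
        *(volatile char *) p = 0;  /* keep the compiler from eliding the pair */
        free(p);
    }
    return NULL;
}

/* Hypothetical helper: if max_arenas > 0, cap glibc's arena count before
   starting the workers so they must share arenas (and their locks).
   Returns 0 on success, -1 on any failure. */
static int run_contended(int nthreads, size_t iters, int max_arenas)
{
    pthread_t tids[MAX_WORKERS];
    int started = 0, failed = 0;

    if (nthreads > MAX_WORKERS)
        return -1;
    /* mallopt returns 1 on success. */
    if (max_arenas > 0 && mallopt(M_ARENA_MAX, max_arenas) != 1)
        return -1;
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, churn, &iters) != 0) {
            failed = 1;
            break;
        }
    }
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        pthread_join(tids[i], &ret);
        if (ret != NULL)
            failed = 1;
    }
    return failed ? -1 : 0;
}
```

Timing run_contended(n, iters, 1) against run_contended(n, iters, 0) in separate processes would contrast the shared-arena and per-thread-arena cases.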

> Also it does not measure the hard case when you allocate memory in one
> thread.

It does that in bench-malloc.  Or maybe I don't understand what you mean.
> I looked at the multithreaded benchmark and it has additional flaws:
> Big variance: running time varies by around 10% across iterations,
> depending on how the kernel schedules these.

Kernel scheduling may not be the biggest source of variance.  The
major factor would more likely be the points at which an arena has to
be extended, and the performance of the syscalls that do it.
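Whatever the cause, the spread is easier to argue about if the benchmark reports it next to the mean. A minimal sketch, assuming per-iteration times have already been collected (e.g. with clock_gettime around each iteration); struct spread and summarise are hypothetical names, and rel_range = (max - min) / mean is just one simple measure of spread:

```c
#include <stddef.h>

struct spread {
    double mean, min, max;
    double rel_range;  /* (max - min) / mean */
};

/* Summarise per-iteration run times (any unit) so that run-to-run
   variance is reported alongside the mean. */
static struct spread summarise(const double *t, size_t n)
{
    struct spread s = {0};
    double sum = 0.0;

    if (n == 0)
        return s;
    s.min = s.max = t[0];
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        if (t[i] < s.min)
            s.min = t[i];
        if (t[i] > s.max)
            s.max = t[i];
    }
    s.mean = sum / (double) n;
    if (s.mean != 0.0)
        s.rel_range = (s.max - s.min) / s.mean;
    return s;
}
```

For iteration times {90, 100, 110}, this gives a mean of 100 and a rel_range of 0.2.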

> Running threads and measuring time after you join them measures the
> slowest thread, so at the end some cores are idle.

How does that matter?

> Bad units: when I run the benchmark with one thread, the mean is:
> "mean": 91.605,
> However, when we run 32 threads, it looks like malloc speeds up
> around three times:
>  "mean": 28.5883,

Why do you think the units are bad?  The mean time to allocate a
single block in a single thread being higher than with multiple
threads may have something to do with the difference in performance
between the main arena and the non-main arenas.  Possible factors
include the performance difference between mprotect and brk (or how
often each is called), the different logic used to extend heaps, and
finally the fallback to mmap for the main arena when extension fails.

That said, it may be useful to see how each thread performs
separately.  For all we know, the pattern of allocation may somehow be
favouring the multithreaded scenario.
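A minimal sketch of per-thread reporting, in which each worker times only its own loop so that a slow straggler cannot hide how the other threads did. This is illustrative, not bench-malloc-thread's code; struct worker, timed_worker and run_per_thread are hypothetical, and the size mix (16 to 1039 bytes) is an arbitrary assumption:

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>
#include <time.h>

struct worker {
    pthread_t tid;
    size_t iters;
    double ns_per_op;  /* filled in by the worker itself */
};

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double) ts.tv_sec * 1e9 + (double) ts.tv_nsec;
}

/* Times this thread's own malloc/free loop, independent of the others. */
static void *timed_worker(void *arg)
{
    struct worker *w = arg;
    double start = now_ns();
    for (size_t i = 0; i < w->iters; i++) {
        char *p = malloc(16 + (i % 1024));
        if (p != NULL)
            *(volatile char *) p = 0;  /* keep the pair from being elided */
        free(p);
    }
    w->ns_per_op = (now_ns() - start) / (double) w->iters;
    return NULL;
}

/* Runs n workers and leaves one ns/op figure per thread in w[].
   Returns 0 if every thread was started, -1 otherwise. */
static int run_per_thread(struct worker *w, int n, size_t iters)
{
    int started = 0;
    for (; started < n; started++) {
        w[started].iters = iters;
        w[started].ns_per_op = -1.0;
        if (pthread_create(&w[started].tid, NULL, timed_worker,
                           &w[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++)
        pthread_join(w[i].tid, NULL);
    return started == n ? 0 : -1;
}
```

Printing w[i].ns_per_op for every thread, rather than one figure taken after the joins, would show whether some threads are consistently faster than others.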

> No, that was a benchmark that I posted which measured exactly what
> happens at given sizes.

Post it again and we can discuss it?  IIRC it was similar to this
benchmark with random sizes, but maybe I misremember.

> > However if you do want to show resource usage, then address space
> > usage (VSZ) might show scary numbers due to the per-thread arenas, but
> > they would be much more representative.  Also, it might be useful to
> > see how address space usage scales with threads, especially for
> > 32-bit.
> >
> Still this would be worse than useless as it would vary wildly from real
> behaviour (for example, it is typical that allocations made in quick
> succession are also deallocated in quick succession), and that would
> cause us to implement something that actually increases memory usage.

It would be a concern if we were measuring memory usage over time.
Looking at just maximum usage does not have that problem.
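On Linux, the peak address space usage (the high-water mark of VSZ) is exposed as the VmPeak field of /proc/self/status, so a benchmark could read it once at exit. A minimal, Linux-only sketch; vm_peak_kb is a hypothetical name:

```c
#include <stdio.h>

/* Returns this process's peak virtual memory size (VmPeak) in kB,
   read from Linux's /proc/self/status, or -1 if it is unavailable. */
static long vm_peak_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    long kb = -1;

    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Matches lines like "VmPeak:\t  123456 kB". */
        if (sscanf(line, "VmPeak: %ld kB", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```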


