This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] benchtests: Add malloc microbenchmark


On 15 April 2014 17:27, Rich Felker <dalias@aerifal.cx> wrote:
> On Tue, Apr 15, 2014 at 04:42:25PM +0100, Will Newton wrote:
>> On 15 April 2014 16:36, Steven Munroe <munroesj@linux.vnet.ibm.com> wrote:
>> > On Tue, 2014-04-15 at 14:35 +0100, Will Newton wrote:
>> >> Add a microbenchmark for measuring malloc and free performance. The
>> >> benchmark allocates and frees buffers of random sizes in a random
>> >> order and measures the overall execution time and RSS. Variants of the
>> >> benchmark are run with 8, 32 and 64 threads to measure the effect of
>> >> concurrency on allocator performance.
>> >>
>> >> The random block sizes used follow an inverse square distribution
>> >> which is intended to mimic the behaviour of real applications which
>> >> tend to allocate many more small blocks than large ones.
>> >>
>> >
>> > This test is more likely to measure the locking overhead of random then
>> > it is to measure malloc performance.
>>
>> It uses rand_r so I don't think this is the case.
>
> If you're using rand_r, you need to be careful how you use the output,
> as glibc's rand_r implementation has very poor statistical properties.
> See:
>
> http://sourceware.org/bugzilla/show_bug.cgi?id=15615

Thanks for the pointer, I have switched to using rand(3), although I
suspect the quality of the random numbers is probably not a very big
worry in this case.

>> > Any attempt at defining a new (another) micro-benchmark should profile
>> > to verify that overhead of setup and measurement is small (< 1%)
>> > compared to what you are trying measure.
>>
>> Well there are currently no microbenchmarks for malloc in glibc and
>> very few in the wild, even fewer with sane licenses.
>
> BTW have you looked at the one from locklessinc.com? It makes glibc
> look really bad and their allocator look very good, but I'm not
> convinced that they didn't artifically tweak it to get these results.
> If nothing else, however, it's revealing a serious bottleneck in how
> glibc does growth of non-main arenas using mprotect.

I'll have a look at that, thanks for the link.

>> The benchmark code spends roughly 80% of its time within malloc/free
>> and friends, which is good, but does leave some room for improvement.
>> Around 10% of the time is spent in dealing with random number
>> generation so maybe a simple inline random number generator would
>> improve things.
>
> What about just pregenerating a large array of random numbers and
> accessing sequentual slots of the array? This potentially has cache
> issues but it might be possible to simply use a small array and wrap
> back to the beginning, perhaps performing a trivial operation like
> adding the last output of the previous run onto the value in the
> array.

I'll use an approach like that in my next version. I now get between
90-95% of time spent within the allocator and the rest of the time in
the benchmark loop seemingly due to cache misses accessing the random
numbers and pointer array.

-- 
Will Newton
Toolchain Working Group, Linaro


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]