This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Simple malloc benchtest.


On 21 December 2013 21:45, Ondřej Bílka <neleai@seznam.cz> wrote:
> On Sat, Dec 21, 2013 at 03:57:41PM +0000, Will Newton wrote:
>> On 21 December 2013 15:33, Ondřej Bílka <neleai@seznam.cz> wrote:
>> > Hi,
>> >
>> > Here is an expansion of a test that I wrote earlier for measuring
>> > malloc running time.
>> >
>> > It should be used mostly for micro-optimizations that do not change
>> > how allocations are made; this rules out effects from other factors.
>> >
>> >
>> >         * benchtests/bench-malloc-small.c: New file.
>> >         * benchtests/Makefile: Add malloc-small
>> >
>> > ---
>> >  benchtests/Makefile             |    6 +++-
>> >  benchtests/bench-malloc-small.c |   69 +++++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 74 insertions(+), 1 deletion(-)
>> >  create mode 100644 benchtests/bench-malloc-small.c
>> >
>> > diff --git a/benchtests/Makefile b/benchtests/Makefile
>> > index 117228b..430a708 100644
>> > --- a/benchtests/Makefile
>> > +++ b/benchtests/Makefile
>> > @@ -31,9 +31,13 @@ string-bench := bcopy bzero memccpy memchr memcmp memcpy memmem memmove \
>> >                 strspn strstr strcpy_chk stpcpy_chk memrchr strsep strtok
>> >  string-bench-all := $(string-bench)
>> >
>> > +malloc-bench := malloc-small
>> > +
>> >  stdlib-bench := strtod
>> >
>> > -benchset := $(string-bench-all) $(stdlib-bench)
>> > +
>> > +
>> > +benchset := $(malloc-bench) $(string-bench-all) $(stdlib-bench)
>> >
>> >  LDLIBS-bench-acos = -lm
>> >  LDLIBS-bench-acosh = -lm
>> > diff --git a/benchtests/bench-malloc-small.c b/benchtests/bench-malloc-small.c
>> > new file mode 100644
>> > index 0000000..a62a9f6
>> > --- /dev/null
>> > +++ b/benchtests/bench-malloc-small.c
>> > @@ -0,0 +1,69 @@
>> > +/* Measure malloc and free running time.
>> > +   Copyright (C) 2013 Free Software Foundation, Inc.
>> > +   This file is part of the GNU C Library.
>> > +
>> > +   The GNU C Library is free software; you can redistribute it and/or
>> > +   modify it under the terms of the GNU Lesser General Public
>> > +   License as published by the Free Software Foundation; either
>> > +   version 2.1 of the License, or (at your option) any later version.
>> > +
>> > +   The GNU C Library is distributed in the hope that it will be useful,
>> > +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> > +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
>> > +   Lesser General Public License for more details.
>> > +
>> > +   You should have received a copy of the GNU Lesser General Public
>> > +   License along with the GNU C Library; if not, see
>> > +   <http://www.gnu.org/licenses/>.  */
>> > +
>> > +#define TEST_MAIN
>> > +#define TEST_NAME "malloc"
>>
>> Should be "malloc-small"?
>>
>> > +#include "bench-string.h"
>> > +
>> > +void
>> > +do_test (size_t size)
>>
>> Would this be better called min_size or similar?
>>
>> > +{
>> > +  timing_t start, stop, cur;
>> > +  const int iters = 1<<18;
>> > +
>> > +  unsigned int r_seed = 42;
>>
>> Constants might be usefully lifted out to the top level and made static const.
>>
>> > +  void *ary[iters];
>> > +  size_t sizes[iters];
>> > +  size_t idx[iters];
>> > +
>> > +  for (int i = 0; i < iters; i++)
>> > +     {
>> > +       ary[i] = NULL;
>> > +       sizes[i] = size + rand_r (&r_seed) % (7 * size);
>> > +       idx[i] = rand_r (&r_seed) % (iters / 64);
>>
>> Magic numbers like these could be helpfully documented.
>>
>> It looks like you are using uniformly distributed random numbers for
>> allocation sizes. This doesn't necessarily bear any relation to what
>> actual allocation sizes are used in a real application (in fact it has
>> been shown that uniformly and normally distributed random numbers are a
>> bad fit). I understand this is a simple test but I think another
>> distribution would be more suitable.
>>
> As I wrote above, this should be used only for microbenchmarks where you
> optimize a code path but do not change the algorithm. Here I am more
> interested in performance in given code paths and how they are affected.

I think there is value in at least two benchmarks:

1. Random benchmark that vaguely represents application behaviour. The
purpose of this test is to quickly give you a basic level of
confidence that you have not regressed allocator performance when
making any change to the allocator.
2. Simple benchmark that tests fast path performance. The purpose of
this test is to show how the allocator performs across a range of
allocation sizes (e.g. "ah, I can see that the allocator slows down at
2 kB allocations, I wonder why?"). This lets developers understand the
implications of changes (e.g. "I sped up application A and B by
improving small allocation fast path performance at the expense of
large allocation performance and here are the numbers").

In my opinion this benchmark falls somewhere between the two: it contains
randomness, which makes it more difficult to understand, but it is not
sufficiently application-like.

> When you change the algorithm, then using a generated trace is not
> adequate, as you do not capture features of real usage patterns, and
> even then you need to measure external factors like the effect of
> cache locality.
>
>> Also I would recommend testing some larger allocations as some
>> allocators are very fast for small allocations but perform very poorly
>> for larger allocations. Switching to e.g. a power law distribution of
>> allocation sizes would allow this to be done without using huge
>> amounts of memory or runtime.
>>
> Preprocessing time does not matter here; it is not measured.

I'm not sure I understand.

-- 
Will Newton
Toolchain Working Group, Linaro

