This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: glibc benchmarks' results can be unreliable for short runtimes (on Aarch64)
- From: Anton Youdkevitch <anton dot youdkevitch at bell-sw dot com>
- To: Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Mon, 24 Jun 2019 09:52:53 +0200
- Subject: Re: glibc benchmarks' results can be unreliable for short runtimes (on Aarch64)
- References: <VI1PR0801MB2127DC882459BC63DA318B3983E70@VI1PR0801MB2127.eurprd08.prod.outlook.com>
Wilco,
On 6/21/2019 2:01 PM, Wilco Dijkstra wrote:
> Hi Anton,
>
>> Recently I was doing an optimized implementation of memcpy/memmove for
>> TX2. While running internal microbenchmarks I noticed that for the
>> "fast" benchmarks (~10ms runtime) the results vary quite
>> significantly across runs (5%-20%). It is possible to find two runs
>> that show my implementation actually significantly worsened the
>> performance. Also, there are (quite common) cases where the "baseline"
>> implementation gets worse and the "tested" implementation gets better
>> (or vice versa) across runs.
>
> Yes, this is certainly possible for any short-running benchmark, which
> is why I recently increased the minimum iteration count 128 times. I
> ran it on a fixed-frequency server and got quite stable results.
> However, if your CPU does frequency scaling, then 10ms is likely too
> short for consistent results.
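One common way to get the fixed-frequency behaviour Wilco describes is to pin the cpufreq governor before benchmarking. A minimal sketch, assuming a Linux machine with the cpupower tool installed (governor names and availability vary by kernel, platform, and firmware):

```shell
# Show the current governor and the frequency range the kernel reports.
cpupower frequency-info

# Pin all CPUs to the "performance" governor so the clock does not
# scale down between benchmark iterations (requires root).
sudo cpupower frequency-set -g performance
```

On systems without cpupower, the same effect can usually be had by writing "performance" to the per-CPU scaling_governor files under /sys/devices/system/cpu/.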
I think we can assume frequency throttling is the general rule
these days.
>> The first solution to this that comes to mind is to increase the
>> runtime of the "fast" benchmarks. If I increase the bench-memcpy
>> runtime 32x (the actual runtime on TX2 would be ~2s), the results for
>> a particular implementation are always within a 5% range. The effect
>> where one benchmark gains and another loses across different runs,
>> while less pronounced, still remains. So, are there any reasons not
>> to bump up the runtime of the "fast" benchmarks to 1s-2s?
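The "within 5%" criterion above can be checked mechanically by comparing the spread of per-run results against their mean. A small sketch, using hypothetical timings rather than actual benchmark output:

```python
import statistics

def spread_percent(times):
    """Return the max-to-min spread of run times as a percentage of the mean."""
    mean = statistics.mean(times)
    return (max(times) - min(times)) / mean * 100.0

# Hypothetical per-run timings (seconds) for a short (~10ms) benchmark...
short_runs = [0.0102, 0.0111, 0.0098, 0.0119]
# ...and for the same benchmark with 32x the iteration count.
long_runs = [0.335, 0.341, 0.338, 0.339]

print(f"short: {spread_percent(short_runs):.1f}% spread")
print(f"long:  {spread_percent(long_runs):.1f}% spread")
```

With numbers like these, the short runs show a spread well above 5% while the longer runs stay comfortably inside it, which is the pattern being reported in the thread.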
> 1 second per benchmark sounds reasonable; however, if you just increase
> INNER_LOOP_ITERS a lot, then various benchmarks will become way too
> slow. So you may need to move them to INNER_LOOP_ITERS_MEDIUM or
> something similar. If you use "time $(run-bench)" in the benchtests
> makefile, it prints out the time for each benchmark.
OK, I understand this, thanks. I will use INNER_LOOP_ITERS_MEDIUM then.
--
Thanks,
Anton