[PATCH v3 2/5] benchtests: Add memset zero fill benchtest

Mon Oct 18 12:57:12 GMT 2021

> > > What do you think?
> >
> > As I've mentioned, this will never work using the current benchmark loop.
> > At size 256 your loop has only 1 timer tick... The only way to get any data
> > out is to increase the time taken per call. At 16K there are about 20 ticks so
> > it is still very inaccurate. By repeating the test thousands of times you can
> > get some signal out (e.g. 20% is 20 ticks, 80% is 21 gives ~20.8 ticks on average),
> > but that's impossible for smaller sizes.
> >
> > So if you want to measure small sizes, you need to use a more accurate timing
> > loop.
> 
> Thank you for the comment.
> OK, understood. I have updated the start size to 16KB as well so that this can be committed first.
> Please find V5 [1] and merge it if it's OK.
> Changes from V4:
> - Start size to 16KB from 256B
> - End size to 16MB from 64MB

> [1] https://sourceware.org/pipermail/libc-alpha/2021-September/131245.html

Hi Tamura,

I agree with you that it is important to measure calls with smaller
lengths.  IMHO the issue here is not whether the benchmark should measure
these lengths, but how it could measure them.

+static void
+__attribute__((noinline, noclone))
+do_one_test (json_ctx_t *json_ctx, impl_t *impl, CHAR *s,
+	     int c1 __attribute ((unused)), int c2 __attribute ((unused)),
+	     size_t n)
+{
+  size_t i, iters = 32;
+  timing_t start, stop, cur, latency = 0;
+
+  CALL (impl, s, c2, n); // warm up
+
+  for (i = 0; i < iters; i++)
+    {
+      memset (s, c1, n); // alternation
+
+      TIMING_NOW (start);
+
+      CALL (impl, s, c2, n);
+
+      TIMING_NOW (stop);
+      TIMING_DIFF (cur, start, stop);
+      TIMING_ACCUM (latency, cur);
+    }
+
+  json_element_double (json_ctx, (double) latency / (double) iters);
+}

By doing this you are timing each call by itself and accumulating
the results. That is indeed not measurable for really small lengths.
You could try moving the memset and the timing out of the loop and
measuring the time spent across multiple runs. To keep the alternation,
you could memset a bigger buffer up front and advance the s pointer on
each iteration. I guess this will reduce the variations Wilco mentioned.
Maybe we need to keep this loop for bigger lengths, as the
implementation I suggested would need too big a buffer.

Another point here is that GNU Code Style asks for /* */ comments
instead of //, as seen in
http://www.gnu.org/prep/standards/standards.html#Comments

Finally, sorry that I took so long to reply here.
Thanks for working on this.
---
Lucas A. M. Magalhães