This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Re: Re: [PATCH] Add math benchmark latency test

From: Arjan van de Ven <arjan at linux dot intel dot com>
To: Siddhesh Poyarekar <siddhesh at gotplt dot org>, Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
Cc: nd <nd at arm dot com>
Date: Wed, 16 Aug 2017 07:23:08 -0700
Subject: Re: Re: [PATCH] Add math benchmark latency test
Authentication-results: sourceware.org; auth=none
References: <0e008f2e-f41a-1bb8-803c-2f798e2c3541@gotplt.org>

On 8/16/2017 6:07 AM, Siddhesh Poyarekar wrote:

> I didn't notice this earlier, but shouldn't throughput be
> iterations/cycle and not the other way around?  That is, throughput
> should be the inverse of latency.



Well, not really...

I've been working on making expf() faster for x86 (see HJ's email earlier), and
on a massively out-of-order, pipelined CPU, latency and throughput are very distinct things.
expf() can run at a throughput of roughly one result every 10 to 11 cycles, while the latency
can be in the 45 to 55 cycle range.
(I'm not trying to do benchmarking here, just wanting to show the order of magnitude.)

Latency is then the number of cycles it takes to get a single result (on an otherwise empty CPU)
through from end to end, e.g.

printf("%e", expf(f1))

while throughput is the cost per call if you put multiple consecutive calls through the CPU,
like

printf("%e", expf(f1) + expf(f2) + expf(f3) + expf(f4))

(using "printf" as a proxy for a 'make externally visible' sync point; in reality it could be many other things)
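
As a rough sketch of the two shapes above, and not the benchmark code from the patch: a latency measurement feeds each result into the next call so nothing can overlap, while a throughput measurement uses independent inputs so that only the adds are chained. The names dependent_chain, independent_sum and half are made up for illustration; half is a stand-in for expf() so the sketch builds without libm, and a real benchmark would call expf() directly and time the loops with a cycle counter.

```c
#include <assert.h>

/* Hypothetical stand-in for expf(); swap in the real function when benchmarking. */
float half(float x)
{
    return 0.5f * x;
}

/* Latency shape: every call consumes the previous result, so calls cannot overlap. */
float dependent_chain(float (*fn)(float), float x, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        x = fn(x);
    return x;
}

/* Throughput shape: the inputs are independent, only the adds depend on each other. */
float independent_sum(float (*fn)(float), const float *in, int n)
{
    assert(n >= 0);
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += fn(in[i]);
    return sum;
}
```

Calling through a function pointer keeps the compiler from inlining or folding the calls here, which is an assumption of this sketch and not something the message states.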


The out-of-order CPU will start executing the second, third and fourth expf() in parallel with the first, which
hides the latency (so the total time is not 4x45 + the time of the 3 adds, but much less, closer to 45 + 3x11 + the time of the 3 adds).
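
The arithmetic in that estimate can be written out as a small model. This is only a sketch under the message's own round numbers (45 cycles of latency, one call issued every 11 cycles); the function names are made up for illustration, and the cost of the adds is left out.

```c
#include <assert.h>

/* Cycles if every call had to wait for the previous one to finish. */
unsigned serial_cycles(unsigned n, unsigned latency)
{
    return n * latency;
}

/* Cycles if the calls overlap: the first one pays the full latency,
   each later one only adds one issue interval. */
unsigned overlapped_cycles(unsigned n, unsigned latency, unsigned issue_interval)
{
    assert(n >= 1);
    return latency + (n - 1) * issue_interval;
}
```

With n = 4, serial_cycles(4, 45) gives 180 and overlapped_cycles(4, 45, 11) gives 78, which is the "4x45" versus "45 + 3x11" comparison.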

I picked 4 expf()s here, but theoretically throughput would be measured asymptotically, as that 4 grows without bound.
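
To see why the number of calls matters, here is a sketch that uses the same rough model as the estimate in this message (the first call pays the full latency, each later call adds one issue interval) and reports the average cost per call. The function name and the model itself are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Average cycles per call for n overlapping calls under the simple model
   total = latency + (n - 1) * issue_interval. */
double cycles_per_call(unsigned n, double latency, double issue_interval)
{
    assert(n >= 1);
    return (latency + (n - 1) * issue_interval) / n;
}
```

With a latency of 45 and an issue interval of 11, one call costs 45 cycles and four calls average 19.5 cycles each; as n grows, the average approaches 11, which is the throughput figure.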

Follow-Ups:
- Re: [PATCH] Add math benchmark latency test
  - From: Siddhesh Poyarekar

References:
- Re: [PATCH] Add math benchmark latency test
  - From: Siddhesh Poyarekar
