This is the mail archive of the
mailing list for the glibc project.
Re: Re: [PATCH] Add math benchmark latency test
- From: Arjan van de Ven <arjan at linux dot intel dot com>
- To: Siddhesh Poyarekar <siddhesh at gotplt dot org>, Wilco Dijkstra <Wilco dot Dijkstra at arm dot com>, "libc-alpha at sourceware dot org" <libc-alpha at sourceware dot org>
- Cc: nd <nd at arm dot com>
- Date: Wed, 16 Aug 2017 07:23:08 -0700
- Subject: Re: Re: [PATCH] Add math benchmark latency test
On 8/16/2017 6:07 AM, Siddhesh Poyarekar wrote:
> I didn't notice this earlier, but shouldn't throughput be
> iterations/cycle and not the other way around? That is, throughput
> should be the inverse of latency.
well not really...
I've been working on making expf() faster for x86 (see HJ's email earlier), and
with a massive out of order/pipelined cpu, latency and throughput are very distinct things.
expf() can run at a throughput of somewhere in the 10 to 11 cycles range, while the latency
can be in the 45 to 55 cycles range.
(not trying to do benchmarking here, just wanting to show an order of magnitude)
the latency is then the number of cycles it takes to get a single result (on an empty cpu)
through from end to end,
while throughput is the cost per call if you put multiple consecutive calls through the cpu, e.g.
printf("%e", expf(f1) + expf(f2) + expf(f3) + expf(f4))
(using "printf" as a proxy for 'make externally visible' sync point; of course in reality it could be many other things)
the out of order cpu will start execution of the second, third and fourth expf() in parallel to the first, which will
hide the latency (so the result time is not 4x45 + time of 3 adds, but much less, closer to 45 + 3x11 + time of 3 adds)
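
The back-of-the-envelope estimate above can be written out as a tiny model. This is only a sketch of the arithmetic, not a measurement: the function names are made up for illustration, the cost of the adds is lumped into a single caller-supplied number, and 45 and 11 are just the rough figures quoted in this mail.

```c
/* Hypothetical helpers, not part of glibc or its benchtests. */

/* Serial estimate: every call waits for the previous one to finish. */
static unsigned serial_cycles(unsigned n, unsigned latency, unsigned adds)
{
    if (n == 0)
        return 0;
    return n * latency + adds;
}

/* Overlapped estimate: the first result arrives after `latency` cycles,
   and each further independent call adds roughly `throughput` cycles. */
static unsigned overlapped_cycles(unsigned n, unsigned latency,
                                  unsigned throughput, unsigned adds)
{
    if (n == 0)
        return 0;
    return latency + (n - 1) * throughput + adds;
}
```

With n = 4, latency = 45 and throughput = 11, and ignoring the adds, this gives 180 cycles for the serial case against 78 for the overlapped one.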
I picked 4 expf()s here but theoretically throughput would be measured with the asymptote of 4...
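
To make the difference concrete, here is a minimal sketch of the two loop shapes a benchmark could time. It is not the code from the patch under discussion. The names dependent_chain, independent_sum and half_plus_one are hypothetical, and half_plus_one is only a cheap stand-in so the sketch builds without libm. Timing is left to the caller, and the sketch ignores the problem of keeping inputs in a sensible range when a real function like expf() is chained.

```c
#include <stddef.h>

typedef float (*unary_fn)(float);

/* Hypothetical stand-in for the function under test. */
static float half_plus_one(float x)
{
    return x * 0.5f + 1.0f;
}

/* Latency-style loop: each call's input is the previous call's result,
   so consecutive calls cannot overlap. Time per iteration approximates
   the latency of fn. */
static float dependent_chain(unary_fn fn, float x, size_t n)
{
    for (size_t i = 0; i < n; i++)
        x = fn(x);
    return x;
}

/* Throughput-style loop: the calls do not depend on each other, only
   the cheap adds are chained, so an out-of-order cpu is free to overlap
   them. Time per iteration approaches the throughput cost as n grows. */
static float independent_sum(unary_fn fn, const float *in, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += fn(in[i]);
    return sum;
}
```

To try this with the real function, pass expf from <math.h> as fn (linking with -lm) and wrap each loop in whatever cycle or time counter is at hand. Returning the result keeps the compiler from discarding the calls.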