This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: Re: [PATCH] Add math benchmark latency test

On 8/16/2017 6:07 AM, Siddhesh Poyarekar wrote:
I didn't notice this earlier, but shouldn't throughput be
iterations/cycle and not the other way around?  That is, throughput
should be the inverse of latency.

well not really...

I've been working on making expf() faster for x86 (see HJ's email earlier), and
with a massive out of order/pipelined cpu, latency and throughput are very distinct things.
expf() can run at a throughput of somewhere in the 10 to 11 cycles range, while the latency
can be in the 45 to 55 cycles range.
(not trying to do benchmarking here, just wanting to show an order of magnitude)

the latency is then the number of cycles it takes to get a result (on an empty cpu)
through from end to end, e.g.

printf("%e", expf(fl))

while throughput is the cost if  you put multiple consecutive through the cpu,

printf("%e", expf(f1) + expf(f2) + expf(f3) + expf(4))

(using "printf" as a proxy for 'make externally visible' sync point; of course in reality it could be many other things)

the out of order cpu will start execution of the second third and fourth expf() in parallel to the first, which will
hide the latency (so the result time is not 4x45 + time of 4 adds, but much less, closer to 45 + 3x11 + time of 4 adds)

I picked 4 expf()s here but theoretically throughput would be measured with the asymptote of 4...

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]