This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86: Add --enable-rdtscp-in-benchtests
- From: Alexander Monakov <amonakov at ispras dot ru>
- To: "H.J. Lu" <hjl dot tools at gmail dot com>
- Cc: libc-alpha at sourceware dot org
- Date: Tue, 23 Oct 2018 11:44:09 +0300 (MSK)
- Subject: Re: [PATCH] x86: Add --enable-rdtscp-in-benchtests
- References: <20181022223711.26910-1-hjl.tools@gmail.com>
On Mon, 22 Oct 2018, H.J. Lu wrote:
> RDTSCP waits until all previous instructions have executed and all
> previous loads are globally visible before reading the counter. RDTSC
> doesn't wait until all previous instructions have been executed before
> reading the counter. This patch adds --enable-rdtscp-in-benchtests to
> use RDTSCP in benchtests.
>
> NOTE: Benchtests in RDTSCP-enabled glibc require CPUs capable of the RDTSCP
> instruction. All x86 processors since 2010 support the RDTSCP instruction.
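For reference, here is a minimal sketch of the two counter reads being compared. It assumes GCC or Clang with GNU inline assembly on x86-64, and the function names are hypothetical. RDTSCP faults with an invalid-opcode exception on CPUs that lack it, which is why the patch needs the configure option.

```c
#include <stdint.h>

/* Plain RDTSC: the CPU may read the counter before earlier
   instructions have finished executing, so the timestamp is not
   ordered with respect to the preceding code.  The "memory" clobber
   only stops the compiler from moving memory accesses across the
   read; it does not change how the CPU orders execution. */
static inline uint64_t tsc_rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* RDTSCP: waits until all previous instructions have executed and
   all previous loads are globally visible before reading the counter.
   It also writes IA32_TSC_AUX into ECX; that value is discarded here,
   so ECX is declared clobbered. */
static inline uint64_t tsc_rdtscp(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("rdtscp" : "=a"(lo), "=d"(hi) :: "ecx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
```

Neither instruction prevents later instructions from starting before the counter is read. The ordering discussed here concerns only the code that precedes the read.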
Without implying an objection to the patch, I'd like to point out that the
Linux kernel always uses "lfence; rdtsc" on Intel CPUs to obtain ordered
timestamps with the lowest possible overhead. LFENCE is available on all x86-64
processors as part of SSE2.
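Here is a minimal sketch of the "lfence; rdtsc" sequence, assuming GCC or Clang with GNU inline assembly on x86-64. The function name is hypothetical and is not the kernel's own helper. Per the Intel SDM, LFENCE does not execute until all prior instructions have completed locally, so the RDTSC that follows cannot read the counter early.

```c
#include <stdint.h>

/* "lfence; rdtsc": LFENCE holds back RDTSC until all earlier
   instructions have completed locally, giving an ordered timestamp
   without requiring RDTSCP.  LFENCE is part of SSE2, so this runs on
   any x86-64 CPU.  The "memory" clobber additionally keeps the
   compiler from moving memory accesses across the sequence. */
static inline uint64_t tsc_lfence_rdtsc(void)
{
    uint32_t lo, hi;
    __asm__ volatile ("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    return ((uint64_t)hi << 32) | lo;
}
```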
On AMD CPUs, the kernel also uses "lfence; rdtsc", except if it cannot set up
a specific MSR to make LFENCE serializing; in that case it falls back to
"mfence; rdtsc".
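The AMD selection described above can be sketched as follows. This assumes GCC or Clang inline assembly on x86-64. The `lfence_serializing` parameter is a hypothetical stand-in for the kernel's own decision about whether the MSR was configured. This sketch does not show how to determine that from user space.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: pick "lfence; rdtsc" when LFENCE is known to be
   serializing, otherwise fall back to "mfence; rdtsc".  The caller
   must supply lfence_serializing; it is an assumption, not a real
   kernel or glibc interface. */
static inline uint64_t tsc_ordered_amd(bool lfence_serializing)
{
    uint32_t lo, hi;
    if (lfence_serializing)
        __asm__ volatile ("lfence\n\trdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    else
        __asm__ volatile ("mfence\n\trdtsc" : "=a"(lo), "=d"(hi) :: "memory");
    return ((uint64_t)hi << 32) | lo;
}
```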
The "lfence; rdtsc" sequence is also recommended in the Intel SDM.
Is there a specific reason that "rdtscp" is preferable for this patch?
Thanks.
Alexander