[PATCH v2 2/2] x86: Add separate non-temporal tunable for memset
Borislav Petkov
bp@alien8.de
Fri Jun 14 18:01:05 GMT 2024
On Fri, Jun 14, 2024 at 11:39:07AM -0500, Noah Goldstein wrote:
> On Fri, Jun 14, 2024 at 5:41 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > Hi,
> >
> > I'm not subscribed to the glibc list - pls CC me directly on replies.
> >
> > On Wed, May 29, 2024 at 03:53:20PM -0700, H.J. Lu wrote:
> > > On Fri, May 24, 2024 at 10:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > The tuning for non-temporal stores for memset vs memcpy is not always
> > > > the same. This includes both the exact value and whether non-temporal
> > > > stores are profitable at all for a given arch.
> > > >
> > > > This patch adds `x86_memset_non_temporal_threshold`. Currently we
> > > > disable non-temporal stores for non Intel vendors as the only
> > > > benchmarks showing its benefit have been on Intel hardware.
> > > > ---
> > > > manual/tunables.texi | 16 +++++++++++++++-
> > > > sysdeps/x86/cacheinfo.h | 8 +++++++-
> > > > sysdeps/x86/dl-cacheinfo.h | 16 ++++++++++++++++
> > > > sysdeps/x86/dl-diagnostics-cpu.c | 2 ++
> > > > sysdeps/x86/dl-tunables.list | 3 +++
> > > > sysdeps/x86/include/cpu-features.h | 4 +++-
> > > > .../x86_64/multiarch/memset-vec-unaligned-erms.S | 6 +++---
> > > > 7 files changed, 49 insertions(+), 6 deletions(-)
> >
> > ...
> >
> > > > + /* Non-temporal stores in memset have only been tested on Intel hardware.
> > > > + Until we benchmark data on other x86 processor, disable non-temporal
> > > > + stores in memset. */
> >
> > Well, something's fishy here:
> >
> > $ ./elf/ld.so --list-tunables | grep threshold
> > glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > glibc.cpu.x86_rep_movsb_threshold: 0x600000 (min: 0x100, max: 0xffffffffffffffff)
> > glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x4040, max: 0xfffffffffffffff)
> > glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > glibc.cpu.x86_rep_stosb_threshold: 0xffffffffffffffff (min: 0x1, max: 0xffffffffffffffff)
> > glibc.cpu.x86_memset_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > ^^^^^^^^^
> >
> > on glibc-2.39.9000-300-g54c1efdac55b from git.
> >
> > That's on an AMD Zen1 so I'd expect that memset NT threshold to be
> > 0xffffffffffffffff by default...
> >
> > Thx.
> >
>
> Thanks for bringing this up, looking into it.
Thx. Michael debugged it yesterday and traced it to mismatching ranges:
diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 147cc4cf23f5..ecf3c1d3736e 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -110,8 +110,11 @@ do_tunable_update_val (tunable_t *cur, const tunable_val_t *valp,
   /* Bail out if the bounds are not valid. */
   if (tunable_val_lt (val, min, unsigned_cmp)
-      || tunable_val_lt (max, val, unsigned_cmp))
+      || tunable_val_lt (max, val, unsigned_cmp)) {
+    _dl_printf("bail out due to: 0x%lx, min: 0x%lx, max: 0x%lx\n",
+               val, min, max);
     return;
+  }
   cur->val.numval = val;
   cur->type.min = min;
$ ./elf/ld.so --list-tunables | grep -E "(threshold|bail)"
dl_init_cacheinfo: memset_non_temporal_threshold: 0xffffffffffffffff
dl_init_cacheinfo: memset_non_temporal_threshold, tunable_size: 0xffffffffffffffff
bail out due to: 0xffffffffffffffff, min: 0x4040, max: 0xfffffffffffffff
^^^^^^^
dl_init_cacheinfo: memset_non_temporal_threshold, tunable set: 0xffffffffffffffff, min: 0x4040, max: 0xfffffffffffffff
glibc.cpu.x86_memset_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
but you guys probably should do the right fix here.
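To spell out what the output above is saying, here is a minimal standalone sketch of the failure mode. This is not glibc code: `struct demo_tunable`, `demo_tunable_update` and `demo_zen1_case` are made-up names for illustration, and the only thing taken from the run above are the numbers. The requested value is above the max that gets passed in, so the update bails out and the tunable silently stays at its initial 0x0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tunable; NOT glibc's tunable_t.  */
struct demo_tunable
{
  uint64_t val;
  uint64_t min;
  uint64_t max;
};

/* Mimics the bail-out in the debug patch: if VAL lies outside
   [MIN, MAX], leave CUR untouched and report failure.  */
static bool
demo_tunable_update (struct demo_tunable *cur, uint64_t val,
                     uint64_t min, uint64_t max)
{
  if (val < min || max < val)
    return false;

  cur->val = val;
  cur->min = min;
  cur->max = max;
  return true;
}

/* Replays the numbers from the debug output: value 0xffffffffffffffff,
   min 0x4040, max 0x0fffffffffffffff (written here as UINT64_MAX >> 4).
   Returns the value the tunable ends up with.  */
static uint64_t
demo_zen1_case (void)
{
  /* Initial state as listed: 0x0 (min: 0x0, max: 0xffffffffffffffff).  */
  struct demo_tunable t = { 0x0, 0x0, UINT64_MAX };

  /* Rejected: the value exceeds the max, so t is not modified.  */
  demo_tunable_update (&t, UINT64_MAX, 0x4040, UINT64_MAX >> 4);
  return t.val;
}
```

With these inputs `demo_zen1_case ()` returns 0, which matches the 0x0 that `--list-tunables` prints. Passing a max of `UINT64_MAX` instead lets the same value through, so whatever the proper fix ends up being, the value and the bounds handed to the update have to agree.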
Thx.
--
Regards/Gruss,
Boris.
https://people.kernel.org/tglx/notes-about-netiquette