[PATCH v2 2/2] x86: Add separate non-temporal tunable for memset

Borislav Petkov bp@alien8.de
Fri Jun 14 18:01:05 GMT 2024


On Fri, Jun 14, 2024 at 11:39:07AM -0500, Noah Goldstein wrote:
> On Fri, Jun 14, 2024 at 5:41 AM Borislav Petkov <bp@alien8.de> wrote:
> >
> > Hi,
> >
> > I'm not subscribed to the glibc list - pls CC me directly on replies.
> >
> > On Wed, May 29, 2024 at 03:53:20PM -0700, H.J. Lu wrote:
> > > On Fri, May 24, 2024 at 10:39 AM Noah Goldstein <goldstein.w.n@gmail.com> wrote:
> > > >
> > > > The tuning for non-temporal stores for memset vs memcpy is not always
> > > > the same. This includes both the exact value and whether non-temporal
> > > > stores are profitable at all for a given arch.
> > > >
> > > > This patch adds `x86_memset_non_temporal_threshold`. Currently we
> > > > disable non-temporal stores for non-Intel vendors as the only
> > > > benchmarks showing its benefit have been on Intel hardware.
> > > > ---
> > > >  manual/tunables.texi                             | 16 +++++++++++++++-
> > > >  sysdeps/x86/cacheinfo.h                          |  8 +++++++-
> > > >  sysdeps/x86/dl-cacheinfo.h                       | 16 ++++++++++++++++
> > > >  sysdeps/x86/dl-diagnostics-cpu.c                 |  2 ++
> > > >  sysdeps/x86/dl-tunables.list                     |  3 +++
> > > >  sysdeps/x86/include/cpu-features.h               |  4 +++-
> > > >  .../x86_64/multiarch/memset-vec-unaligned-erms.S |  6 +++---
> > > >  7 files changed, 49 insertions(+), 6 deletions(-)
> >
> > ...
> >
> > > > +  /* Non-temporal stores in memset have only been tested on Intel hardware.
> > > > +     Until we benchmark data on other x86 processor, disable non-temporal
> > > > +     stores in memset. */
> >
> > Well, something's fishy here:
> >
> > $ ./elf/ld.so --list-tunables | grep threshold
> > glibc.malloc.mmap_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > glibc.cpu.x86_rep_movsb_threshold: 0x600000 (min: 0x100, max: 0xffffffffffffffff)
> > glibc.cpu.x86_non_temporal_threshold: 0x600000 (min: 0x4040, max: 0xfffffffffffffff)
> > glibc.malloc.trim_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> > glibc.cpu.x86_rep_stosb_threshold: 0xffffffffffffffff (min: 0x1, max: 0xffffffffffffffff)
> > glibc.cpu.x86_memset_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)
> >                                             ^^^^^^^^^
> >
> > on glibc-2.39.9000-300-g54c1efdac55b from git.
> >
> > That's on an AMD Zen1 so I'd expect that memset NT threshold to be
> > 0xffffffffffffffff by default...
> >
> > Thx.
> >
> 
> Thanks for bringing this up, looking into it.

Thx. Michael debugged it yesterday and tracked it down to the ranges mismatching:

diff --git a/elf/dl-tunables.c b/elf/dl-tunables.c
index 147cc4cf23f5..ecf3c1d3736e 100644
--- a/elf/dl-tunables.c
+++ b/elf/dl-tunables.c
@@ -110,8 +110,11 @@ do_tunable_update_val (tunable_t *cur, const tunable_val_t *valp,
 
   /* Bail out if the bounds are not valid.  */
   if (tunable_val_lt (val, min, unsigned_cmp)
-      || tunable_val_lt (max, val, unsigned_cmp))
+      || tunable_val_lt (max, val, unsigned_cmp)) {
+         _dl_printf("bail out due to: 0x%lx, min: 0x%lx, max: 0x%lx\n",
+                    val, min, max);
     return;
+  }
 
   cur->val.numval = val;
   cur->type.min = min;

$ ./elf/ld.so --list-tunables | grep -E "(threshold|bail)"
dl_init_cacheinfo: memset_non_temporal_threshold: 0xffffffffffffffff
dl_init_cacheinfo: memset_non_temporal_threshold, tunable_size: 0xffffffffffffffff
bail out due to: 0xffffffffffffffff, min: 0x4040, max: 0xfffffffffffffff
^^^^^^^

dl_init_cacheinfo: memset_non_temporal_threshold, tunable set: 0xffffffffffffffff, min: 0x4040, max: 0xfffffffffffffff
glibc.cpu.x86_memset_non_temporal_threshold: 0x0 (min: 0x0, max: 0xffffffffffffffff)

but you guys probably should do the right fix here.
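
For illustration only, here is a minimal standalone sketch of what the debug
output above suggests is happening. The names demo_tunable and demo_update are
made up for this example and are not glibc code; the sketch assumes plain
unsigned 64-bit comparisons. The requested value 0xffffffffffffffff is larger
than the max of 0xfffffffffffffff (15 f's, not 16), so the update bails out
and the tunable keeps its initial 0x0 value and range:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a tunable entry; NOT glibc's tunable_t.  */
struct demo_tunable
{
  uint64_t val;
  uint64_t min;
  uint64_t max;
};

/* Mimics the bounds check in the diff above: if VAL lies outside
   [MIN, MAX] the tunable is left untouched.  Returns true when the
   update was applied, false when it bailed out.  */
static bool
demo_update (struct demo_tunable *t, uint64_t val, uint64_t min,
             uint64_t max)
{
  if (val < min || max < val)
    return false;

  t->val = val;
  t->min = min;
  t->max = max;
  return true;
}
```

Calling demo_update with val 0xffffffffffffffff, min 0x4040 and max
0xfffffffffffffff on a tunable initialized to { 0, 0, UINT64_MAX } returns
false and leaves it as is, which matches the 0x0 (min: 0x0, max:
0xffffffffffffffff) line in the listing.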

Thx.

-- 
Regards/Gruss,
    Boris.

https://people.kernel.org/tglx/notes-about-netiquette

