If we cannot get cache sizes from CPUID, we should not use string functions that depend on cache sizes, in particular the non-temporal store threshold. After commit 48b74865c63840b288bd85b4d8743533b73b339b("x86: Check minimum/maximum of non_temporal_threshold [BZ #29953]") we clamp the the minimum non-temporal store threshold at 0x4040, but this is at the extremely low end likely bad for performance.
(In reply to Florian Weimer from comment #0) > If we cannot get cache sizes from CPUID, we should not use string functions > that depend on cache sizes, in particular the non-temporal store threshold. > > After commit 48b74865c63840b288bd85b4d8743533b73b339b("x86: Check > minimum/maximum of non_temporal_threshold [BZ #29953]") we clamp the the > minimum non-temporal store threshold at 0x4040, but this is at the extremely > low end likely bad for performance. We probably would be better off using the maximum and skipping non-temporal threshold all together.
(In reply to Noah Goldstein from comment #1) > (In reply to Florian Weimer from comment #0) > > If we cannot get cache sizes from CPUID, we should not use string functions > > that depend on cache sizes, in particular the non-temporal store threshold. > > > > After commit 48b74865c63840b288bd85b4d8743533b73b339b("x86: Check > > minimum/maximum of non_temporal_threshold [BZ #29953]") we clamp the the > > minimum non-temporal store threshold at 0x4040, but this is at the extremely > > low end likely bad for performance. > > We probably would be better off using the maximum and skipping non-temporal > threshold all together. It sounds reasonable.
(In reply to Noah Goldstein from comment #1) > (In reply to Florian Weimer from comment #0) > > If we cannot get cache sizes from CPUID, we should not use string functions > > that depend on cache sizes, in particular the non-temporal store threshold. > > > > After commit 48b74865c63840b288bd85b4d8743533b73b339b("x86: Check > > minimum/maximum of non_temporal_threshold [BZ #29953]") we clamp the the > > minimum non-temporal store threshold at 0x4040, but this is at the extremely > > low end likely bad for performance. > > We probably would be better off using the maximum and skipping non-temporal > threshold all together. True, that's probably better than switching implementations. Please feel to retitle this bug as appropriate.
Posted patch to make default in this case 64MB https://patchwork.sourceware.org/project/glibc/list/?series=19713
Fix pushed.
Are you going to backport this? Fixed by: commit ed2f9dc9420c4c61436328778a70459d0a35556a Author: Noah Goldstein <goldstein.w.n@gmail.com> Date: Mon May 8 22:10:20 2023 -0500 x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429] If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, it almost certainly means we failed to read the systems cache info. In this case, rather than defaulting the minimum correct value, we should default to a value that gets at least reasonable performance. 64MB is chosen conservatively to be at the very high end. This should never cause non-temporal stores when, if we had read cache info, we wouldn't have otherwise. Reviewed-by: Florian Weimer <fweimer@redhat.com>
(In reply to Florian Weimer from comment #6) > Are you going to backport this? > Didn't see this question earlier. Sure I can backport. You think to 2.28? > Fixed by: > > commit ed2f9dc9420c4c61436328778a70459d0a35556a > Author: Noah Goldstein <goldstein.w.n@gmail.com> > Date: Mon May 8 22:10:20 2023 -0500 > > x86: Use 64MB as nt-store threshold if no cacheinfo [BZ #30429] > > If `non_temporal_threshold` is below `minimum_non_temporal_threshold`, > it almost certainly means we failed to read the systems cache info. > > In this case, rather than defaulting the minimum correct value, we > should default to a value that gets at least reasonable > performance. 64MB is chosen conservatively to be at the very high > end. This should never cause non-temporal stores when, if we had read > cache info, we wouldn't have otherwise. > Reviewed-by: Florian Weimer <fweimer@redhat.com>