This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
glibc.cpu.cached_memopt (was Re: [PATCH] Rename the glibc.tune namespace to glibc.cpu)
On 07/17/2018 01:32 AM, Tulio Magno Quites Machado Filho wrote:
I'm not following your line of thought here:
- glibc.cpu.hwcaps is specific to i386 and x86-64
- glibc.cpu.name is specific to aarch64
- glibc.cpu.cached_memopt is specific to powerpc, powerpc64 and powerpc64le
What am I missing?
The difference is that glibc.cpu.name and glibc.cpu.hwcaps are
conceptually generic tunables, i.e. there is a reasonable chance that
a couple of releases down the line another architecture may want to
provide a tuning facility for CPUs by name or by HWCAPS. The
cached_memopt one is not very clear to me and seems more like something
that is only useful on power8. x86-specific tunables, i.e. ones whose
concept is not currently applicable to other architectures
(e.g. x86_non_temporal_threshold), are prefixed with x86_*.
Notice the optimization is not specific to a CPU, but specific to a user
scenario (cacheable memory). In other words, the optimization can't be
enabled whenever PPC_FEATURE2_ARCH_2_07 is available, because it could
degrade performance when cache-inhibited memory is being used.
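To make the point above concrete, here is a minimal sketch in C of a selector that requires both the hardware capability bit and an explicit opt-in from the user. The names memcpy_generic, memcpy_cached and select_memcpy are hypothetical and this is not glibc's actual ifunc code. Both variants simply call memcpy so that the sketch compiles on any host, and the fallback definition of PPC_FEATURE2_ARCH_2_07 is assumed to match the Linux powerpc header.

```c
#include <stddef.h>
#include <string.h>

/* Assumed to match the Linux powerpc HWCAP2 bit; defined here only so
   that the sketch compiles on hosts without the powerpc headers.  */
#ifndef PPC_FEATURE2_ARCH_2_07
# define PPC_FEATURE2_ARCH_2_07 0x80000000UL
#endif

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical stand-in for a variant that is safe on any memory type.  */
static void *
memcpy_generic (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Hypothetical stand-in for a variant tuned for cacheable memory.  */
static void *
memcpy_cached (void *dst, const void *src, size_t n)
{
  return memcpy (dst, src, n);
}

/* Hypothetical selector: the CPU feature bit alone is not sufficient,
   the user must also state that the process works on cacheable memory.  */
static memcpy_fn
select_memcpy (unsigned long hwcap2, int cached_memopt)
{
  if ((hwcap2 & PPC_FEATURE2_ARCH_2_07) != 0 && cached_memopt > 0)
    return memcpy_cached;
  return memcpy_generic;
}
```

With this shape, a process that never opts in keeps the conservative routine even on hardware that supports the faster one.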
Ahh OK, I got thrown off by the fact that there's a separate routine for
it and assumed that it is Power8-specific. I have a different concern
then; a tunable is process-wide so the cached_memopt tunable essentially
assumes that the entire process is using cache-inhibited memory. Is
that a reasonable assumption? In my experience a typical process would
have only a set of structures in cache-inhibited memory and most of it
would be regular memory. In that sense it looks more like a tradeoff
hack and it would be nice to consider alternatives. Here are a couple I
can think of off the top of my head:
1. A new relocation that overlays on top of ifuncs and allows selection
of routines based on specific properties. I have had this idea for a
while but no time to implement it and it has much more general scope
than memory type; for example memory alignment could also be a factor to
short-cut parts of string routines at compile time itself. It does not
have the runtime flexibility of a tunable but is probably far more
configurable.
2. If there is a correlation to size then implement something similar to
the x86_non_temporal_threshold tunable. This is probably just as good or
bad as setting a cached_memopt flag but has the effect of generalizing
what was a tunable.
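As a rough illustration of the second alternative, the sketch below picks a copy path from the request size instead of from an on/off flag. Everything in it is an assumption made for illustration: the names cached_opt_threshold, choose_copy_path and copy_with_threshold are hypothetical, the 4096-byte default is arbitrary, and the direction of the comparison (large copies take the cache-optimized path) presumes a correlation between size and memory type that would first have to be measured.

```c
#include <stddef.h>
#include <string.h>

enum copy_path { COPY_PATH_CONSERVATIVE, COPY_PATH_CACHED_OPT };

/* Hypothetical process-wide threshold in bytes; in a real design this
   would be set once at startup from a tunable.  */
static size_t cached_opt_threshold = 4096;

/* Decide which path a copy of N bytes takes.  */
static enum copy_path
choose_copy_path (size_t n)
{
  return n >= cached_opt_threshold
	 ? COPY_PATH_CACHED_OPT : COPY_PATH_CONSERVATIVE;
}

/* Copy N bytes from SRC to DST using the path chosen above.  The
   byte loop stands in for a routine that is safe on cache-inhibited
   memory; plain memcpy stands in for the cache-optimized routine.  */
static void *
copy_with_threshold (void *dst, const void *src, size_t n)
{
  if (choose_copy_path (n) == COPY_PATH_CACHED_OPT)
    return memcpy (dst, src, n);

  unsigned char *d = dst;
  const unsigned char *s = src;
  for (size_t i = 0; i < n; i++)
    d[i] = s[i];
  return dst;
}
```

Like the flag, the threshold is still process-wide, so it shares the same limitation: it cannot tell which individual buffers live in cache-inhibited memory.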
What do you think?
Siddhesh