This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



glibc.cpu.cached_memopt (was Re: [PATCH] Rename the glibc.tune namespace to glibc.cpu)


On 07/17/2018 01:32 AM, Tulio Magno Quites Machado Filho wrote:
> I'm not following your line of thought here:
>
>   - glibc.cpu.hwcaps is specific to i386 and x86-64
>   - glibc.cpu.name is specific to aarch64
>   - glibc.cpu.cached_memopt is specific to powerpc, powerpc64 and powerpc64le
>
> What am I missing?

The difference is that glibc.cpu.name and glibc.cpu.hwcaps are conceptually generic tunables, i.e. there is a reasonable chance that a couple of releases down the line another architecture may want to provide a tuning facility for CPUs by name or by HWCAPS. The cached_memopt one is not very clear to me and seems more like something that is only useful on power8. x86-specific tunables, i.e. those where the concept is not currently applicable to other architectures (e.g. x86_non_temporal_threshold), are prefixed with x86_.

> Notice the optimization is not specific to a CPU, but specific to a user
> scenario (cacheable memory).  In other words, the optimization can't be used
> whenever PPC_FEATURE2_ARCH_2_07 is set, because it could degrade performance
> when cache-inhibited memory is being used.
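
To make the constraint above concrete, here is a minimal sketch of the selection logic, not the actual glibc code. It assumes a hypothetical `select_memcpy` helper, two placeholder routines (`memcpy_conservative`, `memcpy_cached`), and a locally defined stand-in bit `SKETCH_FEATURE2_ARCH_2_07` in place of the real PPC_FEATURE2_ARCH_2_07. The only point is that the hardware bit alone is not sufficient: the user must also opt in, and the choice is made once for the whole process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for the PPC_FEATURE2_ARCH_2_07 hwcap2 bit.  This is
   an assumption of the sketch, not the kernel header definition.  */
#define SKETCH_FEATURE2_ARCH_2_07 (1UL << 31)

typedef void *(*memcpy_fn) (void *, const void *, size_t);

/* Hypothetical conservative routine: safe for any memory type.  */
static void *
memcpy_conservative (void *dst, const void *src, size_t len)
{
  return memcpy (dst, src, len);
}

/* Hypothetical routine tuned for cacheable memory.  It behaves identically
   here; only the selection logic matters for the sketch.  */
static void *
memcpy_cached (void *dst, const void *src, size_t len)
{
  return memcpy (dst, src, len);
}

/* Both the hardware bit and the user's opt-in are required before the
   cacheable-memory routine is picked.  */
static memcpy_fn
select_memcpy (unsigned long hwcap2, bool cached_memopt)
{
  if ((hwcap2 & SKETCH_FEATURE2_ARCH_2_07) != 0 && cached_memopt)
    return memcpy_cached;
  return memcpy_conservative;
}

/* The selection is stored once and then used for every copy.  */
static memcpy_fn selected_memcpy;

static void
init_memcpy (unsigned long hwcap2, bool cached_memopt)
{
  selected_memcpy = select_memcpy (hwcap2, cached_memopt);
}

static void *
process_memcpy (void *dst, const void *src, size_t len)
{
  assert (selected_memcpy != NULL);   /* init_memcpy must have run.  */
  return selected_memcpy (dst, src, len);
}
```

Because `selected_memcpy` is fixed at startup, every buffer in the process goes through the same routine regardless of the memory it lives in.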

Ahh OK, I got thrown off by the fact that there's a separate routine for it and assumed that it is Power8-specific. I have a different concern then: a tunable is process-wide, so the cached_memopt tunable essentially assumes that the entire process uses one kind of memory, cacheable or cache-inhibited. Is that a reasonable assumption? In my experience a typical process would have only a set of structures in cache-inhibited memory and most of it would be regular memory. In that sense it looks more like a tradeoff hack and it would be nice to consider alternatives. Here are a couple I can think of off the top of my head:

1. A new relocation that overlays on top of ifuncs and allows selection of routines based on specific properties. I have had this idea for a while but no time to implement it, and it has a much more general scope than memory type; for example, memory alignment could also be a factor to short-cut parts of string routines at compile time. It does not have the runtime flexibility of a tunable but is probably far more configurable.

2. If there is a correlation to size then implement something similar to the x86 non_temporal_threshold tunable. This is probably just as good or bad as setting a cached_memopt flag but has the effect of generalizing what was an architecture-specific tunable.
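
A minimal sketch of what alternative 2 could look like, under stated assumptions: the threshold variable `cached_copy_threshold`, its default value, and the helper names are all hypothetical, and the direction of the comparison (larger copies take the cacheable-memory path) is only a guess, since whether any correlation to size exists is the open question. The two paths copy identically here; the sketch only shows a per-call decision replacing a process-wide flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical tunable value; a real implementation would read it from the
   tunables framework at startup.  The default is arbitrary.  */
static size_t cached_copy_threshold = 4096;

enum copy_path
{
  COPY_PATH_CONSERVATIVE,   /* Safe for any memory type.  */
  COPY_PATH_CACHED          /* Tuned for cacheable memory.  */
};

/* Decide per call, based only on the length of the copy.  */
static enum copy_path
choose_copy_path (size_t len)
{
  return len >= cached_copy_threshold ? COPY_PATH_CACHED
                                      : COPY_PATH_CONSERVATIVE;
}

static void *
copy_with_threshold (void *dst, const void *src, size_t len)
{
  assert (dst != NULL && src != NULL);

  switch (choose_copy_path (len))
    {
    case COPY_PATH_CACHED:
      /* Placeholder for the routine optimized for cacheable memory.  */
      return memcpy (dst, src, len);
    case COPY_PATH_CONSERVATIVE:
    default:
      /* Placeholder for the routine that is safe everywhere.  */
      return memcpy (dst, src, len);
    }
}
```

Unlike a boolean flag, the threshold lets a user move the cut-over point, or effectively disable one path by choosing an extreme value.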

What do you think?

Siddhesh

