Tuning Library Runtime Behavior
Tunables were implemented in glibc 2.25 (2017-02-01).
1. Why?
No set of library defaults is appropriate for all workloads.
The GNU C Library makes assumptions on behalf of the user and provides a specific runtime behaviour that may not match the user workload or requirements.
For example the NPTL implementation sets a fixed cache size of 40MB for the re-use of thread stacks. Is it possible that this is correct under all workloads? Average workloads? This default was set 10 years ago and has not been revisited.
I propose we expose some of the library internals as tunable runtime parameters that our users and developers can use to tune the library. Developers would use them to achieve optimal mean performance for all users, while a single advanced user might use them to get the best performance from their application.
To reiterate:
- Advanced users can do their own performance measurements and work with the community to discuss what works and doesn't work on certain workloads or hardware configurations.
- Developers can use the knobs to test ideas, or experiment with dynamic tuning and ensure that average case performance of the default parameters works for a broad audience.
- Normal users accept the defaults and those defaults work well.
We have immediate short-term needs today to expose library internals as tunable parameters, in particular:
- When and if to use PI-aware locks for the library internals.
- Default thread stack sizes.
- Spinning in mutexes and speed of back-off.
- Lock elision parameters for performance testing.
Implemented by 07ed18d26a342741cb25a4739158c65ed9dd4d09 and provides glibc.elision.*
- Size of thread stack cache, both maximums, minimums, and defaults.
- XDR max request size. Limited to 1024 bytes for legacy servers, but Linux imposes no such limit. You could have a huge group map and it should work. Unfortunately large XDR requests can consume large amounts of memory on the server, so it's up to the admin to select a reasonable value. The library can enforce a maximum, but eventually that will not be enough for certain uses.
- Memory allocator, malloc() et al., behaviour.
- Dynamic loader behaviour.
- NSCD group order behaviour e.g. default gid listed first like other UNIX? Fastest order? Sorted order? (https://bugzilla.redhat.com/show_bug.cgi?id=959980)
- User selectable amount of static TLS to reserve (e.g. rtld/dl-tls.c TLS_STATIC_SURPLUS) for dlopen'd modules that could then use this static TLS for optimal access (http://sourceware.org/ml/libc-alpha/2013-05/msg01088.html)
Implemented by commit 0c7b002fac12dcb2f53ba83ee56bb3b5d2439447 and provides glibc.rtld.optional_static_tls
- User selectable buffering schemes for stdio (http://sourceware.org/bugzilla/show_bug.cgi?id=4099).
- Initial size of group list for initgroups.
- Disable RFC 3484 IPv4 address sorting for legacy applications.
- Size of buffer reads in stream implementation. When using NFS with very large block sizes, say 1MB, the glibc stream implementation will buffer using those block sizes and this leads to huge latencies in buffer fills. It would be better to be able to tune this manually per stream. Perhaps the best option is to have a "max buffer size" tunable that is queried when creating the stream and used as the upper limit regardless of the filesystem block size.
- Value of sysconf (_SC_GETPW_R_SIZE_MAX), to work around buggy applications which treat the value as a hard limit.
- Custom paths for /etc/resolv.conf, /etc/nsswitch.conf, for testing purposes.
- Netlink retry behavior, such as initial timeout and speed of backoff.
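The "max buffer size" idea for streams in the list above can be sketched as a simple clamp. This is only an illustration under stated assumptions: `stream_buffer_size` and its `max_buffer_size` parameter are hypothetical names, not glibc interfaces, and treating 0 as "no limit" is an assumption made for the sketch.

```c
#include <stddef.h>

/* Hypothetical helper, not part of glibc: choose the buffer size for a
   newly created stream.  FS_BLOCK_SIZE is what the filesystem reports
   (for example st_blksize); MAX_BUFFER_SIZE is the value of the proposed
   tunable, where 0 is assumed to mean "no limit".  */
size_t
stream_buffer_size (size_t fs_block_size, size_t max_buffer_size)
{
  if (max_buffer_size != 0 && fs_block_size > max_buffer_size)
    return max_buffer_size;
  return fs_block_size;
}
```

With a limit of 64KB, a filesystem reporting 1MB blocks would get a 64KB stream buffer, while one reporting 4KB blocks would be unaffected.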
2. How?
- Tunables are a tradeoff.
- If it is clear which choice is best, adding a tunable is a mistake.
- Tunables never make the implementation non-conforming
- Variables or other tunables should merely transform the library from one conforming implementation to a different conforming implementation. No settings should make it non-conforming.
- Tunables whose non-default values could break an application expecting the default values should be ignored for AT_SECURE.
- Any settings which could cause a conforming application which works correctly with the default settings to stop working correctly should be ignored completely when the program is suid or AT_SECURE is set in the aux vector.
- Tunable namespace should be clearly defined
- The namespace for glibc tuning variables should be clearly defined in such a way that they can be mechanically removed from the environment without having to worry that future additions will be missed by the stripping code.
- Tunables never change semantics.
Changing a tunable must never cause the semantics of any library interface to violate the standard the library implements. The tunable adjusts internal implementation details all within the guiding envelope of the standard that defines the function. The tunable might lessen the promise of a function but only if that lessening is still within the bounds of the standard.
- Tunables are thread safe.
- Setting the tunables shall be thread safe. All access must use at least the relaxed memory model (both in-process, and by external tools to change tunables).
- Declare the tunables stable only in a given release e.g. 2.17.
- The tunables expose internal implementation details of the library and should not be considered a stable ABI. The library must be able to evolve internal implementation from release to release.
- Allow the use of environment variables to set tunables.
- Easy for programmer experimentation. Shall be thread safe. Read only once at process startup. Changing any of the env vars that control runtime tuning will have no effect on the currently executing process. An application with AT_SECURE set will ignore all environment variable tunables and will not pass them automatically to its children (that doesn't preclude the AT_SECURE application setting an env var for the child or using the API to tune performance for itself).
- Encode glibc version numbers in the tunable name in some way.
- Tunables are specific to certain glibc versions. Using the version number to partition the namespace therefore seems prudent. This prepares for a potential future where glibc is supported as a software collection. It is also helpful with containers, where you might inspect processes which use a different glibc version.
- Allow the use of a system configuration file to set tunables and enforce administrative policy
Easy for administrators to set global policy about tunables in a system configuration file that overrides any settings used by a user. The file could be located in /etc/sysconfig/glibc/tunables.conf. (The path needs tweaking because /etc/sysconfig is specific to Fedora and downstreams, and it should include a version number, as explained above.)
- Self-describing format
Tunables should be self-describing, probably using DWARF which is not stripped from the glibc DSOs. This means that it is possible to access them even if the tunables and their types (uint64_t vs a string pointer, for instance) are not known to the tool which does the access.
- Changing string tunables at run time
- This is difficult because even if the pointer to the string is updated atomically, it is generally impossible to know when it is safe to deallocate the former backing string. Hardware transactional memory may allow in-place modification of strings if the existing memory region is large enough. The only option may be to accept a memory leak if a string tunable is changed. Therefore string tunables (or variable-length tunables in general) are at best avoided.
- Debugging
- Provide a way to dump all of the tunables for debugging. Provide a way to easily inspect all the tunable values from a debugger, or reset all tunables directly from the debugger e.g. inferior function call. Tunables must be self-describing, so that it is possible to dump them even if the process uses a different glibc version (perhaps because it is running in a container).
Implemented with ld.so --list-tunables via commit 86f65dffc2396d408beb628f1cad2b8f63e197bd
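The environment-variable rules above (read once at startup, ignored for AT_SECURE) can be sketched as a small lookup routine. This is a minimal sketch, not glibc's parser: `tunable_lookup` is a hypothetical name, and the colon-separated `name=value` syntax is assumed to be the one carried by the GLIBC_TUNABLES variable. The caller passes in whether the process is AT_SECURE; on Linux that could be obtained with `getauxval (AT_SECURE)` from `<sys/auxv.h>`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch only, not glibc code.  Look up NAME in a
   colon-separated "name=value:name=value" string.  When SECURE is true
   (the process has AT_SECURE set) the string is ignored entirely.  On
   success the value is copied to OUT, truncated to fit OUTLEN bytes and
   always NUL-terminated, and true is returned.  */
bool
tunable_lookup (const char *env, bool secure, const char *name,
                char *out, size_t outlen)
{
  if (env == NULL || secure || outlen == 0)
    return false;

  size_t namelen = strlen (name);
  const char *p = env;
  while (*p != '\0')
    {
      /* Each entry runs up to the next ':' or the end of the string.  */
      size_t entrylen = strcspn (p, ":");
      if (entrylen > namelen && p[namelen] == '='
          && strncmp (p, name, namelen) == 0)
        {
          size_t vallen = entrylen - namelen - 1;
          if (vallen >= outlen)
            vallen = outlen - 1;
          memcpy (out, p + namelen + 1, vallen);
          out[vallen] = '\0';
          return true;
        }
      p += entrylen;
      if (*p == ':')
        p++;
    }
  return false;
}
```

Because names are matched in full up to the `=`, a fixed prefix such as `glibc.` keeps the namespace easy to recognize and strip mechanically.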
3. Design examples
3.1. Example: Some properties read at startup others continually via a global pointer
The only feasible design today is to create a global pointer that points to a structure that contains all tunables for the entire library. At startup certain values of this structure are used for IFUNC selection and to initialize library-wide values that need early initialization. Later some values which can be dynamically changed may also be read via this global pointer e.g. default thread stack size. We document each property and whether it is applied at startup or read at every use. Startup properties could only be set via env vars or an admin sysconfig file read at startup.
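A minimal sketch of this design, assuming C11 atomics, might look as follows. All names (`struct tunables`, `global_tunables`, the accessors) and the 2MB default are hypothetical and chosen only for illustration. The dynamically changeable field is accessed with relaxed atomics, in line with the thread-safety requirement stated earlier.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Illustrative sketch only; every name here is hypothetical.  */
struct tunables
{
  /* Applied once at startup, e.g. for IFUNC selection; never changed
     afterwards.  */
  int use_lock_elision;
  /* Read at every use; may be changed while the process runs.  */
  _Atomic size_t default_stack_size;
};

/* Assumed defaults for the sketch: elision off, 2MB thread stacks.  */
static struct tunables default_tunables = { 0, 2 * 1024 * 1024 };

/* The single global pointer through which the library reads tunables.  */
struct tunables *const global_tunables = &default_tunables;

/* Read a dynamic tunable; relaxed ordering is enough because the value
   does not publish any other data.  */
size_t
tunable_get_stack_size (void)
{
  return atomic_load_explicit (&global_tunables->default_stack_size,
                               memory_order_relaxed);
}

/* Change a dynamic tunable at run time.  */
void
tunable_set_stack_size (size_t size)
{
  atomic_store_explicit (&global_tunables->default_stack_size, size,
                         memory_order_relaxed);
}
```

Startup-only fields such as `use_lock_elision` need no atomic access in this sketch because they are written before any other thread exists.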
3.2. Rejected Design Ideas
The following list captures some design ideas which we discussed, but rejected.
In-process API for process self-tuning. This is too dangerous to offer directly in glibc because if the tunables API is more convenient than the official API (e.g. for stdio buffer sizes), then no matter what we say about tunables stability, there will be applications which prevent glibc updates due to tunables dependencies. We can encourage development of a separate library for self-tuning, though, which can collect backwards compatibility kludges as required. This means that limiting the scope of tunables (to specific functions, threads, or some other context) may be difficult to implement.
Shared memory segment for tunables. It is difficult to get the permissions right, and it is useful to have that capability even for AT_SECURE processes. The lack of a shared memory segment should not be a significant restriction; due to the checkpoint/restore work, the kernel should have sufficient capabilities for process inspection.
4. Next steps
4.1. Collect all globals
As recommended at Cauldron 2013, we first need to bring together a global internal private structure that contains all of the globals one might want to modify. That way we can see what is actually tunable.
4.2. Analyze env vars currently in use
An analysis of the current use of glibc env vars. The list is not yet complete, and it currently includes env vars from auxiliary libraries.
- ARGP_HELP_FMT
- LANG
- LD_BIND_NOW
- LD_LIBRARY_PATH
- LD_PRELOAD
- LD_TRACE_LOADED_OBJECTS
- LD_AOUT_LIBRARY_PATH
- LD_AOUT_PRELOAD
- LD_AUDIT
- LD_BIND_NOT
- LD_DEBUG
- LD_DEBUG_OUTPUT
- LD_DYNAMIC_WEAK
- LD_HWCAP_MASK
- LD_KEEPDIR
- LD_NOWARN
- LD_ORIGIN_PATH
- LD_POINTER_GUARD
- LD_PROFILE
- LD_PROFILE_OUTPUT
- LD_SHOW_AUXV
- LD_USE_LOAD_BIAS
- LD_VERBOSE
- LD_WARN
- LDD_ARGV0
- MALLOC_CHECK_
- NLSPATH
- HZ
- SEGFAULT_SIGNALS
- SEGFAULT_USE_ALTSTACK
- SEGFAULT_OUTPUT_NAME
- PCPROFILE_OUTPUT
- SOTRUSS_FROMLIST
- SOTRUSS_TOLIST
- SOTRUSS_EXIT
- SOTRUSS_WHICH
- SOTRUSS_OUTNAME
- GMON_OUT_PREFIX
- HESIOD_CONFIG s
- HES_DOMAIN s
- CRASHSERVER
- COREFILE
- GCONV_PATH
- HOME s
- LANGUAGE
- OUTPUT_CHARSET
- CHARSET
- LOCPATH
- LC_ALL
- I18NPATH
- POSIXLY_CORRECT
- MEMUSAGE_PROG_NAME
- MEMUSAGE_OUTPUT
- MEMUSAGE_BUFFER_SIZE
- MEMUSAGE_NO_TIMER
- MEMUSAGE_TRACE_MMAP
- NIS_PATH
- NIS_DEFAULTS
- NIS_GROUP
- LOCALDOMAIN
- IFS
- TMPDIR
- GETCONF_DIR
- ENV_HOSTCONF
- ENV_SPOOF
- ENV_MULTI
- ENV_REORDER
- ENV_TRIM_ADD
- ENV_TRIM_OVERR
- HOSTALIASES
- RES_OPTIONS
- MSGVERB
- SEV_LEVEL
- NLSPROVIER
- LIBC_FATAL_STDERR_
- LD_ASSUME_KERNEL
- TZ
- TZDIR
- DATEMSK