Differences between revisions 8 and 9
Revision 8 as of 2013-05-09 13:49:30
Size: 6860
Revision 9 as of 2013-05-09 13:53:39
Size: 6913

Tuning Library Runtime Behavior


The following material is a work in progress and should not be considered complete or ready for public use.

1. Why?

No set of library defaults is appropriate for all workloads.

The GNU C Library makes assumptions on behalf of the user and provides a specific runtime behaviour that may not match the user workload or requirements.

For example, the NPTL implementation sets a fixed cache size of 40MB for the reuse of thread stacks. Is it possible that this is correct under all workloads? Under average workloads? This default was set 10 years ago and has not been revisited.

I propose we expose some of the library internals as tunable runtime parameters that our users and developers can use to tune the library. Developers would use them to achieve optimal mean performance for all users, while a single advanced user might use them to get the best performance from their application.

To reiterate:

  • Advanced users can do their own performance measurements and work with the community to discuss what works and doesn't work on certain workloads or hardware configurations.
  • Developers can use the knobs to test ideas, or experiment with dynamic tuning and ensure that average case performance of the default parameters works for a broad audience.
  • Normal users accept the defaults and those defaults work well.

We have immediate short-term needs today to expose library internals as tunable parameters, in particular:

  • Whether and when to use PI-aware locks for the library internals.
  • Default thread stack sizes.
  • Lock elision parameters for performance testing.
  • Size of thread stack cache.
  • XDR max request size. Limited to 1024 bytes for legacy servers, but Linux imposes no such limit. You could have a huge group map and it should work. Unfortunately large XDR requests can consume large amounts of memory on the server, so it's up to the admin to select a reasonable value. The library can enforce a maximum, but eventually that will not be enough for certain uses.
  • Memory allocator, malloc() et al., behaviour.
  • Dynamic loader behaviour.
  • NSCD group order behaviour e.g. default gid listed first like other UNIX? Fastest order? Sorted order? (https://bugzilla.redhat.com/show_bug.cgi?id=959980)

2. How?

  • Tunables never change semantics.
    • Changing a tunable must never cause the semantics of any library interface to violate the standard the library implements. The tunable adjusts internal implementation details all within the guiding envelope of the standard that defines the function. The tunable might lessen the promise of a function but only if that lessening is still within the bounds of the standard.

  • Declare the tunables stable only in a given release e.g. 2.17.
    • The tunables expose internal implementation details of the library and should not be considered a stable ABI. The library must be able to evolve internal implementation from release to release.
  • Define tunable settings in terms of a "context."
    • Each change to a tunable matters only in the context of the tunable's use. For example, the global context would set a tunable for every use of that tunable in the process, while a function-level context might set a default for all functions called from the current function, e.g. for lock elision.
  • Allow the use of environment variables to set tunables.
    • Easy for programmer experimentation.
  • Create a stable API for manipulating tunable runtime parameters.
    • Easy for automation.
  • Provide a shared-memory API for tuning.
    • Allows for performance experiments and the developing of auto-tuning algorithms on live running programs.

3. Examples

This is only a toy example of how one might use a global pointer and a lockless algorithm to push and pop tunable contexts for the entire library to use. The entire library would need to reference tunables through some level of indirection via the global pointer (previously it just referenced the global pointer).

For example:

/* A definition of a tunable is a name/value tuple (for now).  */
struct __tunable {
  char *tunable;
  char *value;
};
typedef struct __tunable tunable;

/* The tunables have IDs that we use to index into the tunable table
   for each context.  */
enum {
  /* Illustrative ID only; the full list of IDs is not yet defined.  */
  GNU_LIBC_PTHREAD_DEFAULT_STACKSIZE,
  GNU_LIBC_MAX_TUNABLE
};

/* A context contains a set of tunables.  */
typedef struct __tunable_context tunable_context;
struct __tunable_context {
  char *id;
  tunable tlist[GNU_LIBC_MAX_TUNABLE];
  tunable_context *previous;
};

/* Hidden pointer to active context in the library.  */
tunable_context *__default_tunable_context attribute_hidden;

/* Create a context from the current active context and call it ID.  */
tunable_context *create_tunable_context_np (const char *id);
int destroy_tunable_context_np (tunable_context *context);

/* Set a tunable for a context.  */
int set_tunable_np (tunable_context *context, const char *tunable, const char *value);
const char *get_tunable_np (tunable_context *context, const char *tunable);

/* Push or pop a context. Overrides the previous context.  */
int push_tunable_context_np (tunable_context *context);
tunable_context *pop_tunable_context_np (void);

/* Get the list of all tunables currently available.  */
int list_tunables_np (char **tunables, int *size);


tunable_context *ctx = create_tunable_context_np (NULL);
if (set_tunable_np (ctx, "GNU_LIBC_PTHREAD_DEFAULT_STACKSIZE", "1048576") != 0)
  {
    /* Error handling.  */
  }
if (push_tunable_context_np (ctx) != 0)
  {
    /* Error handling.  */
  }
/* Do work with context active.  */
if (pop_tunable_context_np () == NULL)
  {
    /* Error handling.  */
  }
/* Restores previous context.  */
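
The push and pop operations could be built on an atomic compare-and-swap of the global pointer. The following is a minimal sketch of that idea using C11 atomics, not the proposed implementation. All toy_ names are hypothetical, and the context is reduced to an id and a link to the previous context. Note that a pop written this way is exposed to the ABA problem if contexts are destroyed and their memory reused while another thread is mid-pop, so a real implementation would need additional care.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical, simplified context: an id and a link to the context
   that was active before this one was pushed.  */
struct toy_context
{
  const char *id;
  struct toy_context *previous;
};

/* Stand-in for the hidden pointer to the active context.  Static
   storage means it starts out as NULL (no context active).  */
static _Atomic (struct toy_context *) toy_active_context;

/* Make CTX the active context.  The context that was active becomes
   CTX->previous.  Retries until the swap succeeds.  */
int
toy_push_context (struct toy_context *ctx)
{
  assert (ctx != NULL);
  struct toy_context *old = atomic_load (&toy_active_context);
  do
    ctx->previous = old;
  while (!atomic_compare_exchange_weak (&toy_active_context, &old, ctx));
  return 0;
}

/* Remove and return the active context, making its predecessor
   active again.  Returns NULL if no context is active.  */
struct toy_context *
toy_pop_context (void)
{
  struct toy_context *old = atomic_load (&toy_active_context);
  while (old != NULL
         && !atomic_compare_exchange_weak (&toy_active_context, &old,
                                           old->previous))
    ;
  return old;
}
```

On a failed compare-and-swap, atomic_compare_exchange_weak stores the current value of the global pointer back into old, which is why both loops can simply retry with the refreshed value.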

Per-process as an env var:

export GNU_LIBC_$tunable=$value

Equivalent to calling the following at startup:

tunable_context *ctx = create_tunable_context_np (NULL);
set_tunable_np (ctx, "GNU_LIBC_$tunable", "$value");
push_tunable_context_np (ctx);
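
One way the startup code might recognise such variables is to test each "NAME=value" entry of the environment for the GNU_LIBC_ prefix and split it at the first '='. The helper below is a sketch under that assumption; toy_parse_env_entry is a hypothetical name and not part of the proposed API.

```c
#include <stddef.h>
#include <string.h>

/* Split ENTRY, a "NAME=value" string, if NAME starts with GNU_LIBC_
   and is longer than the bare prefix.  On success copy NAME into the
   buffer NAME (of NAME_SIZE bytes), point *VALUE at the text after
   the '=' and return 0.  Return -1 for anything else.  */
int
toy_parse_env_entry (const char *entry, char *name, size_t name_size,
                     const char **value)
{
  static const char prefix[] = "GNU_LIBC_";
  const size_t prefix_len = sizeof prefix - 1;

  const char *eq = strchr (entry, '=');
  if (eq == NULL || strncmp (entry, prefix, prefix_len) != 0)
    return -1;

  size_t len = (size_t) (eq - entry);
  if (len == prefix_len || len >= name_size)
    return -1;

  memcpy (name, entry, len);
  name[len] = '\0';
  *value = eq + 1;
  return 0;
}
```

A caller would loop over the entries of environ and pass each accepted name and value on to set_tunable_np.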

Per-named-context as an env var:

export GNU_LIBC_$tunable_$id=$value

Equivalent to calling the following at startup:

tunable_context *ctx = create_tunable_context_np ("$id");
set_tunable_np (ctx, "GNU_LIBC_$tunable", "$value");
push_tunable_context_np (ctx);


  • `id' is a user-chosen identifier for the context.
  • `tunable' is the serialized name for the tunable.
  • `value' is the serialized value of the tunable which will be interpreted by the tunable code as required.
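
As an illustration of how the tunable code might interpret a serialized value, the sketch below parses a size such as "1048576" and rejects malformed input. It assumes sizes are plain non-negative decimal numbers; toy_parse_size_tunable is a hypothetical helper, not part of the proposed API.

```c
#include <errno.h>
#include <stdlib.h>

/* Parse VALUE as a non-negative decimal number.  Store it in *RESULT
   and return 0 on success; return -1 if VALUE is NULL, empty, does
   not start with a digit, has trailing characters, or overflows.  */
int
toy_parse_size_tunable (const char *value, unsigned long *result)
{
  /* strtoul accepts leading whitespace and a sign, so insist on a
     leading digit before calling it.  */
  if (value == NULL || *value < '0' || *value > '9')
    return -1;

  char *end;
  errno = 0;
  unsigned long parsed = strtoul (value, &end, 10);
  if (errno != 0 || *end != '\0')
    return -1;

  *result = parsed;
  return 0;
}
```

Rejecting bad input instead of silently falling back keeps a mistyped setting from being confused with the library default.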


  • A shared memory interface would allow you to attach to a program and manipulate the runtime settings in real time.
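
No shared-memory interface is specified yet. As a rough sketch of the idea, assume the tunables live in a small block backed by a file that both the running program and a tuning tool map with MAP_SHARED. The layout, the single stack_cache_size field and the name toy_attach_tunables are all hypothetical.

```c
#include <fcntl.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical layout of the shared block: a single tunable value.
   A freshly created block is zero-filled, which this sketch treats
   as "use the library default".  */
struct toy_shared_tunables
{
  _Atomic unsigned long stack_cache_size;
};

/* Map the block backed by the file PATH, creating and sizing the
   file if needed.  Every process that maps the same file sees the
   same memory.  Returns NULL on failure.  */
struct toy_shared_tunables *
toy_attach_tunables (const char *path)
{
  int fd = open (path, O_RDWR | O_CREAT, 0600);
  if (fd < 0)
    return NULL;

  if (ftruncate (fd, sizeof (struct toy_shared_tunables)) != 0)
    {
      close (fd);
      return NULL;
    }

  void *p = mmap (NULL, sizeof (struct toy_shared_tunables),
                  PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close (fd);  /* The mapping stays valid after the descriptor is closed.  */
  return p == MAP_FAILED ? NULL : p;
}
```

A tuning tool would attach to the same path and call atomic_store on a field, and the running program would pick up the new value with atomic_load the next time it consults that tunable.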

None: TuningLibraryRuntimeBehavior (last edited 2017-09-11 12:25:54 by CarlosODonell)