Tuning Library Runtime Behavior

WORK IN PROGRESS

The following material is a work in progress and should not be considered complete or ready for public use.

1. Why?

The GNU C Library makes assumptions on behalf of the user and provides a specific runtime behaviour that may not match the user workload or requirements.

For example, the NPTL implementation sets a fixed cache size of 40 MB for the re-use of thread stacks. Is it possible that this is correct under all workloads? Under average workloads? This default was set 10 years ago and has not been revisited.

I propose we expose some of the library internals as tunable runtime parameters that our users and developers can use to tune the library. Developers would use them to achieve optimal mean performance for all users, while a single advanced user might use them to get the best performance from their application.

To reiterate:

  • Advanced users can do their own performance measurements and work with the community to discuss what works and doesn't work on certain workloads or hardware configurations.
  • Developers can use the knobs to test ideas, or experiment with dynamic tuning and ensure that average case performance of the default parameters works for a broad audience.

We have immediate short-term needs today to expose library internals as tunable parameters, in particular:

  • Default thread stack sizes.
  • Lock elision parameters for performance testing.
  • Size of thread stack cache.
  • XDR max request size. Limited to 1024 bytes for legacy servers, but Linux imposes no such limit. You could have a huge group map and it should work. Unfortunately large XDR requests can consume large amounts of memory on the server, so it's up to the admin to select a reasonable value. The library can enforce a maximum, but eventually that will not be enough for certain uses.
  • Memory allocator (malloc() et al.) behaviour.
  • Dynamic loader behaviour.
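
To make the first item concrete, here is a minimal sketch of what an application has to do today to get a non-default thread stack size: set it in code, per thread, through the standard POSIX attribute calls. The helper names `worker` and `run_with_stack_size` are hypothetical and exist only for this illustration. A tunable would let the same value be chosen without touching or rebuilding the application.

```c
#include <pthread.h>
#include <stddef.h>

/* Thread body: returns its argument, only to prove the thread ran.  */
static void *
worker (void *arg)
{
  return arg;
}

/* Start and join one thread with an explicit stack size.  Returns 0 on
   success or the error number from the first pthread call that fails.  */
int
run_with_stack_size (size_t stack_size)
{
  pthread_attr_t attr;
  pthread_t thread;
  int err = pthread_attr_init (&attr);
  if (err != 0)
    return err;

  /* Fails with EINVAL if STACK_SIZE is below PTHREAD_STACK_MIN.  */
  err = pthread_attr_setstacksize (&attr, stack_size);
  if (err == 0)
    err = pthread_create (&thread, &attr, worker, NULL);
  if (err == 0)
    err = pthread_join (thread, NULL);

  pthread_attr_destroy (&attr);
  return err;
}
```

Calling `run_with_stack_size (1024 * 1024)` requests a 1 MiB stack for that one thread; every other thread in the process still gets the library default.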

2. How?

I propose we do the following:

  • Create a stable API for getting, setting, and enumerating tunables.
  • Declare that the tunables themselves are stable only within a given release, e.g. 2.17.
  • Provide a per-context interface with the ability to use environment variables to set tunables.

For example:

/* A definition of a tunable is a name/value tuple (for now).  */
struct __tunable {
  char *tunable;
  char *value;
};
typedef struct __tunable tunable;

/* The tunables have IDs that we use to index into the tunable table
   for each context.  */
enum {
  GNU_LIBC_PTHREAD_DEFAULT_STACKSIZE = 0,
  GNU_LIBC_PTHREAD_STACK_CACHESIZE = 1,
  ...
  GNU_LIBC_MAX_TUNABLE = 100
};

/* A context contains a set of tunables.  The typedef comes first so
   that the struct can refer to itself through it.  */
typedef struct __tunable_context tunable_context;
struct __tunable_context {
  char *id;
  tunable tlist[GNU_LIBC_MAX_TUNABLE];
  tunable_context *previous;
};

/* Hidden pointer to active context in the library.  */
tunable_context *__default_tunable_context attribute_hidden;

/* Create a context and call it ID.  */
tunable_context *create_tunable_context_np (const char *id);
int destroy_tunable_context_np (tunable_context *context);

int set_tunable_np (tunable_context *context, const char *tunable, const char *value);
const char *get_tunable_np (tunable_context *context, const char *tunable);

int push_tunable_context_np (tunable_context *context);
tunable_context *pop_tunable_context_np (void);
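
To show how these prototypes might fit together, the following is a toy in-memory sketch, not the library's implementation. It makes several assumptions that the proposal does not state: tunables are looked up by name with a linear search rather than indexed by ID, strings are borrowed rather than copied, the table size `MAX_TUNABLE` is a small hypothetical constant, the parameter is called `name` instead of `tunable`, and passing a NULL context to `get_tunable_np` means "consult the active context".

```c
#include <stdlib.h>
#include <string.h>

#define MAX_TUNABLE 8   /* Hypothetical small table size for this sketch.  */

/* Sketch-only simplification: strings are borrowed, not copied, so the
   caller must keep them alive (string literals are fine).  */
typedef struct { const char *tunable; const char *value; } tunable;

typedef struct tunable_context tunable_context;
struct tunable_context {
  const char *id;
  tunable tlist[MAX_TUNABLE];
  tunable_context *previous;
};

/* Top of the context stack; NULL when no context is active.  */
static tunable_context *active_context;

tunable_context *create_tunable_context_np (const char *id)
{
  tunable_context *ctx = calloc (1, sizeof *ctx);
  if (ctx != NULL)
    ctx->id = id;
  return ctx;
}

int destroy_tunable_context_np (tunable_context *context)
{
  /* Only the top of the stack is checked; deeper entries are not.  */
  if (context == NULL || context == active_context)
    return -1;
  free (context);
  return 0;
}

int set_tunable_np (tunable_context *context, const char *name,
                    const char *value)
{
  if (context == NULL || name == NULL || value == NULL)
    return -1;
  for (size_t i = 0; i < MAX_TUNABLE; i++)
    {
      tunable *t = &context->tlist[i];
      /* Slots fill from the front, so an empty slot means NAME is new.  */
      if (t->tunable == NULL)
        t->tunable = name;
      if (strcmp (t->tunable, name) == 0)
        {
          t->value = value;
          return 0;
        }
    }
  return -1;   /* Table full.  */
}

const char *get_tunable_np (tunable_context *context, const char *name)
{
  /* Assumption: a NULL context means "use the active context".  */
  if (context == NULL)
    context = active_context;
  if (context == NULL || name == NULL)
    return NULL;
  for (size_t i = 0; i < MAX_TUNABLE && context->tlist[i].tunable != NULL; i++)
    if (strcmp (context->tlist[i].tunable, name) == 0)
      return context->tlist[i].value;
  return NULL;
}

int push_tunable_context_np (tunable_context *context)
{
  if (context == NULL || context == active_context)
    return -1;
  context->previous = active_context;
  active_context = context;
  return 0;
}

tunable_context *pop_tunable_context_np (void)
{
  tunable_context *popped = active_context;
  if (popped != NULL)
    {
      active_context = popped->previous;
      popped->previous = NULL;
    }
  return popped;
}
```

The `previous` pointer makes the contexts a singly linked stack, which is why push and pop need no separate container. A real implementation would also have to decide whether the active context is per-process or per-thread, which this sketch ignores by using one global pointer.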

e.g.

tunable_context *ctx = create_tunable_context_np (NULL);
if (set_tunable_np (ctx, "GNU_LIBC_PTHREAD_DEFAULT_STACKSIZE", "1048576") != 0)
  {
    /* Error handling.  */
  }
if (push_tunable_context_np (ctx) != 0)
  {
    /* Error handling.  */
  }
/* Do work with context active.  */
if (pop_tunable_context_np () == NULL)
  {
    /* Error handling.  */
  }
/* Restores previous context.  */

Per-process as an env var:

export GNU_LIBC_$tunable=$value

Equivalent to calling the following at startup:

tunable_context *ctx = create_tunable_context_np (NULL);
set_tunable_np (ctx, "GNU_LIBC_$tunable", "$value");
push_tunable_context_np (ctx);

Per-named-context as an env var:

export GNU_LIBC_${tunable}_${id}=$value

Equivalent to calling the following at startup:

tunable_context *ctx = create_tunable_context_np ("$id");
set_tunable_np (ctx, "GNU_LIBC_$tunable", "$value");
push_tunable_context_np (ctx);

Where:

  • `id' is a user chosen identifier for the context.
  • `tunable' is the serialized name for the tunable.
  • `value' is the serialized value of the tunable which will be interpreted by the tunable code as required.
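
One wrinkle in the naming scheme is that tunable names themselves contain underscores, so a variable name cannot be split into `tunable' and `id' at the last underscore. A possible approach, sketched below under assumptions, is to match against a table of known tunable names and treat whatever follows as the context id. The function `parse_tunable_env_name` and the `known_tunables` table are hypothetical; the two names in the table come from the enum shown earlier with the `GNU_LIBC_` prefix removed.

```c
#include <stddef.h>
#include <string.h>

/* Tunable names this sketch recognises (hypothetical table).  */
static const char *const known_tunables[] = {
  "PTHREAD_DEFAULT_STACKSIZE",
  "PTHREAD_STACK_CACHESIZE",
};

/* Split NAME, the part of an environment entry before '=', into a tunable
   and an optional context id.  Returns 1 and fills *TUNABLE and *ID on a
   match (*ID is NULL for the per-process form), 0 otherwise.  */
int
parse_tunable_env_name (const char *name, const char **tunable,
                        const char **id)
{
  static const char prefix[] = "GNU_LIBC_";
  size_t prefix_len = sizeof prefix - 1;
  size_t count = sizeof known_tunables / sizeof known_tunables[0];

  if (strncmp (name, prefix, prefix_len) != 0)
    return 0;
  const char *rest = name + prefix_len;

  for (size_t i = 0; i < count; i++)
    {
      size_t len = strlen (known_tunables[i]);
      if (strncmp (rest, known_tunables[i], len) != 0)
        continue;
      if (rest[len] == '\0')
        {
          /* GNU_LIBC_$tunable: applies to the default context.  */
          *tunable = known_tunables[i];
          *id = NULL;
          return 1;
        }
      if (rest[len] == '_' && rest[len + 1] != '\0')
        {
          /* GNU_LIBC_$tunable_$id: applies to the named context.  */
          *tunable = known_tunables[i];
          *id = rest + len + 1;
          return 1;
        }
    }
  return 0;
}
```

With this table, `GNU_LIBC_PTHREAD_STACK_CACHESIZE_workers` splits into the tunable `PTHREAD_STACK_CACHESIZE` and the id `workers`. The scheme becomes ambiguous if one tunable name is another name followed by an underscore and more text, so a real implementation would need either a longest-match rule or a different separator.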

Notes:

  • A shared memory interface would allow you to attach to a running program and manipulate its runtime settings in real time.

None: TuningLibraryRuntimeBehavior (last edited 2014-01-22 19:11:15 by CarlosODonell)