This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] Async signal safe TLS accesses
- From: Andrew Hunter <ahh at google dot com>
- To: libc-alpha at sourceware dot org, Ian Lance Taylor <iant at google dot com>, Paul Pluzhnikov <ppluzhnikov at google dot com>, carlos at redhat dot com
- Cc: Andrew Hunter <ahh at google dot com>
- Date: Mon, 23 Sep 2013 16:14:26 -0700
- Subject: Re: [PATCH] Async signal safe TLS accesses
- Authentication-results: sourceware.org; auth=none
- References: <523F2ED8 dot 8090909 at redhat dot com> <1379977289-21260-1-git-send-email-ahh at google dot com>
Some notes on this patch and the preceding discussion:
- I tested this patch on 2.15; I have rebased it to HEAD here but not
heavily tested it (since it seems we can't make any progress here
without seeing the code in question anyway). It certainly builds. If
I screwed up something in the rebase, please don't throw rotten fruit
at me.
- I have made my best guesses as to a few questions of upstream
preferences (see in particular the comment in _dl_mask_all_signals).
I am happy to do pretty much anything roughly equivalent that you
would prefer.
As to the questions of the long-term allocator:
This patch defines signal_safe_malloc and friends as weak symbols in
ld.so. My expectation is that libc itself will override them with
smarter definitions. (If so, we can probably replace the current
ld.so implementation with the "normal" symbols in dl-minimal, as I
believe during ld.so bootstrap there's no opportunity to take signals.
I wasn't 100% sure of that so I went with the current approach for
now.)
I do not care what you end up going with for the design of the smarter
allocator in libc itself: the nice thing about this setup is that
anyone, say me, can override with smart signal_safe_malloc
definitions, and for our internal use this is precisely what I plan to
do. I am happy to go with whatever anyone else feels like as a libc
signal_safe_malloc. But if you were to ask me, this is what I would
do:
Define signal_safe_* in malloc/. Have them use the dlmalloc machinery
of arenas (though not call malloc() directly). In the current
implementation, I believe this can be made safe pretty easily (the
primary problem with the current libc malloc is that it unconditionally
locks the TSD-cached arena; that would have to be a trylock). The
disadvantage is, as Carlos points out, that it puts you on the hook to
make sure dlmalloc _continues_ to be amenable to signal safety.
However, it doesn't stop anyone from changing __libc_malloc without
breaking signal_safe_malloc; in addition, there's no concern about
people who override libc malloc, as they will only replace the malloc
symbol, not signal_safe_malloc; since signal_safe_malloc will not call
into tcmalloc or jemalloc or whatever they specify, this will work
fine.
Thoughts? Comments on the patch?
On Mon, Sep 23, 2013 at 4:01 PM, Andrew Hunter <ahh@google.com> wrote:
> TLS accesses from initial-exec variables are async-signal-safe. Even
> dynamic-type accesses from shared objects loaded by ld.so at startup
> are. But dynamic accesses from dlopen()ed objects are not, which
> means a lot of trouble for any sort of per-thread state we want to
> use from signal handlers since we can't rely on always having
> initial-exec. Make all TLS access always signal safe.
>
> Doing this has a few components to it:
>
> * We introduce a set of symbols signal_safe_{malloc,free,memalign,&c}.
> They do what it says on the box, but guarantee async-signal-safety.
> We provide a minimal mmap-based implementation in ld.so; anyone can
> override them more efficiently. (This may prove useful elsewhere.)
>
> * We use these throughout dl-tls.c in paths reachable from tls_get_addr
> (and, importantly, for allocations on other paths that might be
> freed/realloced from tls_get_addr.)
>
> * tls_get_addr synchronizes with dlopen() by dl_load_lock; this lock
> is reentrant, but not, alas, signal safe. Replace this with simple
> CAS based synchronization.
>
> * Use signal masking in the slow path of tls_get_addr to prevent
> reentrant TLS initialization. The most complicated part here is
> ensuring a dlopen which forces static TLS (and thus updates all
> threads' DTVs) does not interfere with this.
> ---
> elf/Versions | 5 ++
> elf/dl-misc.c | 184 +++++++++++++++++++++++++++++++++++++++++++++
> elf/dl-open.c | 5 +-
> elf/dl-reloc.c | 42 +++++++++--
> elf/dl-tls.c | 138 +++++++++++++++++++++-------------
> nptl/allocatestack.c | 16 ++--
> sysdeps/generic/ldsodefs.h | 17 +++++
> 7 files changed, 340 insertions(+), 67 deletions(-)
>
> diff --git a/elf/Versions b/elf/Versions
> index 2383992..f14e5d0 100644
> --- a/elf/Versions
> +++ b/elf/Versions
> @@ -53,6 +53,7 @@ ld {
> _dl_allocate_tls; _dl_allocate_tls_init;
> _dl_argv; _dl_find_dso_for_object; _dl_get_tls_static_info;
> _dl_deallocate_tls; _dl_make_stack_executable; _dl_out_of_memory;
> + _dl_clear_dtv;
> _dl_rtld_di_serinfo; _dl_starting_up; _dl_tls_setup;
> _rtld_global; _rtld_global_ro;
>
> @@ -61,5 +62,9 @@ ld {
>
> # Pointer protection.
> __pointer_chk_guard;
> +
> + # for signal safe TLS
> + signal_safe_malloc; signal_safe_free; signal_safe_memalign;
> + signal_safe_realloc; signal_safe_calloc;
> }
> }
> diff --git a/elf/dl-misc.c b/elf/dl-misc.c
> index 5fc13a4..312700c 100644
> --- a/elf/dl-misc.c
> +++ b/elf/dl-misc.c
> @@ -19,8 +19,10 @@
> #include <assert.h>
> #include <fcntl.h>
> #include <ldsodefs.h>
> +#include <libc-symbols.h>
> #include <limits.h>
> #include <link.h>
> +#include <signal.h>
> #include <stdarg.h>
> #include <stdlib.h>
> #include <string.h>
> @@ -364,3 +366,185 @@ _dl_higher_prime_number (unsigned long int n)
>
> return *low;
> }
> +
> +
> +/* mask every signal, returning the previous sigmask in <old>. */
> +void
> +internal_function
> +_dl_mask_all_signals (sigset_t *old)
> +{
> + sigset_t new;
> + sigfillset(&new);
> + int ret;
> +
> + /* So...hmmm. This function just serves as a replacement to pthread_sigmask,
> + which isn't available due to linking problems. Two concerns.
> + - pthread_sigmask removes a few internal signals from the
> + mask -- do we care?
> +
> + I don't think so. We only turn off signals for bounded time inside
> + dynamic linker functions. Pthread's concerned with a user permanently
> + turning off some signals.
> +
> + - which syscall do we use? rt_sigprocmask is slightly more correct,
> + but they're very nearly equivalent and rt_ may not be globally available.
> +
> + I think just rt_sigprocmask. I don't think we support any kernel where
> + that's not present.
> +
> + It's unfortunate there's no simpler solution than duplicating sigmask.
> + */
> +
> + /* it's very important we don't touch errno here--that's TLS and who
> + knows what might happen then! */
> + INTERNAL_SYSCALL_DECL (err);
> +
> + ret = INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_SETMASK, &new, old,
> + _NSIG / 8);
> +
> + assert (ret == 0);
> +}
> +
> +/* Return sigmask to what it was before a call to _dl_mask_all_signals. */
> +void
> +internal_function
> +_dl_unmask_signals (sigset_t *old)
> +{
> + int ret;
> + INTERNAL_SYSCALL_DECL (err);
> +
> + ret = INTERNAL_SYSCALL (rt_sigprocmask, err, 4, SIG_SETMASK, old, NULL,
> + _NSIG / 8);
> +
> + assert (ret == 0);
> +}
> +
> +/* To support accessing TLS variables from signal handlers, we need an
> + async signal safe memory allocator. These routines are never
> + themselves invoked reentrantly (all calls to them are surrounded by
> + signal masks) but may be invoked concurrently from many threads.
> + The current implementation is not particularly performant nor space
> + efficient, but it will be used rarely (and only in binaries that use
> + dlopen.) The API matches that of malloc() and friends. */
> +
> +struct signal_safe_allocator_header {
> + size_t size;
> + void *start;
> +};
> +
> +void * weak_function
> +signal_safe_memalign (size_t boundary, size_t size)
> +{
> + struct signal_safe_allocator_header *header;
> + if (boundary < sizeof (*header))
> + boundary = sizeof (*header);
> +
> + /* power of two */
> + if (boundary & (boundary - 1))
> + return NULL;
> +
> + size_t pg = GLRO(dl_pagesize);
> + size_t padded_size;
> + if (boundary <= pg)
> + {
> + /* easy: we'll get a pointer certainly aligned to boundary, so
> + just add one more boundary-sized chunk to hold the header. */
> + padded_size = roundup (size, boundary) + boundary;
> + }
> + else
> + {
> + /* if we want K pages aligned to a J-page boundary, K+J+1 pages
> + contains at least one such region that isn't directly at the start
> + (so we can place the header.) This is wasteful, but you're the one
> + who wanted 64K-aligned TLS. */
> + padded_size = roundup (size, pg) + boundary + pg;
> + }
> +
> +
> + size_t actual_size = roundup (padded_size, pg);
> + void *actual = mmap (NULL, actual_size, PROT_READ | PROT_WRITE,
> + MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
> + if (actual == MAP_FAILED)
> + return NULL;
> +
> + if (boundary <= pg)
> + {
> + header = actual + boundary - sizeof (*header);
> + }
> + else
> + {
> + intptr_t actual_pg = ((intptr_t)actual) / pg;
> + intptr_t boundary_pg = boundary / pg;
> + intptr_t start_pg = actual_pg + boundary_pg;
> + start_pg -= start_pg % boundary_pg;
> + if (start_pg > (actual_pg + 1))
> + {
> + int ret = munmap (actual, (start_pg - actual_pg - 1) * pg);
> + assert (ret == 0);
> + actual = (void *)((start_pg - 1) * pg);
> + }
> + void *start = (void *)(start_pg * pg);
> + header = start - sizeof(*header);
> +
> + }
> + header->size = actual_size;
> + header->start = actual;
> + void *ptr = header;
> + ptr += sizeof (*header);
> + if (((intptr_t)ptr) % boundary != 0)
> + _dl_fatal_printf ("signal_safe_memalign produced incorrect alignment\n");
> + return ptr;
> +}
> +
> +void * weak_function
> +signal_safe_malloc (size_t size)
> +{
> + return signal_safe_memalign(1, size);
> +}
> +
> +void weak_function
> +signal_safe_free (void *ptr)
> +{
> + if (ptr == NULL)
> + return;
> +
> + struct signal_safe_allocator_header *header = ptr - sizeof (*header);
> + int ret = munmap (header->start, header->size);
> +
> + assert (ret == 0);
> +}
> +
> +void * weak_function
> +signal_safe_realloc (void *ptr, size_t size)
> +{
> + if (size == 0)
> + {
> + signal_safe_free (ptr);
> + return NULL;
> + }
> + if (ptr == NULL)
> + return signal_safe_malloc (size);
> +
> + struct signal_safe_allocator_header *header = ptr - sizeof (*header);
> + size_t old_size = header->size;
> + if (old_size - sizeof (*header) >= size)
> + return ptr;
> +
> + void *new_ptr = signal_safe_malloc (size);
> + if (new_ptr == NULL)
> + return NULL;
> +
> + memcpy (new_ptr, ptr, old_size);
> + signal_safe_free (ptr);
> +
> + return new_ptr;
> +}
> +
> +void * weak_function
> +signal_safe_calloc (size_t nmemb, size_t size)
> +{
> + void *ptr = signal_safe_malloc(nmemb * size);
> + if (ptr == NULL)
> + return NULL;
> + return memset(ptr, 0, nmemb * size);
> +}
> diff --git a/elf/dl-open.c b/elf/dl-open.c
> index 1403c8c..ccecf11 100644
> --- a/elf/dl-open.c
> +++ b/elf/dl-open.c
> @@ -548,7 +548,10 @@ cannot load any more object with static TLS"));
> generation of the DSO we are allocating data for. */
> _dl_update_slotinfo (imap->l_tls_modid);
> #endif
> -
> + /* We do this iteration under a signal mask in dl-reloc; why not
> + here? Because these symbols are new and dlopen hasn't
> + returned yet. So we can't possibly be racing with a TLS
> + access to them from another thread. */
> GL(dl_init_static_tls) (imap);
> assert (imap->l_need_tls_init == 0);
> }
> diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
> index 5c54310..f9493d7 100644
> --- a/elf/dl-reloc.c
> +++ b/elf/dl-reloc.c
> @@ -16,8 +16,10 @@
> License along with the GNU C Library; if not, see
> <http://www.gnu.org/licenses/>. */
>
> +#include <atomic.h>
> #include <errno.h>
> #include <libintl.h>
> +#include <signal.h>
> #include <stdlib.h>
> #include <unistd.h>
> #include <ldsodefs.h>
> @@ -70,8 +72,6 @@ _dl_try_allocate_static_tls (struct link_map *map)
>
> size_t offset = GL(dl_tls_static_used) + (freebytes - n * map->l_tls_align
> - map->l_tls_firstbyte_offset);
> -
> - map->l_tls_offset = GL(dl_tls_static_used) = offset;
> #elif TLS_DTV_AT_TP
> /* dl_tls_static_used includes the TCB at the beginning. */
> size_t offset = (((GL(dl_tls_static_used)
> @@ -83,7 +83,27 @@ _dl_try_allocate_static_tls (struct link_map *map)
> if (used > GL(dl_tls_static_size))
> goto fail;
>
> - map->l_tls_offset = offset;
> +#else
> +# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> +#endif
> + /* We've computed the new value we want, now try to install it. */
> + ptrdiff_t val;
> + while ((val = map->l_tls_offset) == NO_TLS_OFFSET)
> + {
> + atomic_compare_and_exchange_bool_acq(&map->l_tls_offset,
> + offset,
> + NO_TLS_OFFSET);
> + }
> + if (val != offset)
> + {
> + /* Lost a race to a TLS access in another thread. Too bad, nothing
> + we can do here. */
> + goto fail;
> + }
> + /* We installed the value; now update the globals. */
> +#if TLS_TCB_AT_TP
> + GL(dl_tls_static_used) = offset;
> +#elif TLS_DTV_AT_TP
> map->l_tls_firstbyte_offset = GL(dl_tls_static_used);
> GL(dl_tls_static_used) = used;
> #else
> @@ -114,10 +134,20 @@ void
> internal_function __attribute_noinline__
> _dl_allocate_static_tls (struct link_map *map)
> {
> - if (map->l_tls_offset == FORCED_DYNAMIC_TLS_OFFSET
> - || _dl_try_allocate_static_tls (map))
> + /* We wrap this in a signal mask because it has to iterate all
> + threads (including this one) and update this map's TLS entry. A
> + handler accessing TLS would try to do the same update and
> + break. */
> + sigset_t old;
> + _dl_mask_all_signals (&old);
> + int err = -1;
> + if (map->l_tls_offset != FORCED_DYNAMIC_TLS_OFFSET)
> + err = _dl_try_allocate_static_tls (map);
> +
> + _dl_unmask_signals (&old);
> + if (err != 0)
> {
> - _dl_signal_error (0, map->l_name, NULL, N_("\
> + _dl_signal_error (0, map->l_name, NULL, N_("\
> cannot allocate memory in static TLS block"));
> }
> }
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 576d9a1..a286420 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -17,6 +17,7 @@
> <http://www.gnu.org/licenses/>. */
>
> #include <assert.h>
> +#include <atomic.h>
> #include <errno.h>
> #include <libintl.h>
> #include <signal.h>
> @@ -43,7 +44,6 @@ oom (void)
> }
> #endif
>
> -
> size_t
> internal_function
> _dl_next_tls_modid (void)
> @@ -293,7 +293,7 @@ allocate_dtv (void *result)
> initial set of modules. This should avoid in most cases expansions
> of the dtv. */
> dtv_length = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
> - dtv = calloc (dtv_length + 2, sizeof (dtv_t));
> + dtv = signal_safe_calloc (dtv_length + 2, sizeof (dtv_t));
> if (dtv != NULL)
> {
> /* This is the initial length of the dtv. */
> @@ -463,6 +463,18 @@ _dl_allocate_tls (void *mem)
> }
> rtld_hidden_def (_dl_allocate_tls)
>
> +void
> +internal_function
> +_dl_clear_dtv (dtv_t *dtv)
> +{
> + for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
> + if (! dtv[1 + cnt].pointer.is_static
> + && dtv[1 + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> + signal_safe_free (dtv[1 + cnt].pointer.val);
> + memset (dtv, '\0', (dtv[-1].counter + 1) * sizeof (dtv_t));
> +}
> +
> +rtld_hidden_def (_dl_clear_dtv)
>
> #ifndef SHARED
> extern dtv_t _dl_static_dtv[];
> @@ -479,11 +491,11 @@ _dl_deallocate_tls (void *tcb, bool dealloc_tcb)
> for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
> if (! dtv[1 + cnt].pointer.is_static
> && dtv[1 + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> - free (dtv[1 + cnt].pointer.val);
> + signal_safe_free (dtv[1 + cnt].pointer.val);
>
> /* The array starts with dtv[-1]. */
> if (dtv != GL(dl_initial_dtv))
> - free (dtv - 1);
> + signal_safe_free (dtv - 1);
>
> if (dealloc_tcb)
> {
> @@ -521,20 +533,21 @@ rtld_hidden_def (_dl_deallocate_tls)
> # endif
>
>
> -static void *
> -allocate_and_init (struct link_map *map)
> +static void
> +allocate_and_init (dtv_t *dtv, struct link_map *map)
> {
> void *newp;
> -
> - newp = __libc_memalign (map->l_tls_align, map->l_tls_blocksize);
> + newp = signal_safe_memalign (map->l_tls_align, map->l_tls_blocksize);
> if (newp == NULL)
> oom ();
>
> - /* Initialize the memory. */
> + /* Initialize the memory. Since this is our thread's space, we are
> + under a signal mask, and no one has touched this section before,
> + we can safely just overwrite whatever's there. */
> memset (__mempcpy (newp, map->l_tls_initimage, map->l_tls_initimage_size),
> '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
>
> - return newp;
> + dtv->pointer.val = newp;
> }
>
>
> @@ -576,7 +589,15 @@ _dl_update_slotinfo (unsigned long int req_modid)
> the entry we need. */
> size_t new_gen = listp->slotinfo[idx].gen;
> size_t total = 0;
> -
> + int ret;
> + sigset_t old;
> + _dl_mask_all_signals (&old);
> + /* We use the signal mask as a lock against reentrancy here.
> + Check that a signal taken before the lock didn't already
> + update us. */
> + dtv = THREAD_DTV ();
> + if (dtv[0].counter >= listp->slotinfo[idx].gen)
> + goto out;
> /* We have to look through the entire dtv slotinfo list. */
> listp = GL(dl_tls_dtv_slotinfo_list);
> do
> @@ -596,25 +617,27 @@ _dl_update_slotinfo (unsigned long int req_modid)
> if (gen <= dtv[0].counter)
> continue;
>
> + size_t modid = total + cnt;
> +
> /* If there is no map this means the entry is empty. */
> struct link_map *map = listp->slotinfo[cnt].map;
> if (map == NULL)
> {
> /* If this modid was used at some point the memory
> might still be allocated. */
> - if (! dtv[total + cnt].pointer.is_static
> - && dtv[total + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> + if (dtv[-1].counter >= modid
> + && !dtv[modid].pointer.is_static
> + && dtv[modid].pointer.val != TLS_DTV_UNALLOCATED)
> {
> - free (dtv[total + cnt].pointer.val);
> - dtv[total + cnt].pointer.val = TLS_DTV_UNALLOCATED;
> + signal_safe_free (dtv[modid].pointer.val);
> + dtv[modid].pointer.val = TLS_DTV_UNALLOCATED;
> }
>
> continue;
> }
>
> + assert (modid == map->l_tls_modid);
> /* Check whether the current dtv array is large enough. */
> - size_t modid = map->l_tls_modid;
> - assert (total + cnt == modid);
> if (dtv[-1].counter < modid)
> {
> /* Reallocate the dtv. */
> @@ -628,17 +651,17 @@ _dl_update_slotinfo (unsigned long int req_modid)
> {
> /* This is the initial dtv that was allocated
> during rtld startup using the dl-minimal.c
> - malloc instead of the real malloc. We can't
> + malloc instead of the real allocator. We can't
> free it, we have to abandon the old storage. */
>
> - newp = malloc ((2 + newsize) * sizeof (dtv_t));
> + newp = signal_safe_malloc ((2 + newsize) * sizeof (dtv_t));
> if (newp == NULL)
> oom ();
> memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
> }
> else
> {
> - newp = realloc (&dtv[-1],
> + newp = signal_safe_realloc (&dtv[-1],
> (2 + newsize) * sizeof (dtv_t));
> if (newp == NULL)
> oom ();
> @@ -668,7 +691,7 @@ _dl_update_slotinfo (unsigned long int req_modid)
> deallocate even if it is this dtv entry we are
> supposed to load. The reason is that we call
> memalign and not malloc. */
> - free (dtv[modid].pointer.val);
> + signal_safe_free (dtv[modid].pointer.val);
>
> /* This module is loaded dynamically- We defer memory
> allocation. */
> @@ -685,6 +708,8 @@ _dl_update_slotinfo (unsigned long int req_modid)
>
> /* This will be the new maximum generation counter. */
> dtv[0].counter = new_gen;
> + out:
> + _dl_unmask_signals (&old);
> }
>
> return the_map;
> @@ -710,39 +735,50 @@ tls_get_addr_tail (GET_ADDR_ARGS, dtv_t *dtv, struct link_map *the_map)
>
> the_map = listp->slotinfo[idx].map;
> }
> -
> - again:
> - /* Make sure that, if a dlopen running in parallel forces the
> - variable into static storage, we'll wait until the address in the
> - static TLS block is set up, and use that. If we're undecided
> - yet, make sure we make the decision holding the lock as well. */
> - if (__builtin_expect (the_map->l_tls_offset
> - != FORCED_DYNAMIC_TLS_OFFSET, 0))
> + sigset_t old;
> + _dl_mask_all_signals (&old);
> +
> + /* as with update_slotinfo, we use the sigmask as a check against
> + reentrancy. */
> + if (dtv[GET_ADDR_MODULE].pointer.val != TLS_DTV_UNALLOCATED)
> + goto out;
> +
> + /* Synchronize against a parallel dlopen() forcing this variable
> + into static storage. If that happens, we have to be more careful
> + about initializing the area, as that dlopen() will be iterating
> + the threads to do so itself. */
> + ptrdiff_t offset;
> + while ((offset = the_map->l_tls_offset) == NO_TLS_OFFSET)
> {
> - __rtld_lock_lock_recursive (GL(dl_load_lock));
> - if (__builtin_expect (the_map->l_tls_offset == NO_TLS_OFFSET, 1))
> - {
> - the_map->l_tls_offset = FORCED_DYNAMIC_TLS_OFFSET;
> - __rtld_lock_unlock_recursive (GL(dl_load_lock));
> - }
> - else
> - {
> - __rtld_lock_unlock_recursive (GL(dl_load_lock));
> - if (__builtin_expect (the_map->l_tls_offset
> - != FORCED_DYNAMIC_TLS_OFFSET, 1))
> - {
> - void *p = dtv[GET_ADDR_MODULE].pointer.val;
> - if (__builtin_expect (p == TLS_DTV_UNALLOCATED, 0))
> - goto again;
> -
> - return (char *) p + GET_ADDR_OFFSET;
> - }
> - }
> + atomic_compare_and_exchange_bool_acq(&the_map->l_tls_offset,
> + FORCED_DYNAMIC_TLS_OFFSET,
> + NO_TLS_OFFSET);
> + }
> + if (offset == FORCED_DYNAMIC_TLS_OFFSET)
> + {
> + allocate_and_init(&dtv[GET_ADDR_MODULE], the_map);
> }
> - void *p = dtv[GET_ADDR_MODULE].pointer.val = allocate_and_init (the_map);
> - dtv[GET_ADDR_MODULE].pointer.is_static = false;
> + else
> + {
> + void **pp = &dtv[GET_ADDR_MODULE].pointer.val;
> + while (atomic_forced_read (*pp) == TLS_DTV_UNALLOCATED)
> + {
> + /* for lack of a better (safe) thing to do, just spin.
> + Someone else (not us; it's done under a signal mask) set
> + this map to a static TLS offset, and they'll iterate all
> + threads to initialize it. They'll eventually set
> + is_static=true, at which point we know they've fully
> + completed initialization. */
> + atomic_delay ();
> + }
> + /* make sure we've picked up their initialization of the actual block. */
> + atomic_read_barrier ();
> + }
> +out:
> + assert (dtv[GET_ADDR_MODULE].pointer.val != TLS_DTV_UNALLOCATED);
> + _dl_unmask_signals (&old);
>
> - return (char *) p + GET_ADDR_OFFSET;
> + return (char *) dtv[GET_ADDR_MODULE].pointer.val + GET_ADDR_OFFSET;
> }
>
>
> diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
> index 1e0fe1f..e8709d1 100644
> --- a/nptl/allocatestack.c
> +++ b/nptl/allocatestack.c
> @@ -242,11 +242,7 @@ get_cached_stack (size_t *sizep, void **memp)
>
> /* Clear the DTV. */
> dtv_t *dtv = GET_DTV (TLS_TPADJ (result));
> - for (size_t cnt = 0; cnt < dtv[-1].counter; ++cnt)
> - if (! dtv[1 + cnt].pointer.is_static
> - && dtv[1 + cnt].pointer.val != TLS_DTV_UNALLOCATED)
> - free (dtv[1 + cnt].pointer.val);
> - memset (dtv, '\0', (dtv[-1].counter + 1) * sizeof (dtv_t));
> + _dl_clear_dtv (dtv);
>
> /* Re-initialize the TLS. */
> _dl_allocate_tls_init (TLS_TPADJ (result));
> @@ -1177,13 +1173,15 @@ init_one_static_tls (struct pthread *curp, struct link_map *map)
> # error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
> # endif
>
> - /* Fill in the DTV slot so that a later LD/GD access will find it. */
> - dtv[map->l_tls_modid].pointer.val = dest;
> - dtv[map->l_tls_modid].pointer.is_static = true;
> -
> /* Initialize the memory. */
> memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
> '\0', map->l_tls_blocksize - map->l_tls_initimage_size);
> +
> + /* Fill in the DTV slot so that a later LD/GD access will find it. */
> + dtv[map->l_tls_modid].pointer.is_static = true;
> + atomic_write_barrier();
> + dtv[map->l_tls_modid].pointer.val = dest;
> +
> }
>
> void
> diff --git a/sysdeps/generic/ldsodefs.h b/sysdeps/generic/ldsodefs.h
> index e7b0516..14c8ddc 100644
> --- a/sysdeps/generic/ldsodefs.h
> +++ b/sysdeps/generic/ldsodefs.h
> @@ -233,6 +233,11 @@ extern int _dl_name_match_p (const char *__name, const struct link_map *__map)
> extern unsigned long int _dl_higher_prime_number (unsigned long int n)
> internal_function;
>
> +/* mask every signal, returning the previous sigmask in <old>. */
> +extern void _dl_mask_all_signals (sigset_t *old) internal_function;
> +/* undo _dl_mask_all_signals */
> +extern void _dl_unmask_signals (sigset_t *old) internal_function;
> +
> /* Function used as argument for `_dl_receive_error' function. The
> arguments are the error code, error string, and the objname the
> error occurred in. */
> @@ -983,6 +988,18 @@ extern void *_dl_allocate_tls_storage (void)
> extern void *_dl_allocate_tls_init (void *) internal_function;
> rtld_hidden_proto (_dl_allocate_tls_init)
>
> +extern void *signal_safe_memalign (size_t boundary, size_t size);
> +extern void *signal_safe_malloc (size_t size);
> +extern void signal_safe_free (void *ptr);
> +extern void *signal_safe_realloc (void *ptr, size_t size);
> +extern void *signal_safe_calloc (size_t nmemb, size_t size);
> +
> +
> +/* remove all allocated dynamic TLS regions from a DTV
> + for reuse by new thread */
> +extern void _dl_clear_dtv (dtv_t *dtv) internal_function;
> +rtld_hidden_proto (_dl_clear_dtv)
> +
> /* Deallocate memory allocated with _dl_allocate_tls. */
> extern void _dl_deallocate_tls (void *tcb, bool dealloc_tcb) internal_function;
> rtld_hidden_proto (_dl_deallocate_tls)
> --
> 1.8.4
>