[PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen
- From: Szabolcs Nagy <szabolcs dot nagy at arm dot com>
- To: GNU C Library <libc-alpha at sourceware dot org>
- Cc: <i dot palachev at samsung dot com>, "triegel at redhat dot com" <triegel at redhat dot com>, <nd at arm dot com>
- Date: Wed, 6 Jan 2016 18:33:53 +0000
- Subject: [PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen
I've seen the following failure:
Inconsistency detected by ld.so: dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <=
_rtld_local._dl_tls_generation' failed!
It seems dlopen modifies tls-related dynamic linker data structures in
dl_open_worker when a shared library with tls is loaded, while holding
GL(dl_load_lock).
Meanwhile, at thread creation the same globals are accessed when the dtv
and tls are set up for the new thread in _dl_allocate_tls_init, without
holding any locks.
At least the following objects are subject to conflicting accesses:
GL(dl_tls_max_dtv_idx)
GL(dl_tls_generation)
listp->slotinfo[i].map
listp->slotinfo[i].gen
listp->next
The race window that triggers the assert failure above is short compared
to dlopen and thread creation, so it rarely happens, but the probability
can be increased by loading a long chain of dependent shared objects
with tls. Then all the loaded dsos get the same generation number,
GL(dl_tls_generation)+1, in the slotinfo list and GL(dl_tls_generation)
gets incremented only after that, so the assertion can fail in a
concurrently created thread reading the new slotinfos.
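To make that ordering concrete, here is a small stand-alone toy model of
the two racing paths. It is deliberately racy and is not the glibc code
itself: an "opener" thread publishes a slotinfo-like entry with
generation counter+1 and only afterwards bumps the counter, while a
"creator" thread does the same gen <= counter check as the assert above,
without any synchronization.

#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static size_t tls_generation;                    /* models GL(dl_tls_generation) */
static struct { void *map; size_t gen; } slot;   /* models one slotinfo entry */

static void *
opener (void *arg)
{
  /* dl_open_worker: fill in the slotinfo entry first ...  */
  slot.gen = tls_generation + 1;
  slot.map = arg;
  /* ... and bump the generation counter only afterwards.  */
  ++tls_generation;
  return NULL;
}

static void *
creator (void *arg)
{
  (void) arg;
  /* _dl_allocate_tls_init: may observe the new entry before the counter
     bump, in which case the check below (the failing assert) fires.  */
  if (slot.map != NULL)
    assert (slot.gen <= tls_generation);
  return NULL;
}

int
main (void)
{
  int dummy;
  pthread_t a, b;
  pthread_create (&a, NULL, opener, &dummy);
  pthread_create (&b, NULL, creator, NULL);
  pthread_join (a, NULL);
  pthread_join (b, NULL);
  return 0;
}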
My fix loads the current number of dtv slots, GL(dl_tls_max_dtv_idx),
and the current generation counter, GL(dl_tls_generation), at the
beginning of _dl_allocate_tls_init and then ignores any slotinfo
entry that got concurrently added. So instead of taking
GL(dl_load_lock), it uses atomics to avoid the races.
I think various other accesses to the objects in the list above should
be atomic as well, but that rabbit hole is deeper than what a quick
fix could patch up.
I can trigger the bug on x86_64 by loading a chain of 50 dsos and
concurrently creating >1000 threads, but I cannot easily turn that
into a glibc test case. (On aarch64 the failure happens to be more
common.)
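For reference, the reproducer is roughly shaped like the sketch below.
The name libchain.so is made up; it stands for a dso that pulls in the
long chain of tls-using dependencies. One thread does the dlopen while
the main thread keeps creating threads, so _dl_allocate_tls_init can run
concurrently with dl_open_worker.

#include <dlfcn.h>
#include <pthread.h>
#include <stddef.h>
#include <stdio.h>

#define NTHREADS 1024

static void *
opener (void *arg)
{
  (void) arg;
  /* Load the chain of tls-using dsos while threads keep starting.  */
  if (dlopen ("libchain.so", RTLD_NOW) == NULL)
    fprintf (stderr, "dlopen: %s\n", dlerror ());
  return NULL;
}

static void *
thread_fn (void *arg)
{
  (void) arg;
  /* Nothing to do: the racy work happens in thread setup itself
     (_dl_allocate_tls_init).  */
  return NULL;
}

int
main (void)
{
  pthread_t o;
  pthread_t t[NTHREADS];
  size_t i;

  pthread_create (&o, NULL, opener, NULL);
  for (i = 0; i < NTHREADS; i++)
    pthread_create (&t[i], NULL, thread_fn, NULL);
  for (i = 0; i < NTHREADS; i++)
    pthread_join (t[i], NULL);
  pthread_join (o, NULL);
  return 0;
}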
A related issue is that glibc does not try to do worst-case allocation
of tls for existing threads whenever a library is loaded, so further
allocation may be needed at the first tls access, which makes that access
not async-signal-safe and able to crash on oom. This lazy allocation
allows me to ignore the slotinfo of concurrently loaded dsos in
_dl_allocate_tls_init; those will be lazily initialized.
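A hypothetical illustration of that lazy allocation (libfoo.so, its
__thread variable and the function name bump_foo are all made up): the
dtv entry for the dlopen'ed dso stays TLS_DTV_UNALLOCATED until the first
access, at which point __tls_get_addr allocates it, possibly via malloc,
hence the as-safety and oom concerns.

#include <dlfcn.h>
#include <stdio.h>

int
main (void)
{
  /* dlopen adds the slotinfo entry, but existing threads' dtv entries
     for this module stay TLS_DTV_UNALLOCATED.  */
  void *h = dlopen ("libfoo.so", RTLD_NOW);
  if (h == NULL)
    {
      fprintf (stderr, "%s\n", dlerror ());
      return 1;
    }
  int (*bump) (void) = (int (*) (void)) dlsym (h, "bump_foo");
  if (bump == NULL)
    {
      fprintf (stderr, "%s\n", dlerror ());
      return 1;
    }
  /* The first call touches the library's tls: __tls_get_addr allocates
     the dtv entry now (it may call malloc), so this access is not as-safe
     and can fail under memory pressure.  */
  printf ("%d\n", bump ());
  dlclose (h);
  return 0;
}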
Changelog:
2016-01-06 Szabolcs Nagy <szabolcs.nagy@arm.com>
[BZ #19329]
* elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation) atomically.
* elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation),
GL(dl_tls_max_dtv_idx) and slotinfo entries atomically.
(_dl_add_to_slotinfo): Write the slotinfo entry atomically.
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 5429d18..4e2cd6a 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -524,9 +524,15 @@ dl_open_worker (void *a)
}
/* Bump the generation number if necessary. */
- if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
- _dl_fatal_printf (N_("\
+ if (any_tls)
+ {
+ size_t newgen = GL(dl_tls_generation) + 1;
+ if (__builtin_expect (newgen == 0, 0))
+ _dl_fatal_printf (N_("\
TLS generation counter wrapped! Please report this."));
+ /* Synchronize with the load acquire in _dl_allocate_tls_init. */
+ atomic_store_release (&GL(dl_tls_generation), newgen);
+ }
/* We need a second pass for static tls data, because _dl_update_slotinfo
must not be run while calls to _dl_add_to_slotinfo are still pending. */
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 20c7e33..3c08a5f 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -455,9 +455,12 @@ _dl_allocate_tls_init (void *result)
struct dtv_slotinfo_list *listp;
size_t total = 0;
size_t maxgen = 0;
+ size_t gen_count;
+ size_t dtv_slots;
/* Check if the current dtv is big enough. */
- if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+ dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+ if (dtv[-1].counter < dtv_slots)
{
/* Resize the dtv. */
dtv = _dl_resize_dtv (dtv);
@@ -470,6 +473,7 @@ _dl_allocate_tls_init (void *result)
TLS. For those which are dynamically loaded we add the values
indicating deferred allocation. */
listp = GL(dl_tls_dtv_slotinfo_list);
+ gen_count = atomic_load_acquire (&GL(dl_tls_generation));
while (1)
{
size_t cnt;
@@ -480,18 +484,22 @@ _dl_allocate_tls_init (void *result)
void *dest;
/* Check for the total number of used slots. */
- if (total + cnt > GL(dl_tls_max_dtv_idx))
+ if (total + cnt > dtv_slots)
break;
- map = listp->slotinfo[cnt].map;
+ map = atomic_load_acquire (&listp->slotinfo[cnt].map);
if (map == NULL)
/* Unused entry. */
continue;
+ size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen);
+ if (gen > gen_count)
+ /* New, concurrently loaded entry. */
+ continue;
+
/* Keep track of the maximum generation number. This might
not be the generation counter. */
- assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
- maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+ maxgen = MAX (maxgen, gen);
dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,7 +526,7 @@ _dl_allocate_tls_init (void *result)
}
total += cnt;
- if (total >= GL(dl_tls_max_dtv_idx))
+ if (total >= dtv_slots)
break;
listp = listp->next;
@@ -942,6 +950,7 @@ cannot create TLS data structures"));
}
/* Add the information into the slotinfo data structure. */
- listp->slotinfo[idx].map = l;
- listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+ atomic_store_relaxed (&listp->slotinfo[idx].gen, GL(dl_tls_generation) + 1);
+ /* Synchronize with the acquire load in _dl_allocate_tls_init. */
+ atomic_store_release (&listp->slotinfo[idx].map, l);
}