This is the mail archive of the mailing list for the glibc project.

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

[PATCH][BZ #19329] Fix race between tls allocation at thread creation and dlopen

I've seen the following failure:

Inconsistency detected by dl-tls.c: 493: _dl_allocate_tls_init: Assertion `listp->slotinfo[cnt].gen <= _rtld_local._dl_tls_generation' failed!

It seems dlopen modifies tls related dynamic linker data structures in
dl_open_worker if a shared library is loaded with tls, while holding
the GL(dl_load_lock).

Meanwhile at thread creation the same globals are accessed when the dtv
and tls is set up for the new thread in _dl_allocate_tls_init without
holding any locks.

At least the following objects may have conflicting access:


The race window that triggers the assert failure above is short compared
to dlopen and thread creation, so it rarely happens, but the probability
can be increased by loading a long chain of dependent shared objects
with tls.  Then all the loaded dsos get the same generation number,
GL(dl_tls_generation)+1, in the slotinfo list and GL(dl_tls_generation)
gets incremented only after that, so the assertion can fail in a
concurrently created thread reading the new slotinfos.

My fix loads the current number of dtv slots, GL(dl_tls_max_dtv_idx),
and the current generation counter, GL(dl_tls_generation), at the
beginning of _dl_allocate_tls_init and then ignores any slotinfo
entry that got concurrently added.  So instead of taking the
GL(dl_load_lock), used atomics to avoid the races.

I think various other accesses to the objects in the list above should
be atomic as well, but that rabbit hole is deeper than what a quick
fix could patch up.

I can trigger the bug on x86_64 by loading a chain of 50 dsos and
concurrently creating >1000 threads, but i cannot easily turn that
into a glibc test case.  (On aarch64 the failure happens to be more

A related issue is that glibc does not try to do worst-case allocation
of tls for existing threads whenever a library is loaded so there might
be further allocation needed at first tls access making it non-as-safe
and possibly oom crash.  This lazy allocation allows me to ignore the
slotinfo of concurrently loaded dsos in _dl_allocate_tls_init, those
will be lazy initialized.


2016-01-06  Szabolcs Nagy  <>

	[BZ #19329]
	* elf/dl-open.c (dl_open_worker): Write GL(dl_tls_generation) atomically.
	* elf/dl-tls.c (_dl_allocate_tls_init): Read GL(dl_tls_generation),
	GL(dl_tls_max_dtv_idx) and slotinfo entries atomically.
	(_dl_add_to_slotinfo): Write the slotinfo entry atomically.
diff --git a/elf/dl-open.c b/elf/dl-open.c
index 5429d18..4e2cd6a 100644
--- a/elf/dl-open.c
+++ b/elf/dl-open.c
@@ -524,9 +524,15 @@ dl_open_worker (void *a)
   /* Bump the generation number if necessary.  */
-  if (any_tls && __builtin_expect (++GL(dl_tls_generation) == 0, 0))
-    _dl_fatal_printf (N_("\
+  if (any_tls)
+    {
+      size_t newgen = GL(dl_tls_generation) + 1;
+      if (__builtin_expect (newgen == 0, 0))
+	_dl_fatal_printf (N_("\
 TLS generation counter wrapped!  Please report this."));
+      /* Synchronize with the load acquire in _dl_allocate_tls_init.  */
+      atomic_store_release (&GL(dl_tls_generation), newgen);
+    }
   /* We need a second pass for static tls data, because _dl_update_slotinfo
      must not be run while calls to _dl_add_to_slotinfo are still pending.  */
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 20c7e33..3c08a5f 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -455,9 +455,12 @@ _dl_allocate_tls_init (void *result)
   struct dtv_slotinfo_list *listp;
   size_t total = 0;
   size_t maxgen = 0;
+  size_t gen_count;
+  size_t dtv_slots;
   /* Check if the current dtv is big enough.   */
-  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+  dtv_slots = atomic_load_relaxed (&GL(dl_tls_max_dtv_idx));
+  if (dtv[-1].counter < dtv_slots)
       /* Resize the dtv.  */
       dtv = _dl_resize_dtv (dtv);
@@ -470,6 +473,7 @@ _dl_allocate_tls_init (void *result)
      TLS.  For those which are dynamically loaded we add the values
      indicating deferred allocation.  */
   listp = GL(dl_tls_dtv_slotinfo_list);
+  gen_count = atomic_load_acquire (&GL(dl_tls_generation));
   while (1)
       size_t cnt;
@@ -480,18 +484,22 @@ _dl_allocate_tls_init (void *result)
 	  void *dest;
 	  /* Check for the total number of used slots.  */
-	  if (total + cnt > GL(dl_tls_max_dtv_idx))
+	  if (total + cnt > dtv_slots)
-	  map = listp->slotinfo[cnt].map;
+	  map = atomic_load_acquire (&listp->slotinfo[cnt].map);
 	  if (map == NULL)
 	    /* Unused entry.  */
+	  size_t gen = atomic_load_relaxed (&listp->slotinfo[cnt].gen);
+	  if (gen > gen_count)
+	    /* New, concurrently loaded entry.  */
+	    continue;
 	  /* Keep track of the maximum generation number.  This might
 	     not be the generation counter.  */
-	  assert (listp->slotinfo[cnt].gen <= GL(dl_tls_generation));
-	  maxgen = MAX (maxgen, listp->slotinfo[cnt].gen);
+	  maxgen = MAX (maxgen, gen);
 	  dtv[map->l_tls_modid].pointer.val = TLS_DTV_UNALLOCATED;
 	  dtv[map->l_tls_modid].pointer.is_static = false;
@@ -518,7 +526,7 @@ _dl_allocate_tls_init (void *result)
       total += cnt;
-      if (total >= GL(dl_tls_max_dtv_idx))
+      if (total >= dtv_slots)
       listp = listp->next;
@@ -942,6 +950,7 @@ cannot create TLS data structures"));
   /* Add the information into the slotinfo data structure.  */
-  listp->slotinfo[idx].map = l;
-  listp->slotinfo[idx].gen = GL(dl_tls_generation) + 1;
+  atomic_store_relaxed (&listp->slotinfo[idx].gen, GL(dl_tls_generation) + 1);
+  /* Synchronize with the acquire load in _dl_allocate_tls_init.  */
+  atomic_store_release (&listp->slotinfo[idx].map, l);

Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]