[PR19826] fix non-LE TLS in static programs
Alexandre Oliva
aoliva@redhat.com
Sat Sep 24 03:44:00 GMT 2016
On Sep 21, 2016, Andreas Schwab <schwab@suse.de> wrote:
> On Sep 21 2016, Alexandre Oliva <aoliva@redhat.com> wrote:
>> [BZ #19826]
>> * elf/dl-tls.c (_dl_allocate_tls_init): Restore DTV early
>> initialization of static TLS entries.
>> * elf/dl-reloc.c (_dl_nothread_init_static_tls): Likewise.
>> * nptl/allocatestack.c (init_one_static_tls): Likewise.
> Ok.
No, it's not! :-)
:-(
We'll need something along these lines, to fix the above, and to avoid
making the same mistake again.
I'm yet to test it myself, but I understand you'd already tested on ARM
the combination of the previous patch and this reversal, minus the
comments added below, i.e., the elf/dl-tls.c changes above. Is that so?
Once I'm done with the testing, is this ok to install on top of the
previous patch?
[PR19826] revert needlessly reintroduced race, explain it
Along with the proper fix for the problem, a couple of unnecessary DTV
initializations were reintroduced, but they weren't just unnecessary:
one of them introduced a race condition that was alluded to in the
comments.
In this patch, I add a long comment explaining why we can't modify
another threads' DTV while initializing its static TLS block during
dlopen.
I also revert the unnecessary parts of the previous change, removing
the DTV-initializing code from the above, to fix the race, and, for
uniformity, also from the corresponding code that runs in
single-threaded programs, that was harmless as far as I can tell.
for ChangeLog
[BZ #19826]
* nptl/allocatestack.c (init_one_static_tls): Revert the
previous change and explain the race condition it would
introduce.
* elf/dl-reloc.c (_dl_nothread_init_static_tls): Revert
previous change for uniformity.
---
elf/dl-reloc.c | 6 ------
nptl/allocatestack.c | 36 ++++++++++++++++++++++++++++++------
2 files changed, 30 insertions(+), 12 deletions(-)
diff --git a/elf/dl-reloc.c b/elf/dl-reloc.c
index dcab666..42bddc1 100644
--- a/elf/dl-reloc.c
+++ b/elf/dl-reloc.c
@@ -137,12 +137,6 @@ _dl_nothread_init_static_tls (struct link_map *map)
# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
#endif
- /* Fill in the DTV slot so that a later LD/GD access will find it. */
- dtv_t *dtv = THREAD_DTV ();
- assert (map->l_tls_modid <= dtv[-1].counter);
- dtv[map->l_tls_modid].pointer.to_free = NULL;
- dtv[map->l_tls_modid].pointer.val = dest;
-
/* Initialize the memory. */
memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
'\0', map->l_tls_blocksize - map->l_tls_initimage_size);
diff --git a/nptl/allocatestack.c b/nptl/allocatestack.c
index 3016a2e..38b6530 100644
--- a/nptl/allocatestack.c
+++ b/nptl/allocatestack.c
@@ -1207,12 +1207,36 @@ init_one_static_tls (struct pthread *curp, struct link_map *map)
# error "Either TLS_TCB_AT_TP or TLS_DTV_AT_TP must be defined"
# endif
- /* Fill in the DTV slot so that a later LD/GD access will find it. */
- dtv_t *dtv = GET_DTV (TLS_TPADJ (curp));
- dtv[map->l_tls_modid].pointer.to_free = NULL;
- dtv[map->l_tls_modid].pointer.val = dest;
-
- /* Initialize the memory. */
+ /* We cannot delay the initialization of the Static TLS area, since
+ it can be accessed with LE or IE, but since the DTV is only used
+ by GD and LD (*) we can delay its update to avoid a race (**).
+
+ (*) The main program's DTV of statically-linked programs may be
+ used by a custom __tls_get_addr used on platforms that don't
+ require TLS relaxations, but that one is never initialized by us,
+ since the main program is never dlopened; the main thread's DTV
+ entry for the main program gets initialized in __libc_setup_tls,
+ whereas other threads' corresponding DTV entries are initialized
+ by _dl_allocate_tls_init.
+
+ (**) If we were to write to other threads' DTV entries here, we
+ might race with their DTV resizing. We might then write to
+ stale, possibly already free()d and reallocated memory. Or we
+ could write past the end of the still-active DTV. So, don't do
+ that!
+
+ Another race, not to be mistaken for the above, has to do with
+ the TLS block initialized below. Although we perform this
+ initialization while holding the rtld lock, once dlopen
+ completes, other threads might run code that accesses the
+ variables in these TLS blocks, without taking the rtld lock or
+ any other explicit memory acquire.
+
+ Our implementation appears to assume that, in order for the other
+ threads to access those variables, there will be some form of
+ synchronization between the end of dlopen() in one thread and the
+ use of the just-loaded modules in others, otherwise the very use
+ of the loaded code could be undefined. */
memset (__mempcpy (dest, map->l_tls_initimage, map->l_tls_initimage_size),
'\0', map->l_tls_blocksize - map->l_tls_initimage_size);
}
--
Alexandre Oliva, freedom fighter http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/ FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer
More information about the Libc-alpha
mailing list