This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH][BZ #13862] Reuse of cached stack can cause bounds overrun of thread DTV
- From: "H.J. Lu" <hjl dot tools at gmail dot com>
- To: Carlos O'Donell <carlos at redhat dot com>, Joseph Myers <joseph at codesourcery dot com>, Paul Archard <paul dot archard at proceranetworks dot com>, Ond??ej B??lka <neleai at seznam dot cz>, MyungJoo Ham <myungjoo dot ham at samsung dot com>, GNU C Library <libc-alpha at sourceware dot org>, Torvald Riegel <triegel at redhat dot com>
- Date: Thu, 27 Nov 2014 12:04:50 -0800
- Subject: Re: [PATCH][BZ #13862] Reuse of cached stack can cause bounds overrun of thread DTV
- Authentication-results: sourceware.org; auth=none
- References: <00c301cf21d3$ca7253e0$5f56fba0$ at proceranetworks dot com> <CAMe9rOpjQdvOQmcJ40URrLwNZDCqK+TV5s=Uf+ozsOB-7tpBFQ at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1411261622420 dot 854 at digraph dot polyomino dot org dot uk> <CAMe9rOrK-sdgGwKHX2HxBiO=6nFeo9mXzj6ue79qe5eOYDiVAA at mail dot gmail dot com> <alpine dot DEB dot 2 dot 10 dot 1411261707030 dot 854 at digraph dot polyomino dot org dot uk> <CAMe9rOr8vebftVYEMXxf9g9QZqLvBc20QxF8NyhNYka8g_ESjQ at mail dot gmail dot com> <CAMe9rOoxN23hxzf330_QHRBSRw0akj3GtLAEdV5ctnsOXTVGog at mail dot gmail dot com> <20141127140654 dot GA31381 at gmail dot com> <5477512E dot 1080403 at redhat dot com> <CAMe9rOrHGyvC9u2f=iWWUubiQ8-MHph9mKHmZ8_97A_5uJewEg at mail dot gmail dot com>
On Thu, Nov 27, 2014 at 09:45:11AM -0800, H.J. Lu wrote:
> On Thu, Nov 27, 2014 at 8:28 AM, Carlos O'Donell <carlos@redhat.com> wrote:
> > On 11/27/2014 09:06 AM, H.J. Lu wrote:
> >> Here is the promised patch. OK for trunk?
> >
> > Not yet.
> >
> > My personal opinion is that there has been insufficient detailed
> > analysis posted for this patch. For this kind of change I want to
> > see the author understand the code paths in question and how the
> > call flow gets to the position of failure. Paul has given a hand-waving
> > example of the failure without much detail. I would like to see
> > sufficient detail like:
> > ---
> > When calling pthread_create we call allocate_stack which then reuses
> > a stack by finding one via get_cached_stack. In get_cached_stack we
> > select an appropriately sized stack and in that process get the old
> > dtv from the thread, which could be reused if it was the right size.
> > Then _dl_allocate_tls_init is called to reinitialize the TLS structures
> > for the thread. In _dl_allocate_tls_init the expectation that is violated
> > is that dtv should have been calloc'd to a size of GL(dl_tls_max_dtv_idx)
> > + DTV_SURPLUS; by allocate_dtv. While this is true, it was done with an
> > old value of dl_tls_max_dtv_idx and that value might have changed since
> > the stack was last released.
> > ---
> > This is just an example, you would have to verify that this is actually
> > the case.
>
> That is the case.
>
> > Q: A dlclose() may modify GL(dl_tls_max_dtv_idx) concurrently with another
> > thread starting up. Is it safe to read GL(dl_tls_max_dtv_idx) more than
> > once in a function since they may differ? Does reading GL(dl_tls_max_dtv_idx)
> > require an atomic load? We are trying to be explicit about this in glibc.
>
> There is a race between DTV access and dlclose:
>
> https://sourceware.org/ml/libc-alpha/2014-11/msg00469.html
>
> > Q: If GL(dl_tls_max_dtv_idx) may change at any time during thread startup
> > does it make sense to reallocate the dtv? Why not leave it uninitialized
> > and allow _dl_update_slotinfo to reallocate it?
>
> Are you suggesting folding _dl_allocate_tls_init into _dl_update_slotinfo
> and remove _dl_allocate_tls_init? I can give it a try.
>
Many glibc tests failed with it.
> > Q: If you don't want to delay the dtv reallocation to _dl_update_slotinfo,
> > why don't you refactor the reallocation of the dtv into a single function
> > to avoid duplication with the same code in _dl_update_slotinfo?
>
> I will do that.
Done.
> >> + size_t oldsize = dtv[-1].counter;
> >> +
> >> + if (dtv == GL(dl_initial_dtv))
> >> + {
> >> + /* This is the initial dtv that was allocated
> >> + during rtld startup using the dl-minimal.c
> >> + malloc instead of the real malloc. We can't
> >> + free it, we have to abandon the old storage. */
> >> +
> >> + newp = malloc ((2 + newsize) * sizeof (dtv_t));
> >> + if (newp == NULL)
> >> + oom ();
> >> + memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
> >> + }
> >> + else
> >> + {
> >> + newp = realloc (&dtv[-1],
> >> + (2 + newsize) * sizeof (dtv_t));
> >> + if (newp == NULL)
> >> + oom ();
> >> + }
> >> +
> >> + newp[0].counter = newsize;
> >> +
> >> + /* Clear the newly allocated part. */
> >> + memset (newp + 2 + oldsize, '\0',
> >> + (newsize - oldsize) * sizeof (dtv_t));
> >> +
> >> + /* Install this new dtv in the thread data structures. */
> >> + INSTALL_DTV (result, newp);
> >> +
> >> + /* Point dtv to the generation counter. */
> >> + dtv = &newp[1];
> >> + }
> >> +
> >
> > The code itself is OK, but it's a copy of what _dl_update_slotinfo
> > does. Please refactor into a static inline function e.g. resize_dtv.
> >
Done.
> > Missing first line explaining test.
>
> I will add it.
>
Done.
> > Missing license.
>
> What is the license policy on testcases?
I left out license. Not all testcases have license. I can add it if
it is the only blocking issuse.
>
> >> +#include <stdio.h>
> >> +#include <stdint.h>
> >> +#include <dlfcn.h>
> >> +#include <assert.h>
> >> +#include <pthread.h>
> >> +
> >> +#define DSO_SHARED_FILES 20
> >> +#define DSO_OPEN_THREADS 20
> >> +#define DSO_EXEC_THREADS 2
> >
> > Why these particular constants? If you chose them because
> > they happened to work on a particular system, please comment
> > that.
>
> They came from:
>
> https://sourceware.org/bugzilla/attachment.cgi?id=6290
>
> >> +
> >> +pthread_mutex_t g_lock;
> >> +
> >
> > Describe what the mutex protects in a comment.
>
> I will add a few words.
Done.
> >
> > Use /* */.
>
> We have been using // for comments in glibc for a while. Why not here?
I kept //.
> >> diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c
> >> new file mode 100644
> >> index 0000000..b8e800f
> >> --- /dev/null
> >> +++ b/nptl/tst-stack4mod.c
> >> @@ -0,0 +1,9 @@
> >
> > Please give a one line description at the top of this file.
> >
Done.
>
> I don't have answers for any particular values in this
> testcase.
It doesn't matter how those values are chosen. The test fails without
my fix. Here is the updated patch. OK for trunk?
H.J.
---
>From b2d53cf4c7bc1fc7e5d92bfe783e8c03439c50bb Mon Sep 17 00:00:00 2001
From: "H.J. Lu" <hjl.tools@gmail.com>
Date: Thu, 27 Nov 2014 05:42:23 -0800
Subject: [PATCH] Resize DTV if the current DTV isn't big enough
This patch changes _dl_allocate_tls_init to resize DTV if the current DTV
isn't big enough. Tested on X86-64, x32 and ia32.
[BZ #13862]
* elf/dl-tls.c (oom): Remove #ifdef SHARED/#endif.
(_dl_static_dtv, _dl_initial_dtv): Moved before ...
(_dl_resize_dtv): This. Extracted from _dl_update_slotinfo.
(_dl_allocate_tls_init): Resize DTV if the current DTV isn't
big enough.
(_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV.
* nptl/Makefile (tests): Add tst-stack4.
(modules-names): Add tst-stack4mod.
($(objpfx)tst-stack4): New.
(tst-stack4mod.sos): Likewise.
($(objpfx)tst-stack4.out): Likewise.
($(tst-stack4mod.sos)): Likewise.
(clean): Likewise.
* nptl/tst-stack4.c: New file.
* nptl/tst-stack4mod.c: Likewise.
---
ChangeLog | 19 +++++++
elf/dl-tls.c | 98 +++++++++++++++++++++---------------
nptl/Makefile | 17 ++++++-
nptl/tst-stack4.c | 138 +++++++++++++++++++++++++++++++++++++++++++++++++++
nptl/tst-stack4mod.c | 11 ++++
5 files changed, 240 insertions(+), 43 deletions(-)
create mode 100644 nptl/tst-stack4.c
create mode 100644 nptl/tst-stack4mod.c
diff --git a/ChangeLog b/ChangeLog
index be49dba..9f55fb2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,22 @@
+2014-11-27 H.J. Lu <hongjiu.lu@intel.com>
+
+ [BZ #13862]
+ * elf/dl-tls.c (oom): Remove #ifdef SHARED/#endif.
+ (_dl_static_dtv, _dl_initial_dtv): Moved before ...
+ (_dl_resize_dtv): This. Extracted from _dl_update_slotinfo.
+ (_dl_allocate_tls_init): Resize DTV if the current DTV isn't
+ big enough.
+ (_dl_update_slotinfo): Call _dl_resize_dtv to resize DTV.
+ * nptl/Makefile (tests): Add tst-stack4.
+ (modules-names): Add tst-stack4mod.
+ ($(objpfx)tst-stack4): New.
+ (tst-stack4mod.sos): Likewise.
+ ($(objpfx)tst-stack4.out): Likewise.
+ ($(tst-stack4mod.sos)): Likewise.
+ (clean): Likewise.
+ * nptl/tst-stack4.c: New file.
+ * nptl/tst-stack4mod.c: Likewise.
+
2014-11-27 J. Brown <jb999@gmx.de>
* sysdeps/x86/bits/string.h: Add recent CPUs.
diff --git a/elf/dl-tls.c b/elf/dl-tls.c
index 5204fda..3366a36 100644
--- a/elf/dl-tls.c
+++ b/elf/dl-tls.c
@@ -34,14 +34,12 @@
/* Out-of-memory handler. */
-#ifdef SHARED
static void
__attribute__ ((__noreturn__))
oom (void)
{
_dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n");
}
-#endif
size_t
@@ -397,6 +395,50 @@ _dl_allocate_tls_storage (void)
}
+#ifndef SHARED
+extern dtv_t _dl_static_dtv[];
+# define _dl_initial_dtv (&_dl_static_dtv[1])
+#endif
+
+static dtv_t *
+_dl_resize_dtv (dtv_t *dtv)
+{
+ /* Resize the dtv. */
+ dtv_t *newp;
+ size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
+ size_t oldsize = dtv[-1].counter;
+
+ if (dtv == GL(dl_initial_dtv))
+ {
+ /* This is the initial dtv that was either statically allocated in
+ __libc_setup_tls or allocated during rtld startup using the
+ dl-minimal.c malloc instead of the real malloc. We can't free
+ it, we have to abandon the old storage. */
+
+ newp = malloc ((2 + newsize) * sizeof (dtv_t));
+ if (newp == NULL)
+ oom ();
+ memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
+ }
+ else
+ {
+ newp = realloc (&dtv[-1],
+ (2 + newsize) * sizeof (dtv_t));
+ if (newp == NULL)
+ oom ();
+ }
+
+ newp[0].counter = newsize;
+
+ /* Clear the newly allocated part. */
+ memset (newp + 2 + oldsize, '\0',
+ (newsize - oldsize) * sizeof (dtv_t));
+
+ /* Return the generation counter. */
+ return &newp[1];
+}
+
+
void *
internal_function
_dl_allocate_tls_init (void *result)
@@ -410,6 +452,16 @@ _dl_allocate_tls_init (void *result)
size_t total = 0;
size_t maxgen = 0;
+ /* Check if the current dtv is big enough. */
+ if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
+ {
+ /* Resize the dtv. */
+ dtv = _dl_resize_dtv (dtv);
+
+ /* Install this new dtv in the thread data structures. */
+ INSTALL_DTV (result, &dtv[-1]);
+ }
+
/* We have to prepare the dtv for all currently loaded modules using
TLS. For those which are dynamically loaded we add the values
indicating deferred allocation. */
@@ -492,11 +544,6 @@ _dl_allocate_tls (void *mem)
rtld_hidden_def (_dl_allocate_tls)
-#ifndef SHARED
-extern dtv_t _dl_static_dtv[];
-# define _dl_initial_dtv (&_dl_static_dtv[1])
-#endif
-
void
internal_function
_dl_deallocate_tls (void *tcb, bool dealloc_tcb)
@@ -645,41 +692,10 @@ _dl_update_slotinfo (unsigned long int req_modid)
assert (total + cnt == modid);
if (dtv[-1].counter < modid)
{
- /* Reallocate the dtv. */
- dtv_t *newp;
- size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;
- size_t oldsize = dtv[-1].counter;
-
- assert (map->l_tls_modid <= newsize);
-
- if (dtv == GL(dl_initial_dtv))
- {
- /* This is the initial dtv that was allocated
- during rtld startup using the dl-minimal.c
- malloc instead of the real malloc. We can't
- free it, we have to abandon the old storage. */
-
- newp = malloc ((2 + newsize) * sizeof (dtv_t));
- if (newp == NULL)
- oom ();
- memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
- }
- else
- {
- newp = realloc (&dtv[-1],
- (2 + newsize) * sizeof (dtv_t));
- if (newp == NULL)
- oom ();
- }
-
- newp[0].counter = newsize;
-
- /* Clear the newly allocated part. */
- memset (newp + 2 + oldsize, '\0',
- (newsize - oldsize) * sizeof (dtv_t));
+ /* Resize the dtv. */
+ dtv = _dl_resize_dtv (dtv);
- /* Point dtv to the generation counter. */
- dtv = &newp[1];
+ assert (modid <= dtv[-1].counter);
/* Install this new dtv in the thread data
structures. */
diff --git a/nptl/Makefile b/nptl/Makefile
index 86a1b0c..ac76596 100644
--- a/nptl/Makefile
+++ b/nptl/Makefile
@@ -254,7 +254,7 @@ tests = tst-typesizes \
tst-exec1 tst-exec2 tst-exec3 tst-exec4 \
tst-exit1 tst-exit2 tst-exit3 \
tst-stdio1 tst-stdio2 \
- tst-stack1 tst-stack2 tst-stack3 tst-pthread-getattr \
+ tst-stack1 tst-stack2 tst-stack3 tst-stack4 tst-pthread-getattr \
tst-pthread-attr-affinity \
tst-unload \
tst-dlsym1 \
@@ -311,7 +311,7 @@ endif
modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \
- tst-tls5modd tst-tls5mode tst-tls5modf \
+ tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \
tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod
extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o
test-extras += $(modules-names) tst-cleanup4aux
@@ -479,6 +479,19 @@ $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out
$(evaluate-test)
generated += tst-stack3-mem.out tst-stack3.mtrace
+$(objpfx)tst-stack4: $(libdl) $(shared-thread-library)
+tst-stack4mod.sos=$(shell for i in 0 1 2 3 4 5 6 7 8 9 10 \
+ 11 12 13 14 15 16 17 18 19; do \
+ for j in 0 1 2 3 4 5 6 7 8 9 10 \
+ 11 12 13 14 15 16 17 18 19; do \
+ echo $(objpfx)tst-stack4mod-$$i-$$j.so; \
+ done; done)
+$(objpfx)tst-stack4.out: $(tst-stack4mod.sos)
+$(tst-stack4mod.sos): $(objpfx)tst-stack4mod.so
+ cp -f $< $@
+clean:
+ rm -f $(tst-stack4mod.sos)
+
$(objpfx)tst-cleanup4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
$(objpfx)tst-cleanupx4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
diff --git a/nptl/tst-stack4.c b/nptl/tst-stack4.c
new file mode 100644
index 0000000..0fd2116
--- /dev/null
+++ b/nptl/tst-stack4.c
@@ -0,0 +1,138 @@
+// Test DTV size overflow with pthread_create and TLS in dlopened shared
+// object.
+
+#include <stdio.h>
+#include <stdint.h>
+#include <dlfcn.h>
+#include <assert.h>
+#include <pthread.h>
+
+#define DSO_SHARED_FILES 20
+#define DSO_OPEN_THREADS 20
+#define DSO_EXEC_THREADS 2
+
+// Used to make sure that only one thread is calling dlopen and dlclose
+// at a time.
+pthread_mutex_t g_lock;
+
+typedef void (*function) (void);
+
+void *
+dso_invoke(void *dso_fun)
+{
+ function *fun_vec = (function *) dso_fun;
+ int dso;
+
+ for (dso = 0; dso < DSO_SHARED_FILES; dso++)
+ (*fun_vec[dso]) ();
+
+ pthread_exit (NULL);
+}
+
+void *
+dso_process (void * p)
+{
+ void *handle[DSO_SHARED_FILES];
+ function fun_vec[DSO_SHARED_FILES];
+ char dso_path[DSO_SHARED_FILES][100];
+ int dso;
+ uintptr_t t = (uintptr_t) p;
+
+ // Open dso's and get a function.
+ for (dso = 0; dso < DSO_SHARED_FILES; dso++)
+ {
+ sprintf (dso_path[dso], "tst-stack4mod-%i-%i.so", t, dso);
+
+ pthread_mutex_lock (&g_lock);
+
+ handle[dso] = dlopen (dso_path[dso], RTLD_NOW);
+ assert (handle[dso]);
+
+ fun_vec[dso] = (function) dlsym (handle[dso], "function");
+ assert (fun_vec[dso]);
+
+ pthread_mutex_unlock (&g_lock);
+ }
+
+ // Spawn workers
+ pthread_t thread[DSO_EXEC_THREADS];
+ int i, ret;
+ uintptr_t result = 0;
+ for (i = 0; i < DSO_EXEC_THREADS; i++)
+ {
+ pthread_mutex_lock (&g_lock);
+ ret = pthread_create (&thread[i], NULL, dso_invoke, (void *) fun_vec);
+ if (ret != 0)
+ {
+ printf ("pthread_create failed: %d\n", ret);
+ result = 1;
+ }
+ pthread_mutex_unlock (&g_lock);
+ }
+
+ if (!result)
+ for (i = 0; i < DSO_EXEC_THREADS; i++)
+ {
+ ret = pthread_join (thread[i], NULL);
+ if (ret != 0)
+ {
+ printf ("pthread_join failed: %d\n", ret);
+ result = 1;
+ }
+ }
+
+ // Close all dso's
+ for (dso = 0; dso < DSO_SHARED_FILES; dso++)
+ {
+ pthread_mutex_lock (&g_lock);
+ dlclose (handle[dso]);
+ pthread_mutex_unlock (&g_lock);
+ }
+
+ // Exit
+ pthread_exit ((void *) result);
+}
+
+static int
+do_test (void)
+{
+ pthread_t thread[DSO_OPEN_THREADS];
+ int i,j;
+ int ret;
+ int result = 0;
+
+ pthread_mutex_init (&g_lock, NULL);
+
+ for (j = 0; j < 100; j++)
+ {
+ for (i = 0; i < DSO_OPEN_THREADS; i++)
+ {
+ ret = pthread_create (&thread[i], NULL, dso_process,
+ (void *) (uintptr_t) i);
+ if (ret != 0)
+ {
+ printf ("pthread_create failed: %d\n", ret);
+ result = 1;
+ }
+ }
+
+ if (result)
+ break;
+
+ for (i = 0; i < DSO_OPEN_THREADS; i++)
+ {
+ ret = pthread_join (thread[i], NULL);
+ if (ret != 0)
+ {
+ printf ("pthread_join failed: %d\n", ret);
+ result = 1;
+ }
+ }
+ }
+
+ return result;
+}
+
+#define TEST_FUNCTION do_test ()
+#define TIMEOUT 100
+#include "../test-skeleton.c"
diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c
new file mode 100644
index 0000000..4dd8079
--- /dev/null
+++ b/nptl/tst-stack4mod.c
@@ -0,0 +1,11 @@
+// This tests DTV usage with TLS in dlopened shared object.
+
+__thread int var[256] attribute_hidden = {0};
+
+void
+function (void)
+{
+ int i;
+ for (i = 0; i < 256; i++)
+ var[i] = i;
+}
--
1.9.3