This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH][BZ #13862] Reuse of cached stack can cause bounds overrun of thread DTV


On 11/27/2014 09:06 AM, H.J. Lu wrote:
> Here is the promised patch.  OK for trunk?

Not yet.

My personal opinion is that there has been insufficient detailed
analysis posted for this patch. For this kind of change I want to
see the author understand the code paths in question and how the
call flow gets to the position of failure. Paul has given a hand-waving
example of the failure without much detail. I would like to see
sufficient detail like:
---
When calling pthread_create we call allocate_stack which then reuses
a stack by finding one via get_cached_stack. In get_cached_stack we
select an appropriately sized stack and in that process get the old
dtv from the thread, which could be reused if it was the right size.
Then _dl_allocate_tls_init is called to reinitialize the TLS structures
for the thread. In _dl_allocate_tls_init the expectation that is violated
is that dtv should have been calloc'd to a size of GL(dl_tls_max_dtv_idx) 
+ DTV_SURPLUS; by allocate_dtv. While this is true, it was done with an
old value of dl_tls_max_dtv_idx and that value might have changed since
the stack was last released.
---
This is just an example, you would have to verify that this is actually
the case.

Q: A dlclose() may modify GL(dl_tls_max_dtv_idx) concurrently with another
   thread starting up. Is it safe to read GL(dl_tls_max_dtv_idx) more than
   once in a function since they may differ? Does reading GL(dl_tls_max_dtv_idx)
   require an atomic load? We are trying to be explicit about this in glibc.

Q: If GL(dl_tls_max_dtv_idx) may change at any time during thread startup
   does it make sense to reallocate the dtv? Why not leave it uninitialized
   and allow _dl_update_slotinfo to reallocate it?

Q: If you don't want to delay the dtv reallocation to _dl_update_slotinfo,
   why don't you refactor the reallocation of the dtv into a single function
   to avoid duplication with the same code in _dl_update_slotinfo?

> ----
> From b811f571b5604f84c84d5b96394e6b892e05d14f Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" <hjl.tools@gmail.com>
> Date: Thu, 27 Nov 2014 05:42:23 -0800
> Subject: [PATCH] Reallocate DTV if the current DTV isn't big enough
> 
> This patch reallocates DTV if the current DTV isn't big enough, using
> the similar approach in _dl_update_slotinfo.  Tested on X86-64, x32
> and ia32.
> 
> 	[BZ #13862]
> 	* elf/dl-tls.c (oom): Remove #ifdef SHARED/#endif.
> 	(_dl_static_dtv, _dl_initial_dtv): Moved before ...
> 	(_dl_allocate_tls_init): This.  Reallocate DTV if the current
> 	DTV isn't big enough.
> 	* nptl/Makefile (tests): Add tst-stack4.
> 	(modules-names): Add tst-stack4mod.
> 	($(objpfx)tst-stack4): New.
> 	(tst-stack4mod.sos): Likewise.
> 	($(objpfx)tst-stack4.out): Likewise.
> 	($(tst-stack4mod.sos)): Likewise.
> 	(clean): Likewise.
> 	* nptl/tst-stack4.c: New file.
> 	* nptl/tst-stack4mod.c: Likewise.
> ---
>  ChangeLog            |  17 +++++++
>  elf/dl-tls.c         |  53 ++++++++++++++++++---
>  nptl/Makefile        |  17 ++++++-
>  nptl/tst-stack4.c    | 130 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  nptl/tst-stack4mod.c |   9 ++++
>  5 files changed, 217 insertions(+), 9 deletions(-)
>  create mode 100644 nptl/tst-stack4.c
>  create mode 100644 nptl/tst-stack4mod.c
> 
> diff --git a/ChangeLog b/ChangeLog
> index 92ffe16..e6474be 100644
> --- a/ChangeLog
> +++ b/ChangeLog
> @@ -1,3 +1,20 @@
> +2014-11-27  H.J. Lu  <hongjiu.lu@intel.com>
> +
> +	[BZ #13862]
> +	* elf/dl-tls.c (oom): Remove #ifdef SHARED/#endif.
> +	(_dl_static_dtv, _dl_initial_dtv): Moved before ...
> +	(_dl_allocate_tls_init): This.  Reallocate DTV if the current
> +	DTV isn't big enough.
> +	* nptl/Makefile (tests): Add tst-stack4.
> +	(modules-names): Add tst-stack4mod.
> +	($(objpfx)tst-stack4): New.
> +	(tst-stack4mod.sos): Likewise.
> +	($(objpfx)tst-stack4.out): Likewise.
> +	($(tst-stack4mod.sos)): Likewise.
> +	(clean): Likewise.
> +	* nptl/tst-stack4.c: New file.
> +	* nptl/tst-stack4mod.c: Likewise.
> +
>  2014-11-27  Stefan Liebler  <stli@linux.vnet.ibm.com>
>  
>  	* nscd/connections.c: Include libc-internal.h because of macro
> diff --git a/elf/dl-tls.c b/elf/dl-tls.c
> index 5204fda..3142b16 100644
> --- a/elf/dl-tls.c
> +++ b/elf/dl-tls.c
> @@ -34,14 +34,12 @@
>  
>  
>  /* Out-of-memory handler.  */
> -#ifdef SHARED
>  static void
>  __attribute__ ((__noreturn__))
>  oom (void)
>  {
>    _dl_fatal_printf ("cannot allocate memory for thread-local data: ABORT\n");
>  }
> -#endif

OK.

>  
>  size_t
> @@ -397,6 +395,11 @@ _dl_allocate_tls_storage (void)
>  }
>  
>  
> +#ifndef SHARED
> +extern dtv_t _dl_static_dtv[];
> +# define _dl_initial_dtv (&_dl_static_dtv[1])
> +#endif
> +

OK.

>  void *
>  internal_function
>  _dl_allocate_tls_init (void *result)
> @@ -410,6 +413,47 @@ _dl_allocate_tls_init (void *result)
>    size_t total = 0;
>    size_t maxgen = 0;
>  
> +  /* Check if the current dtv is big enough.   */
> +  if (dtv[-1].counter < GL(dl_tls_max_dtv_idx))
> +    {
> +      /* Reallocate the dtv.  */
> +      dtv_t *newp;
> +      size_t newsize = GL(dl_tls_max_dtv_idx) + DTV_SURPLUS;

IMO, access to GL(dl_tls_max_dtv_idx) needs to be annotated as
an atomic load until someone does the analysis to relax that.
I'm not commenting on the other users in the function that you
didn't touch.

> +      size_t oldsize = dtv[-1].counter;
> +
> +      if (dtv == GL(dl_initial_dtv))
> +	{
> +	  /* This is the initial dtv that was allocated
> +	     during rtld startup using the dl-minimal.c
> +	     malloc instead of the real malloc.  We can't
> +	     free it, we have to abandon the old storage.  */
> +
> +	  newp = malloc ((2 + newsize) * sizeof (dtv_t));
> +	  if (newp == NULL)
> +	    oom ();
> +	  memcpy (newp, &dtv[-1], (2 + oldsize) * sizeof (dtv_t));
> +	}
> +      else
> +	{
> +	  newp = realloc (&dtv[-1],
> +			  (2 + newsize) * sizeof (dtv_t));
> +	  if (newp == NULL)
> +	    oom ();
> +	}
> +
> +      newp[0].counter = newsize;
> +
> +      /* Clear the newly allocated part.  */
> +      memset (newp + 2 + oldsize, '\0',
> +	      (newsize - oldsize) * sizeof (dtv_t));
> +
> +      /* Install this new dtv in the thread data structures.  */
> +      INSTALL_DTV (result, newp);
> +
> +      /* Point dtv to the generation counter.  */
> +      dtv = &newp[1];
> +    }
> +

The code itself is OK, but it's a copy of what _dl_update_slotinfo
does. Please refactor into a static inline function e.g. resize_dtv.

>    /* We have to prepare the dtv for all currently loaded modules using
>       TLS.  For those which are dynamically loaded we add the values
>       indicating deferred allocation.  */
> @@ -492,11 +536,6 @@ _dl_allocate_tls (void *mem)
>  rtld_hidden_def (_dl_allocate_tls)
>  
>  
> -#ifndef SHARED
> -extern dtv_t _dl_static_dtv[];
> -# define _dl_initial_dtv (&_dl_static_dtv[1])
> -#endif
> -
>  void
>  internal_function
>  _dl_deallocate_tls (void *tcb, bool dealloc_tcb)
> diff --git a/nptl/Makefile b/nptl/Makefile
> index 86a1b0c..ac76596 100644
> --- a/nptl/Makefile
> +++ b/nptl/Makefile
> @@ -254,7 +254,7 @@ tests = tst-typesizes \
>  	tst-exec1 tst-exec2 tst-exec3 tst-exec4 \
>  	tst-exit1 tst-exit2 tst-exit3 \
>  	tst-stdio1 tst-stdio2 \
> -	tst-stack1 tst-stack2 tst-stack3 tst-pthread-getattr \
> +	tst-stack1 tst-stack2 tst-stack3 tst-stack4 tst-pthread-getattr \

OK.

>  	tst-pthread-attr-affinity \
>  	tst-unload \
>  	tst-dlsym1 \
> @@ -311,7 +311,7 @@ endif
>  
>  modules-names = tst-atfork2mod tst-tls3mod tst-tls4moda tst-tls4modb \
>  		tst-tls5mod tst-tls5moda tst-tls5modb tst-tls5modc \
> -		tst-tls5modd tst-tls5mode tst-tls5modf \
> +		tst-tls5modd tst-tls5mode tst-tls5modf tst-stack4mod \

OK.

>  		tst-_res1mod1 tst-_res1mod2 tst-execstack-mod tst-fini1mod
>  extra-test-objs += $(addsuffix .os,$(strip $(modules-names))) tst-cleanup4aux.o
>  test-extras += $(modules-names) tst-cleanup4aux
> @@ -479,6 +479,19 @@ $(objpfx)tst-stack3-mem.out: $(objpfx)tst-stack3.out
>  	$(evaluate-test)
>  generated += tst-stack3-mem.out tst-stack3.mtrace
>  
> +$(objpfx)tst-stack4: $(libdl) $(shared-thread-library)
> +tst-stack4mod.sos=$(shell for i in 0 1 2 3 4 5 6 7 8 9 10 \
> +				   11 12 13 14 15 16 17 18 19; do \
> +			    for j in 0 1 2 3 4 5 6 7 8 9 10 \
> +				     11 12 13 14 15 16 17 18 19; do \
> +			      echo $(objpfx)tst-stack4mod-$$i-$$j.so; \
> +			    done; done)
> +$(objpfx)tst-stack4.out: $(tst-stack4mod.sos)
> +$(tst-stack4mod.sos): $(objpfx)tst-stack4mod.so
> +	cp -f $< $@
> +clean:
> +	rm -f $(tst-stack4mod.sos)
> +

OK.

>  $(objpfx)tst-cleanup4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
>  $(objpfx)tst-cleanupx4: $(objpfx)tst-cleanup4aux.o $(shared-thread-library)
>  
> diff --git a/nptl/tst-stack4.c b/nptl/tst-stack4.c
> new file mode 100644
> index 0000000..0a0d2f9
> --- /dev/null
> +++ b/nptl/tst-stack4.c
> @@ -0,0 +1,130 @@

Missing first line explaining test.

Missing license.

> +#include <stdio.h>
> +#include <stdint.h>
> +#include <dlfcn.h>
> +#include <assert.h>
> +#include <pthread.h>
> +
> +#define DSO_SHARED_FILES 20
> +#define DSO_OPEN_THREADS 20
> +#define DSO_EXEC_THREADS 2

Why these particular constants? If you chose them because
they happened to work on a particular system, please comment
that.

> +
> +pthread_mutex_t g_lock;
> +

Describe what the mutex protects in a comment.

> +typedef void (*function) (void);
> +
> +void *
> +dso_invoke(void *dso_fun)
> +{
> +  function *fun_vec = (function *) dso_fun;
> +  int dso;
> +
> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
> +    (*fun_vec[dso]) ();
> +
> +  pthread_exit (NULL);
> +}
> +
> +void *
> +dso_process (void * p)
> +{
> +  void *handle[DSO_SHARED_FILES];
> +  function fun_vec[DSO_SHARED_FILES];
> +  char dso_path[DSO_SHARED_FILES][100];
> +  int dso;
> +  uintptr_t t = (uintptr_t) p;
> +
> +  // Open dso's and get a function.

Use /* */.

> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
> +    {
> +      sprintf (dso_path[dso], "tst-stack4mod-%i-%i.so", t, dso);
> +
> +      pthread_mutex_lock (&g_lock);
> +
> +      handle[dso] = dlopen (dso_path[dso], RTLD_NOW);
> +      assert (handle[dso]);
> +
> +      fun_vec[dso] = (function) dlsym (handle[dso], "function");
> +      assert (fun_vec[dso]);
> +
> +      pthread_mutex_unlock (&g_lock);
> +    }
> +
> +  // Spawn workers

Use /* */

> +  pthread_t thread[DSO_EXEC_THREADS];
> +  int i, ret;
> +  uintptr_t result = 0;
> +  for (i = 0; i < DSO_EXEC_THREADS; i++)
> +    {
> +      pthread_mutex_lock (&g_lock);
> +      ret = pthread_create (&thread[i], NULL, dso_invoke, (void *) fun_vec);
> +      if (ret != 0)
> +	{
> +	  printf ("pthread_create failed: %d\n", ret);
> +	  result = 1;
> +	}
> +      pthread_mutex_unlock (&g_lock);
> +    }
> +
> +  if (!result) 
> +    for (i = 0; i < DSO_EXEC_THREADS; i++)
> +      {
> +	ret = pthread_join (thread[i], NULL);
> +	if (ret != 0)
> +	  {
> +	    printf ("pthread_join failed: %d\n", ret);
> +	    result = 1;
> +	  }
> +      }
> +
> +  // Close all dso's

Use /* */

> +  for (dso = 0; dso < DSO_SHARED_FILES; dso++)
> +    {
> +      pthread_mutex_lock (&g_lock);
> +      dlclose (handle[dso]);
> +      pthread_mutex_unlock (&g_lock);
> +    }
> +
> +  // Exit
> +  pthread_exit ((void *) result);
> +}
> +
> +

This test needs a verbose, minimum one paragraph, comment explaining 
what the test is testing.

> +int
> +main()
> +{
> +  pthread_t thread[DSO_OPEN_THREADS];
> +  int i,j;
> +  int ret;
> +  int result = 0;
> +
> +  pthread_mutex_init (&g_lock, NULL);
> +
> +  for (j = 0; j < 100; j++)

Why 100 times? Please comment.

> +    {
> +      for (i = 0; i < DSO_OPEN_THREADS; i++)
> +	{
> +	  ret = pthread_create (&thread[i], NULL, dso_process,
> +				(void *) (uintptr_t) i);
> +	  if (ret != 0)
> +	    {
> +	      printf ("pthread_create failed: %d\n", ret);
> +	      result = 1;
> +	    }
> +	}
> +
> +      if (result)
> +	break;
> +
> +      for (i = 0; i < DSO_OPEN_THREADS; i++)
> +	{
> +	  ret = pthread_join (thread[i], NULL);
> +	  if (ret != 0)
> +	    {
> +	      printf ("pthread_join failed: %d\n", ret);
> +	      result = 1;
> +	    }
> +	}
> +    }
> +
> +  return result;
> +}
> diff --git a/nptl/tst-stack4mod.c b/nptl/tst-stack4mod.c
> new file mode 100644
> index 0000000..b8e800f
> --- /dev/null
> +++ b/nptl/tst-stack4mod.c
> @@ -0,0 +1,9 @@

Please give a one line description at the top of this file.

> +__thread int var[256] __attribute__ ((visibility ("hidden"))) = {0};
> +
> +void
> +function (void)
> +{
> +  int i;
> +  for (i = 0; i < 256; i++)

Why 256 times? Please comment.

> +    var[i] = i;
> +}
> 

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]