This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]
Other format: [Raw text]

Re: [PATCH] Memory fencing problem in pthread cancellation


On 01/14/2013 03:24 PM, Jeff Law wrote:
> Let's consider these two hunks of code in unwind-forcedunwind.c

I'm adding libc-ports, Joseph Myers, and Marcus Shawcroft
since this involves arm ARM devices that have a weakly ordered
memory model. I'm also adding Mike Frysinger since this involes
IA64.

See my review below.

> void
> __attribute_noinline__
> pthread_cancel_init (void)
> {
>   void *resume;
>   void *personality;
>   void *forcedunwind;
>   void *getcfa;
>   void *handle;
> 
>   if (__builtin_expect (libgcc_s_handle != NULL, 1))
>     {
>       /* Force gcc to reload all values.  */
>       asm volatile ("" ::: "memory");
>       return;
>     }
>   [ ... ]
> 
>   PTR_MANGLE (resume);
>   libgcc_s_resume = resume;
>   PTR_MANGLE (personality);
>   libgcc_s_personality = personality;
>   PTR_MANGLE (forcedunwind);
>   libgcc_s_forcedunwind = forcedunwind;
>   PTR_MANGLE (getcfa);
>   libgcc_s_getcfa = getcfa;
>   /* Make sure libgcc_s_handle is written last.  Otherwise,
>      pthread_cancel_init might return early even when the pointer the
>      caller is interested in is not initialized yet.  */
>   atomic_write_barrier ();
>   libgcc_s_handle = handle;
> }
> 
> void
> _Unwind_Resume (struct _Unwind_Exception *exc)
> {
>   if (__builtin_expect (libgcc_s_handle == NULL, 0))
>     pthread_cancel_init ();
>   else
>     atomic_read_barrier ();
> 
>   void (*resume) (struct _Unwind_Exception *exc) = libgcc_s_resume;
>   PTR_DEMANGLE (resume);
>   resume (exc);
> }
> 
> 
> If we're in Unwind_Resume and libgcc_s_handle is null, we call
> pthread_cancel_init. pthread_cancel_init first checks if
> libgcc_s_handle is non-null. If so, then pthread_cancel_init exits
> early. Upon returning to Unwind_Resume we'll proceed to load &
> demangle libgcc_s_resume and call it.
> 
> Note carefully in this case we never executed an atomic_read_barrier
> between the load of libgcc_s_handle and libgcc_s_resume loads. On a
> weakly ordered target such as power7, the processor might speculate
> the libgcc_s_resume load to a points prior to loading
> libgcc_s_handle.
> 
> So if another thread happens to update libgcc_s_handle between the
> loads & checks of libgcc_s_handle in Unwind_Resume and
> pthread_cancel_init, then we can end up using stale data for
> libgcc_s_resume and blow up.
> 
> This has been observed on a 32 processor power7 machine running 16
> instances of the attached testcase in parallel after a period of
> several hours.
> 
> There's a couple ways to fix this. One could be to eliminate the
> early return from pthread_cancel_init. Unfortunately there's a call
> to pthread_cancel_init from pthread_cancel. So every one of those
> calls would suffer a performance penalty unless libgcc_s_handle as
> exported from unwind-forcedunwind.c
> 
> Another is to add a call to atomic_read_barrier just before the early
> return from pthread_cancel_init. That's precisely what this patch
> does.
> 
> 
> While investigating, Carlos identified that the IA64 and ARM ports
> have their own unwind-forcedunwind.c implementations and that they
> were buggy in regards to fencing as well. Both fail to test
> libgcc_s_handle and do not have the necessary calls to
> atomic_read_barrier. This patch fixes those issues as well.
> 
> While I have extensively tested the generic unwind-forcedunwind.c
> change backported to a glibc-2.12 base, I have not checked the ARM or
> IA64 bits in any way.> 
> 
> 
> patch
> 
> 
> diff --git a/nptl/ChangeLog b/nptl/ChangeLog
> index 4aacc17..90d9408 100644
> --- a/nptl/ChangeLog
> +++ b/nptl/ChangeLog
> @@ -1,3 +1,8 @@
> +2013-01-14  Jeff Law  <law@redhat.com>
> +
> +	* sysdeps/pthrad/unwind-forcedunwind.c (pthread_cancel_init): Add
> +	missing call to atomic_read_barrier in early return path.
> +
>  2013-01-11  Carlos O'Donell  <codonell@redhat.com>
>  
>  	* allocatestack.c (allocate_stack): Add comment. Remove assert
> diff --git a/nptl/sysdeps/pthread/unwind-forcedunwind.c b/nptl/sysdeps/pthread/unwind-forcedunwind.c
> index 9718606..6b72e3e 100644
> --- a/nptl/sysdeps/pthread/unwind-forcedunwind.c
> +++ b/nptl/sysdeps/pthread/unwind-forcedunwind.c
> @@ -46,6 +46,10 @@ pthread_cancel_init (void)
>      {
>        /* Force gcc to reload all values.  */
>        asm volatile ("" ::: "memory");
> +      /* Order reads so as to prevent speculation of loads
> +	 of libgcc_s_{resume,personality,forcedunwind,getcfa}
> +	 to points prior to loading libgcc_s_handle.  */

I'd say:

s/loading libgcc_s_handle/the write barrier./g

It is the writes of the other thread to globals 
libgcc_s_{resume,personality,forcedunwind,getcfa}
which have not yet been seen by this thread which
we are synchronizing using the read barrier.

> +      atomic_read_barrier ();
>        return;
>      }
>  
> diff --git a/ports/ChangeLog.arm b/ports/ChangeLog.arm
> index d44ea76..cdd2210 100644
> --- a/ports/ChangeLog.arm
> +++ b/ports/ChangeLog.arm
> @@ -1,3 +1,13 @@
> +2013-01-14  Jeff Law  <law@redhat.com>
> +
> +	* sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
> +	(pthread_cancel_init):  Add missing call to atomic_read_barrier
> +	in early return path.
> +	(__gcc_personality_v0): Guard call to pthread_cancel_init with
> +	a test of libgcc_s_handle, not libgcc_s_personality.  Add
> +	missing call to atomic_read_barrier.
> +	(_Unwind_ForcedUnwind, _Unwind_GetCFA): Similarly.
> +
>  2013-01-02  Joseph Myers  <joseph@codesourcery.com>
>  
>  	* All files with FSF copyright notices: Update copyright dates
> diff --git a/ports/ChangeLog.ia64 b/ports/ChangeLog.ia64
> index 28d5076..b6e1b9c 100644
> --- a/ports/ChangeLog.ia64
> +++ b/ports/ChangeLog.ia64
> @@ -1,3 +1,9 @@
> +2013-01-14  Jeff Law  <law@redhat.com>
> +
> +        * sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> +        (_Unwind_GetBSP): Guard call to pthread_cancel_init with
> +        a test of libgcc_s_handle, not libgcc_s_getbsp.
> +
>  2013-01-02  Joseph Myers  <joseph@codesourcery.com>
>  
>  	* All files with FSF copyright notices: Update copyright dates
> diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
> index 58ca9ac..1a081c8 100644
> --- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
> +++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c

Please also fix this incorrect comment:

 63   libgcc_s_getcfa = getcfa;
 64   /* Make sure libgcc_s_getcfa is written last.  Otherwise,
 65      pthread_cancel_init might return early even when the pointer the
 66      caller is interested in is not initialized yet.  */
 67   atomic_write_barrier ();
 68   libgcc_s_handle = handle;
 69 }

It should talk about libcc_s_handle not libgcc_s_getcfa.


> @@ -40,6 +40,10 @@ pthread_cancel_init (void)
>      {
>        /* Force gcc to reload all values.  */
>        asm volatile ("" ::: "memory");
> +     /* Order reads so as to prevent speculation of loads
> +       of libgcc_s_{resume,personality,forcedunwind,getcfa}
> +       to points prior to loading libgcc_s_handle.  */
> +      atomic_read_barrier ();

Same comment as above.

>        return;
>      }
>  
> @@ -132,8 +136,10 @@ __gcc_personality_v0 (_Unwind_State state,
>  		      struct _Unwind_Exception *ue_header,
>  		      struct _Unwind_Context *context)
>  {
> -  if (__builtin_expect (libgcc_s_personality == NULL, 0))
> +  if (__builtin_expect (libgcc_s_handle == NULL, 0))
>      pthread_cancel_init ();
> +  else
> +    atomic_read_barrier ();
>  
>    return libgcc_s_personality (state, ue_header, context);
>  }
> @@ -142,8 +148,10 @@ _Unwind_Reason_Code
>  _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
>  		      void *stop_argument)
>  {
> -  if (__builtin_expect (libgcc_s_forcedunwind == NULL, 0))
> +  if (__builtin_expect (libgcc_s_handle == NULL, 0))
>      pthread_cancel_init ();
> +  else
> +    atomic_read_barrier ();
>  
>    return libgcc_s_forcedunwind (exc, stop, stop_argument);
>  }
> @@ -151,8 +159,10 @@ _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
>  _Unwind_Word
>  _Unwind_GetCFA (struct _Unwind_Context *context)
>  {
> -  if (__builtin_expect (libgcc_s_getcfa == NULL, 0))
> +  if (__builtin_expect (libgcc_s_handle == NULL, 0))
>      pthread_cancel_init ();
> +  else
> +    atomic_read_barrier ();
>  
>    return libgcc_s_getcfa (context);
>  }

Not sufficient.

You need to fix the assembly also.

 83 /* It's vitally important that _Unwind_Resume not have a stack frame; the
 84    ARM unwinder relies on register state at entrance.  So we write this in
 85    assembly.  */
 86 
 87 asm (
 88 "       .globl  _Unwind_Resume\n"
 89 "       .type   _Unwind_Resume, %function\n"
 90 "_Unwind_Resume:\n"
 91 "       .cfi_sections .debug_frame\n"
 92 "       " CFI_STARTPROC "\n"
 93 "       stmfd   sp!, {r4, r5, r6, lr}\n"
 94 "       " CFI_ADJUST_CFA_OFFSET (16)" \n"
 95 "       " CFI_REL_OFFSET (r4, 0) "\n"
 96 "       " CFI_REL_OFFSET (r5, 4) "\n"
 97 "       " CFI_REL_OFFSET (r6, 8) "\n"
 98 "       " CFI_REL_OFFSET (lr, 12) "\n"
 99 "       " CFI_REMEMBER_STATE "\n"
100 "       ldr     r4, 1f\n"
101 "       ldr     r5, 2f\n"
102 "3:     add     r4, pc, r4\n"
103 "       ldr     r3, [r4, r5]\n"
104 "       mov     r6, r0\n"
105 "       cmp     r3, #0\n"
106 "       beq     4f\n"

The above assembly is equivalent to:

if (libgcc_s_resume == NULL)
  pthread_cancel_init ();

Which is missing the atomic_read_barrier ().

107 "5:     mov     r0, r6\n"
108 "       ldmfd   sp!, {r4, r5, r6, lr}\n"
109 "       " CFI_ADJUST_CFA_OFFSET (-16) "\n"
110 "       " CFI_RESTORE (r4) "\n"
111 "       " CFI_RESTORE (r5) "\n"
112 "       " CFI_RESTORE (r6) "\n"
113 "       " CFI_RESTORE (lr) "\n"
114 "       bx      r3\n"
115 "       " CFI_RESTORE_STATE "\n"
116 "4:     bl      pthread_cancel_init\n"
117 "       ldr     r3, [r4, r5]\n"
118 "       b       5b\n"
119 "       " CFI_ENDPROC "\n"
120 "       .align 2\n"
121 #ifdef __thumb2__
122 "1:     .word   _GLOBAL_OFFSET_TABLE_ - 3b - 4\n"
123 #else
124 "1:     .word   _GLOBAL_OFFSET_TABLE_ - 3b - 8\n"
125 #endif
126 "2:     .word   libgcc_s_resume(GOTOFF)\n"
127 "       .size   _Unwind_Resume, .-_Unwind_Resume\n"
128 );

Thus you need something like this completely untested patch:

diff --git a/ports/sysdeps/arm/bits/atomic.h b/ports/sysdeps/arm/bits/atomic.h
index 6e20df7..b0f45ad 100644
--- a/ports/sysdeps/arm/bits/atomic.h
+++ b/ports/sysdeps/arm/bits/atomic.h
@@ -42,6 +42,7 @@ void __arm_link_error (void);
 # define atomic_full_barrier() __sync_synchronize ()
 #else
 # define atomic_full_barrier() __arm_assisted_full_barrier ()
+# define atomic_asm_full_barrier() __arm_asm_assisted_full_barrier ()
 #endif
 
 /* An OS-specific bits/atomic.h file will define this macro if
diff --git a/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h b/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h
index c76b8f3..726d3ce 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h
@@ -17,7 +17,8 @@
    <http://www.gnu.org/licenses/>.  */
 
 /* If the compiler doesn't provide a primitive, we'll use this macro
-   to get assistance from the kernel.  */
+   to get assistance from the kernel. This should call the
+   __kuser_memory_barrier helper in the kernel.  */
 #ifdef __thumb2__
 # define __arm_assisted_full_barrier() \
      __asm__ __volatile__                                                    \
@@ -25,6 +26,11 @@
              "movt\tip, #0xffff\n\t"                                         \
              "blx\tip"                                                       \
              : : : "ip", "lr", "cc", "memory");
+/* The asm variant is used as an insert into existing asm statements.  */
+# define __arm_asm_assisted_full_barrier() \
+       "       movw    ip, #0x0fa0\n" \
+       "       movt    ip, #0xffff\n" \
+       "       blx     ip"
 #else
 # define __arm_assisted_full_barrier() \
      __asm__ __volatile__                                                    \
@@ -32,6 +38,11 @@
              "mov\tlr, pc\n\t"                                               \
              "add\tpc, ip, #(0xffff0fa0 - 0xffff0fff)"                       \
              : : : "ip", "lr", "cc", "memory");
+/* The asm variant is used as an insert into existing asm statements.  */
+# define __arm_asm_assisted_full_barrier() \
+       "       mov     ip, #0xffff0fff\n" \
+       "       mov     lr, pc\n" \
+       "       add     pc, ip, #(0xffff0fa0 - 0xffff0fff)"
 #endif
 
 /* Atomic compare and exchange.  This sequence relies on the kernel to
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
index 58ca9ac..d7ac847 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
@@ -104,6 +104,7 @@ asm (
 "      mov     r6, r0\n"
 "      cmp     r3, #0\n"
 "      beq     4f\n"
+       atomic_asm_full_barrier()
 "5:    mov     r0, r6\n"
 "      ldmfd   sp!, {r4, r5, r6, lr}\n"
 "      " CFI_ADJUST_CFA_OFFSET (-16) "\n"
@@ -123,7 +124,7 @@ asm (
 #else
 "1:    .word   _GLOBAL_OFFSET_TABLE_ - 3b - 8\n"
 #endif
-"2:    .word   libgcc_s_resume(GOTOFF)\n"
+"2:    .word   libgcc_s_handle(GOTOFF)\n"
 "      .size   _Unwind_Resume, .-_Unwind_Resume\n"
 );
 
---

Note: 
* As of ARMv6 we should really have used: `MCR p15, 0, r0, c7, c10, 5'
  instead of calling the kernel helper.
* As of ARMv7 we should really have used: `DMB' instead of calling the
  kernel barrier.
* As of ARMv7 we should be using DMB with specific completers 
  to implement atomic_read_barrier() and atomic_write_barrier().

> diff --git a/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> index 8562afd..1d197e1 100644
> --- a/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> +++ b/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> @@ -31,8 +31,10 @@ static _Unwind_Word (*libgcc_s_getbsp) (struct _Unwind_Context *);
>  _Unwind_Word
>  _Unwind_GetBSP (struct _Unwind_Context *context)
>  {
> -  if (__builtin_expect (libgcc_s_getbsp == NULL, 0))
> +  if (__builtin_expect (libgcc_s_handle == NULL, 0))
>      pthread_cancel_init ();
> +  else
> +    atomic_read_barrier ();
>  
>    return libgcc_s_getbsp (context);
>  }

The IA64 fix looks OK to me.

Cheers,
Carlos.


Index Nav: [Date Index] [Subject Index] [Author Index] [Thread Index]
Message Nav: [Date Prev] [Date Next] [Thread Prev] [Thread Next]