This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] Memory fencing problem in pthread cancellation
- From: Carlos O'Donell <carlos at systemhalted dot org>
- To: Jeff Law <law at redhat dot com>, "Joseph S. Myers" <joseph at codesourcery dot com>,Mike Frysinger <vapier at gentoo dot org>
- Cc: libc-alpha <libc-alpha at sourceware dot org>, libc-ports at sourceware dot org, "Marcus Shawcroft" <marcus dot shawcroft at linaro dot org>
- Date: Mon, 14 Jan 2013 17:11:13 -0500
- Subject: Re: [PATCH] Memory fencing problem in pthread cancellation
- References: <50F46969.1000305@redhat.com>
On 01/14/2013 03:24 PM, Jeff Law wrote:
> Let's consider these two hunks of code in unwind-forcedunwind.c
I'm adding libc-ports, Joseph Myers, and Marcus Shawcroft
since this involves arm ARM devices that have a weakly ordered
memory model. I'm also adding Mike Frysinger since this involes
IA64.
See my review below.
> void
> __attribute_noinline__
> pthread_cancel_init (void)
> {
> void *resume;
> void *personality;
> void *forcedunwind;
> void *getcfa;
> void *handle;
>
> if (__builtin_expect (libgcc_s_handle != NULL, 1))
> {
> /* Force gcc to reload all values. */
> asm volatile ("" ::: "memory");
> return;
> }
> [ ... ]
>
> PTR_MANGLE (resume);
> libgcc_s_resume = resume;
> PTR_MANGLE (personality);
> libgcc_s_personality = personality;
> PTR_MANGLE (forcedunwind);
> libgcc_s_forcedunwind = forcedunwind;
> PTR_MANGLE (getcfa);
> libgcc_s_getcfa = getcfa;
> /* Make sure libgcc_s_handle is written last. Otherwise,
> pthread_cancel_init might return early even when the pointer the
> caller is interested in is not initialized yet. */
> atomic_write_barrier ();
> libgcc_s_handle = handle;
> }
>
> void
> _Unwind_Resume (struct _Unwind_Exception *exc)
> {
> if (__builtin_expect (libgcc_s_handle == NULL, 0))
> pthread_cancel_init ();
> else
> atomic_read_barrier ();
>
> void (*resume) (struct _Unwind_Exception *exc) = libgcc_s_resume;
> PTR_DEMANGLE (resume);
> resume (exc);
> }
>
>
> If we're in Unwind_Resume and libgcc_s_handle is null, we call
> pthread_cancel_init. pthread_cancel_init first checks if
> libgcc_s_handle is non-null. If so, then pthread_cancel_init exits
> early. Upon returning to Unwind_Resume we'll proceed to load &
> demangle libgcc_s_resume and call it.
>
> Note carefully in this case we never executed an atomic_read_barrier
> between the load of libgcc_s_handle and libgcc_s_resume loads. On a
> weakly ordered target such as power7, the processor might speculate
> the libgcc_s_resume load to a points prior to loading
> libgcc_s_handle.
>
> So if another thread happens to update libgcc_s_handle between the
> loads & checks of libgcc_s_handle in Unwind_Resume and
> pthread_cancel_init, then we can end up using stale data for
> libgcc_s_resume and blow up.
>
> This has been observed on a 32 processor power7 machine running 16
> instances of the attached testcase in parallel after a period of
> several hours.
>
> There's a couple ways to fix this. One could be to eliminate the
> early return from pthread_cancel_init. Unfortunately there's a call
> to pthread_cancel_init from pthread_cancel. So every one of those
> calls would suffer a performance penalty unless libgcc_s_handle as
> exported from unwind-forcedunwind.c
>
> Another is to add a call to atomic_read_barrier just before the early
> return from pthread_cancel_init. That's precisely what this patch
> does.
>
>
> While investigating, Carlos identified that the IA64 and ARM ports
> have their own unwind-forcedunwind.c implementations and that they
> were buggy in regards to fencing as well. Both fail to test
> libgcc_s_handle and do not have the necessary calls to
> atomic_read_barrier. This patch fixes those issues as well.
>
> While I have extensively tested the generic unwind-forcedunwind.c
> change backported to a glibc-2.12 base, I have not checked the ARM or
> IA64 bits in any way.>
>
>
> patch
>
>
> diff --git a/nptl/ChangeLog b/nptl/ChangeLog
> index 4aacc17..90d9408 100644
> --- a/nptl/ChangeLog
> +++ b/nptl/ChangeLog
> @@ -1,3 +1,8 @@
> +2013-01-14 Jeff Law <law@redhat.com>
> +
> + * sysdeps/pthrad/unwind-forcedunwind.c (pthread_cancel_init): Add
> + missing call to atomic_read_barrier in early return path.
> +
> 2013-01-11 Carlos O'Donell <codonell@redhat.com>
>
> * allocatestack.c (allocate_stack): Add comment. Remove assert
> diff --git a/nptl/sysdeps/pthread/unwind-forcedunwind.c b/nptl/sysdeps/pthread/unwind-forcedunwind.c
> index 9718606..6b72e3e 100644
> --- a/nptl/sysdeps/pthread/unwind-forcedunwind.c
> +++ b/nptl/sysdeps/pthread/unwind-forcedunwind.c
> @@ -46,6 +46,10 @@ pthread_cancel_init (void)
> {
> /* Force gcc to reload all values. */
> asm volatile ("" ::: "memory");
> + /* Order reads so as to prevent speculation of loads
> + of libgcc_s_{resume,personality,forcedunwind,getcfa}
> + to points prior to loading libgcc_s_handle. */
I'd say:
s/loading libgcc_s_handle/the write barrier./g
It is the writes of the other thread to globals
libgcc_s_{resume,personality,forcedunwind,getcfa}
which have not yet been seen by this thread which
we are synchronizing using the read barrier.
> + atomic_read_barrier ();
> return;
> }
>
> diff --git a/ports/ChangeLog.arm b/ports/ChangeLog.arm
> index d44ea76..cdd2210 100644
> --- a/ports/ChangeLog.arm
> +++ b/ports/ChangeLog.arm
> @@ -1,3 +1,13 @@
> +2013-01-14 Jeff Law <law@redhat.com>
> +
> + * sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
> + (pthread_cancel_init): Add missing call to atomic_read_barrier
> + in early return path.
> + (__gcc_personality_v0): Guard call to pthread_cancel_init with
> + a test of libgcc_s_handle, not libgcc_s_personality. Add
> + missing call to atomic_read_barrier.
> + (_Unwind_ForcedUnwind, _Unwind_GetCFA): Similarly.
> +
> 2013-01-02 Joseph Myers <joseph@codesourcery.com>
>
> * All files with FSF copyright notices: Update copyright dates
> diff --git a/ports/ChangeLog.ia64 b/ports/ChangeLog.ia64
> index 28d5076..b6e1b9c 100644
> --- a/ports/ChangeLog.ia64
> +++ b/ports/ChangeLog.ia64
> @@ -1,3 +1,9 @@
> +2013-01-14 Jeff Law <law@redhat.com>
> +
> + * sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> + (_Unwind_GetBSP): Guard call to pthread_cancel_init with
> + a test of libgcc_s_handle, not libgcc_s_getbsp.
> +
> 2013-01-02 Joseph Myers <joseph@codesourcery.com>
>
> * All files with FSF copyright notices: Update copyright dates
> diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
> index 58ca9ac..1a081c8 100644
> --- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
> +++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
Please also fix this incorrect comment:
63 libgcc_s_getcfa = getcfa;
64 /* Make sure libgcc_s_getcfa is written last. Otherwise,
65 pthread_cancel_init might return early even when the pointer the
66 caller is interested in is not initialized yet. */
67 atomic_write_barrier ();
68 libgcc_s_handle = handle;
69 }
It should talk about libcc_s_handle not libgcc_s_getcfa.
> @@ -40,6 +40,10 @@ pthread_cancel_init (void)
> {
> /* Force gcc to reload all values. */
> asm volatile ("" ::: "memory");
> + /* Order reads so as to prevent speculation of loads
> + of libgcc_s_{resume,personality,forcedunwind,getcfa}
> + to points prior to loading libgcc_s_handle. */
> + atomic_read_barrier ();
Same comment as above.
> return;
> }
>
> @@ -132,8 +136,10 @@ __gcc_personality_v0 (_Unwind_State state,
> struct _Unwind_Exception *ue_header,
> struct _Unwind_Context *context)
> {
> - if (__builtin_expect (libgcc_s_personality == NULL, 0))
> + if (__builtin_expect (libgcc_s_handle == NULL, 0))
> pthread_cancel_init ();
> + else
> + atomic_read_barrier ();
>
> return libgcc_s_personality (state, ue_header, context);
> }
> @@ -142,8 +148,10 @@ _Unwind_Reason_Code
> _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
> void *stop_argument)
> {
> - if (__builtin_expect (libgcc_s_forcedunwind == NULL, 0))
> + if (__builtin_expect (libgcc_s_handle == NULL, 0))
> pthread_cancel_init ();
> + else
> + atomic_read_barrier ();
>
> return libgcc_s_forcedunwind (exc, stop, stop_argument);
> }
> @@ -151,8 +159,10 @@ _Unwind_ForcedUnwind (struct _Unwind_Exception *exc, _Unwind_Stop_Fn stop,
> _Unwind_Word
> _Unwind_GetCFA (struct _Unwind_Context *context)
> {
> - if (__builtin_expect (libgcc_s_getcfa == NULL, 0))
> + if (__builtin_expect (libgcc_s_handle == NULL, 0))
> pthread_cancel_init ();
> + else
> + atomic_read_barrier ();
>
> return libgcc_s_getcfa (context);
> }
Not sufficient.
You need to fix the assembly also.
83 /* It's vitally important that _Unwind_Resume not have a stack frame; the
84 ARM unwinder relies on register state at entrance. So we write this in
85 assembly. */
86
87 asm (
88 " .globl _Unwind_Resume\n"
89 " .type _Unwind_Resume, %function\n"
90 "_Unwind_Resume:\n"
91 " .cfi_sections .debug_frame\n"
92 " " CFI_STARTPROC "\n"
93 " stmfd sp!, {r4, r5, r6, lr}\n"
94 " " CFI_ADJUST_CFA_OFFSET (16)" \n"
95 " " CFI_REL_OFFSET (r4, 0) "\n"
96 " " CFI_REL_OFFSET (r5, 4) "\n"
97 " " CFI_REL_OFFSET (r6, 8) "\n"
98 " " CFI_REL_OFFSET (lr, 12) "\n"
99 " " CFI_REMEMBER_STATE "\n"
100 " ldr r4, 1f\n"
101 " ldr r5, 2f\n"
102 "3: add r4, pc, r4\n"
103 " ldr r3, [r4, r5]\n"
104 " mov r6, r0\n"
105 " cmp r3, #0\n"
106 " beq 4f\n"
The above assembly is equivalent to:
if (libgcc_s_resume == NULL)
pthread_cancel_init ();
Which is missing the atomic_read_barrier ().
107 "5: mov r0, r6\n"
108 " ldmfd sp!, {r4, r5, r6, lr}\n"
109 " " CFI_ADJUST_CFA_OFFSET (-16) "\n"
110 " " CFI_RESTORE (r4) "\n"
111 " " CFI_RESTORE (r5) "\n"
112 " " CFI_RESTORE (r6) "\n"
113 " " CFI_RESTORE (lr) "\n"
114 " bx r3\n"
115 " " CFI_RESTORE_STATE "\n"
116 "4: bl pthread_cancel_init\n"
117 " ldr r3, [r4, r5]\n"
118 " b 5b\n"
119 " " CFI_ENDPROC "\n"
120 " .align 2\n"
121 #ifdef __thumb2__
122 "1: .word _GLOBAL_OFFSET_TABLE_ - 3b - 4\n"
123 #else
124 "1: .word _GLOBAL_OFFSET_TABLE_ - 3b - 8\n"
125 #endif
126 "2: .word libgcc_s_resume(GOTOFF)\n"
127 " .size _Unwind_Resume, .-_Unwind_Resume\n"
128 );
Thus you need something like this completely untested patch:
diff --git a/ports/sysdeps/arm/bits/atomic.h b/ports/sysdeps/arm/bits/atomic.h
index 6e20df7..b0f45ad 100644
--- a/ports/sysdeps/arm/bits/atomic.h
+++ b/ports/sysdeps/arm/bits/atomic.h
@@ -42,6 +42,7 @@ void __arm_link_error (void);
# define atomic_full_barrier() __sync_synchronize ()
#else
# define atomic_full_barrier() __arm_assisted_full_barrier ()
+# define atomic_asm_full_barrier() __arm_asm_assisted_full_barrier ()
#endif
/* An OS-specific bits/atomic.h file will define this macro if
diff --git a/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h b/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h
index c76b8f3..726d3ce 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h
+++ b/ports/sysdeps/unix/sysv/linux/arm/bits/atomic.h
@@ -17,7 +17,8 @@
<http://www.gnu.org/licenses/>. */
/* If the compiler doesn't provide a primitive, we'll use this macro
- to get assistance from the kernel. */
+ to get assistance from the kernel. This should call the
+ __kuser_memory_barrier helper in the kernel. */
#ifdef __thumb2__
# define __arm_assisted_full_barrier() \
__asm__ __volatile__ \
@@ -25,6 +26,11 @@
"movt\tip, #0xffff\n\t" \
"blx\tip" \
: : : "ip", "lr", "cc", "memory");
+/* The asm variant is used as an insert into existing asm statements. */
+# define __arm_asm_assisted_full_barrier() \
+ " movw ip, #0x0fa0\n" \
+ " movt ip, #0xffff\n" \
+ " blx ip"
#else
# define __arm_assisted_full_barrier() \
__asm__ __volatile__ \
@@ -32,6 +38,11 @@
"mov\tlr, pc\n\t" \
"add\tpc, ip, #(0xffff0fa0 - 0xffff0fff)" \
: : : "ip", "lr", "cc", "memory");
+/* The asm variant is used as an insert into existing asm statements. */
+# define __arm_asm_assisted_full_barrier() \
+ " mov ip, #0xffff0fff\n" \
+ " mov lr, pc\n" \
+ " add pc, ip, #(0xffff0fa0 - 0xffff0fff)"
#endif
/* Atomic compare and exchange. This sequence relies on the kernel to
diff --git a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
index 58ca9ac..d7ac847 100644
--- a/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
+++ b/ports/sysdeps/unix/sysv/linux/arm/nptl/unwind-forcedunwind.c
@@ -104,6 +104,7 @@ asm (
" mov r6, r0\n"
" cmp r3, #0\n"
" beq 4f\n"
+ atomic_asm_full_barrier()
"5: mov r0, r6\n"
" ldmfd sp!, {r4, r5, r6, lr}\n"
" " CFI_ADJUST_CFA_OFFSET (-16) "\n"
@@ -123,7 +124,7 @@ asm (
#else
"1: .word _GLOBAL_OFFSET_TABLE_ - 3b - 8\n"
#endif
-"2: .word libgcc_s_resume(GOTOFF)\n"
+"2: .word libgcc_s_handle(GOTOFF)\n"
" .size _Unwind_Resume, .-_Unwind_Resume\n"
);
---
Note:
* As of ARMv6 we should really have used: `MCR p15, 0, r0, c7, c10, 5'
instead of calling the kernel helper.
* As of ARMv7 we should really have used: `DMB' instead of calling the
kernel barrier.
* As of ARMv7 we should be using DMB with specific completers
to implement atomic_read_barrier() and atomic_write_barrier().
> diff --git a/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c b/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> index 8562afd..1d197e1 100644
> --- a/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> +++ b/ports/sysdeps/unix/sysv/linux/ia64/nptl/unwind-forcedunwind.c
> @@ -31,8 +31,10 @@ static _Unwind_Word (*libgcc_s_getbsp) (struct _Unwind_Context *);
> _Unwind_Word
> _Unwind_GetBSP (struct _Unwind_Context *context)
> {
> - if (__builtin_expect (libgcc_s_getbsp == NULL, 0))
> + if (__builtin_expect (libgcc_s_handle == NULL, 0))
> pthread_cancel_init ();
> + else
> + atomic_read_barrier ();
>
> return libgcc_s_getbsp (context);
> }
The IA64 fix looks OK to me.
Cheers,
Carlos.