Created attachment 9581
The attached test case occasionally writes to the info.pad member during the pthread_cond_wait function call, rather than only to the info.cond and info.mutex members.
The proximate cause is that __condvar_w_cleanup2 is called with the original (not incremented) value of %ebx.
I can reproduce the stray write under GDB with a hardware watchpoint on info.pad, or with a conditional breakpoint on __condvar_w_cleanup2, checking for the expected value of %ebx.
The root cause is still unclear. It happens with a wide range of glibc versions. We initially saw this on a 2.12-derived glibc without the fix for bug 14477, compiled with a 4.4.7-derived GCC. I can reproduce it on Fedora 23 i386 (glibc 2.22, GCC 5.3.1). Potential root causes are incorrect register restoration in the unwind code (in glibc or in libgcc), or invalid manually written unwind data in pthread_cond_wait.S.
The root cause is this code in libgcc/unwind-c.c:

      int ip_before_insn = 0;
      [...]
      /* Parse the LSDA header.  */
      p = parse_lsda_header (context, language_specific_data, &info);
    #ifdef HAVE_GETIPINFO
      ip = _Unwind_GetIPInfo (context, &ip_before_insn);
    #else
      ip = _Unwind_GetIP (context);
    #endif
      if (! ip_before_insn)
        --ip;
      landing_pad = 0;

i386 is a !HAVE_GETIPINFO architecture, so !ip_before_insn is always true, and we decrement ip.
This means that if SIGCANCEL hits at .Lsub_cond_futex/19 in pthread_cond_wait:

        movl %ebp, %edx
        addl $cond_futex, %ebx
        [...]
        movl $SYS_futex, %eax
        [...]
        subl $cond_futex, %ebx
        [...]
    19: movl (%esp), %eax
        call __pthread_disable_asynccancel

the unwinder assumes that the signal happened at the last byte of the subl instruction, *within* the instruction range that is covered by the __condvar_w_cleanup2 cleanup handler.
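
To make the off-by-one concrete, here is a minimal sketch of a call-site lookup that decrements ip the way the personality routine does. The struct call_site type, the find_landing_pad function and the addresses in the note below are hypothetical and only model the behaviour. This is not the libgcc code, which parses the encoded LSDA call-site table instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified call-site record: addresses in
   [start, start + len) are covered by landing_pad.  */
struct call_site
{
  uintptr_t start;
  uintptr_t len;
  uintptr_t landing_pad;	/* 0 means no cleanup handler.  */
};

/* Return the landing pad that covers IP, or 0 if there is none.
   If IP_BEFORE_INSN is zero, IP is treated as a return address and
   decremented first, as in the personality routine.  */
uintptr_t
find_landing_pad (const struct call_site *table, size_t n,
		  uintptr_t ip, int ip_before_insn)
{
  assert (table != NULL || n == 0);

  if (!ip_before_insn)
    --ip;

  for (size_t i = 0; i < n; ++i)
    if (ip >= table[i].start && ip < table[i].start + table[i].len)
      return table[i].landing_pad;

  return 0;
}
```

With a single entry covering [0x1000, 0x1010) and a signal arriving exactly at 0x1010 (the first instruction after the range), the lookup returns the landing pad when ip_before_insn is 0, because 0x100f is inside the range. With ip_before_insn set to 1 it returns 0, which is the result one would expect for an asynchronous signal.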
Created attachment 9585
unwind-c.c instrumentation patch
I could not reproduce the race with a conditional breakpoint in __gcc_personality_v0, so I had to instrument it and use an unconditional breakpoint on unwind_break_on_pc, after setting unwind_break_pc.
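
For readers without access to the attachment, the general shape of such instrumentation could look like the sketch below. Only the names unwind_break_on_pc and unwind_break_pc come from this comment. The unwind_check_pc helper and the unwind_break_hits counter are hypothetical, and the actual patch may well be structured differently.

```c
#include <stdint.h>

/* Address to watch for; meant to be set from the debugger.
   Zero disables the hook.  */
volatile uintptr_t unwind_break_pc;

/* Hypothetical hit counter, present only so that the sketch has an
   observable effect outside a debugger.  */
volatile unsigned long unwind_break_hits;

/* Breakpoint target.  It is kept out of line so that an
   unconditional breakpoint on it fires only on a match.  */
__attribute__ ((noinline)) void
unwind_break_on_pc (uintptr_t ip)
{
  (void) ip;
  ++unwind_break_hits;
}

/* Hypothetical helper that the personality routine would call with
   the IP it obtained from the unwind context.  */
void
unwind_check_pc (uintptr_t ip)
{
  if (unwind_break_pc != 0 && ip == unwind_break_pc)
    unwind_break_on_pc (ip);
}
```

Under GDB, one would then assign the address of interest to unwind_break_pc with "set var" and place a plain "break unwind_break_on_pc". The comparison runs in the process itself, which avoids the slowdown of a conditional breakpoint that appears to hide the race.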
I asked on the gcc list:
We do not need to treat this as a security issue because there does not appear to be sufficient impact on applications.
This was accepted as a GCC bug and fixed there. No action is needed on the glibc side.