Bug 20719 - glibc: canceled pthread_cond_wait invokes wrong internal cleanup handler, leading to OOB write
Summary: glibc: canceled pthread_cond_wait invokes wrong internal cleanup handler, lea...
Status: RESOLVED MOVED
Alias: None
Product: glibc
Classification: Unclassified
Component: nptl (show other bugs)
Version: 2.22
: P2 normal
Target Milestone: ---
Assignee: Florian Weimer
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2016-10-19 18:04 UTC by Florian Weimer
Modified: 2016-10-24 18:29 UTC (History)
1 user (show)

See Also:
Host:
Target: i686
Build:
Last reconfirmed:
fweimer: security-


Attachments
tst-cancel-cond.c (1.62 KB, text/plain)
2016-10-19 18:04 UTC, Florian Weimer
Details
unwind-c.c instrumentation patch (553 bytes, text/plain)
2016-10-20 15:40 UTC, Florian Weimer
Details

Note You need to log in before you can comment on or make changes to this bug.
Description Florian Weimer 2016-10-19 18:04:16 UTC
Created attachment 9581 [details]
tst-cancel-cond.c

The attached test case occasionally writes to info.pad member during the pthread_cond_wait function call, and not to the info.cond and info.mutex members.

The proximate cause is that __condvar_w_cleanup2 is called with the original (not incremented) value of %ebx.

I can reproduce the stray write under GDB with a hardware watchpoint on info.pad, or with a conditional breakpoint on __condvar_w_cleanup2, checking for the expected value of %ebx.

The root cause is still unclear.  It happens with a wide range of glibc versions.  We initially saw this on a 2.12-derived glibc without the fix for bug 14477, compiled with 4.4.7-derived GCC.  I can reproduce it on Fedora 2.23 i386 (glibc 2.22, GCC 5.3.1).  Potential root causes are incorrect register restoration in the unwind code (both glibc and libgcc), or invalid manually written unwind data in pthread_cond_wait.S.
Comment 1 Florian Weimer 2016-10-20 15:37:27 UTC
Root cause is this in libgcc/unwind-c.c:

    145   int ip_before_insn = 0;
    …
    173   /* Parse the LSDA header.  */
    174   p = parse_lsda_header (context, language_specific_data, &info);
    175 #ifdef HAVE_GETIPINFO
    176   ip = _Unwind_GetIPInfo (context, &ip_before_insn);
    177 #else
    178   ip = _Unwind_GetIP (context);
    179 #endif
    180   if (! ip_before_insn)
    181     --ip;
    182   landing_pad = 0;

i386 is a !HAVE_GETIPINFO architecture, so !ip_before_insn is always true, and we decrement ip.

This means that if SIGCANCEL hits at .Lsub_cond_futex/19 in pthread_cond_wait:

    183         movl    %ebp, %edx
    184         addl    $cond_futex, %ebx
    185 .Ladd_cond_futex:
    186         movl    $SYS_futex, %eax
    187         ENTER_KERNEL
    188         subl    $cond_futex, %ebx
    189 .Lsub_cond_futex:
    190 
    191 19:     movl    (%esp), %eax
    192         call    __pthread_disable_asynccancel
    193 .LcleanupEND:

, the unwinder assumes that signal happened at the last byte of subl, *within* the instruction range which calls __condvar_w_cleanup2.
Comment 2 Florian Weimer 2016-10-20 15:40:18 UTC
Created attachment 9585 [details]
unwind-c.c instrumentation patch

I could not reproduce the race with a conditional breakpoint in __gcc_personality_v0, so I had to instrument it and use an unconditional breakpoint on unwind_break_on_pc, after setting unwind_break_pc.
Comment 3 Florian Weimer 2016-10-20 16:32:56 UTC
I asked on the gcc list:

  https://gcc.gnu.org/ml/gcc/2016-10/msg00165.html
Comment 4 Florian Weimer 2016-10-21 16:01:32 UTC
We do not need to treat this as a security issue because there does not appear to be sufficient impact on applications.
Comment 5 Florian Weimer 2016-10-24 18:29:05 UTC
This was accepted as a GCC bug and fixed there.  No action on the glibc is needed.