25201 – __libc_fork: parent thread stack corruption while looping through 'allp' for calling parent_handler

Status:	WAITING

Alias:	None

Product:	glibc
Classification:	Unclassified
Component:	libc
Version:	2.17

Importance:	P2 normal
Target Milestone:	---
Assignee:	Not yet assigned to anyone

URL:
Keywords:

Depends on:
Blocks:

Reported:	2019-11-18 09:28 UTC by Matthew Qvap
Modified:	2019-11-18 10:43 UTC
CC List:	1 user

See Also:
Host:
Target:
Build:
Last reconfirmed:	2019-11-18 00:00:00

Description Matthew Qvap 2019-11-18 09:28:32 UTC

Environment: Red Hat 7.7, libc-2.17.so

    Reproducibility: first occurrence (across thousands of installations).

    Description: In production, a proprietary application forks in order to run the /bin/df utility. It does this twice in sequence (one invocation after the other, not in parallel). The crash happens inside the fork library call, after the fork itself has been performed, and it is the parent process that crashes.

    Details:
    The parent is about to run this loop:
    /* Run the handlers registered for the parent.  */
    while (allp != NULL)
      {
        if (allp->handler->parent_handler != NULL)              // CRASH here.
          allp->handler->parent_handler ();

        if (atomic_decrement_and_test (&allp->handler->refcntr)
            && allp->handler->need_signal)
          lll_futex_wake (allp->handler->refcntr, 1, LLL_PRIVATE);
        allp = allp->next;
      }
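For readers unfamiliar with this code path, the loop above walks a singly linked list of small nodes, each pointing at one registered fork handler. The following is a minimal, simplified model of that walk, not the glibc implementation: the type and function names (fork_handler_model, used_handler_model, push_used, run_parent_handlers) are hypothetical, the decrement is not atomic, and the futex wake is left out. In glibc 2.17 the nodes appear to live on the forking thread's stack, which is why a stack overwrite can corrupt them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for glibc's internal fork handler. */
struct fork_handler_model {
    void (*parent_handler)(void);
    unsigned int refcntr;
};

/* Hypothetical stand-in for the per-fork list node that allp points to. */
struct used_handler_model {
    struct fork_handler_model *handler;
    struct used_handler_model *next;
};

/* Prepend one node to the list; the caller provides the node's storage. */
struct used_handler_model *
push_used(struct used_handler_model *node, struct fork_handler_model *h,
          struct used_handler_model *head)
{
    assert(node != NULL && h != NULL);
    node->handler = h;
    node->next = head;
    return node;
}

/* Walk the list the way the parent does after fork.
   Returns the number of parent handlers that were called. */
int run_parent_handlers(struct used_handler_model *allp)
{
    int called = 0;
    while (allp != NULL) {
        /* A corrupted 'handler' field would fault on this dereference. */
        if (allp->handler->parent_handler != NULL) {
            allp->handler->parent_handler();
            called++;
        }
        allp->handler->refcntr--;   /* non-atomic in this model */
        allp = allp->next;
    }
    return called;
}
```

With a single handler whose parent_handler is NULL, as in this report, a healthy list has exactly one node, its next field is NULL, and the walk calls nothing.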

    __fork_handlers keeps one handler:
    (gdb) x/x 0x7f084f08dd78
     0x7f084f08dd78 <__fork_handlers>:	0x00007f084f08be48
    (gdb) print__fork_handlers 0x00007f084f08be48
    $1 = {
      next = 0x0 <main>, 
      prepare_handler = 0x0 <main>, 
      parent_handler = 0x0 <main>, 
      child_handler = 0x7f084eaac260 <__reclaim_stacks>, 
      dso_handle = 0x0 <main>, 
      refcntr = 2, 
      need_signal = 0
    }
    (gdb) x/i 0x00007f084f08be48
     0x7f084f08be48 <fork_handler_pool+8>:	add    %al,(%rax)

    The allp pointer ($rbx) is equal to the stack pointer ($rsp), which implies that it points to the very first element of the used_handler list, i.e. the one that wraps the __reclaim_stacks handler.
    (gdb) info reg $rsp
     rsp            0x7f082d7f39d0      0x7f082d7f39d0
    (gdb) info reg $rbx
     rbx            0x7f082d7f39d0      139673099778512

    But that used_handler entry is corrupted: its 'handler' value should be 0x7f084f08be48, not 0x13a0f00013a0f.
    (gdb) p *(struct used_handler*)$rbx
    $14 = {
      handler = 0x13a0f00013a0f, 
      next = 0x7f082d7fa9e0
    }

    I searched 10 pages of the stack for 0x7f084f08be48 and did not find it.
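A search like this can be scripted over a dumped memory region. The sketch below is only an illustration of the idea, assuming the stack pages have already been copied into a byte buffer and that the pointer would be stored at an 8-byte-aligned offset; find_word is a hypothetical helper, not part of glibc or gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Scan buf for an 8-byte-aligned word equal to needle.
   Returns the byte offset of the first match, or -1 if there is none. */
long find_word(const unsigned char *buf, size_t len, uint64_t needle)
{
    assert(buf != NULL);
    for (size_t off = 0; off + sizeof needle <= len; off += sizeof needle) {
        uint64_t word;
        memcpy(&word, buf + off, sizeof word);  /* avoids unaligned access */
        if (word == needle)
            return (long)off;
    }
    return -1;
}
```

The comparison is done in host byte order, so the buffer has to come from a machine with the same endianness as the one running the scan.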

    As seen above, the proprietary application does not register any fork handlers itself.
    The value 0x13a0f00013a0f looks like a repeated pattern. I am still searching the core for it, but in the meantime:
    Have you seen these conditions before? If yes, is there a fix?
    Does it look like a bug in libc?
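The repetition in 0x13a0f00013a0f becomes explicit when the 64-bit word is split into two 32-bit halves: both are 0x13a0f (80399 in decimal). A minimal sketch of that check follows; the function names are hypothetical, and reading the word as two adjacent 32-bit fields is only an assumption about what may have overwritten the slot, not a diagnosis.

```c
#include <assert.h>
#include <stdint.h>

/* Upper 32 bits of a 64-bit word. */
uint32_t high_half(uint64_t v) { return (uint32_t)(v >> 32); }

/* Lower 32 bits of a 64-bit word. */
uint32_t low_half(uint64_t v) { return (uint32_t)(v & 0xffffffffu); }

/* Nonzero when both halves hold the same 32-bit value. */
int halves_match(uint64_t v) { return high_half(v) == low_half(v); }

/* Self-check against the corrupted 'handler' value from the core. */
void check_reported_value(void)
{
    const uint64_t corrupted = 0x13a0f00013a0fULL;
    assert(high_half(corrupted) == 0x13a0f);
    assert(low_half(corrupted) == 0x13a0f);
    assert(halves_match(corrupted));
}
```

By contrast, the expected pointer 0x7f084f08be48 does not have matching halves, so the pattern is unlikely to be a damaged copy of that address.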

Comment 1 Andreas Schwab 2019-11-18 10:30:21 UTC

The atfork handler handling has been refactored in version 2.28 (commit 27761a1042).  Does the crash still happen?

Comment 2 Matthew Qvap 2019-11-18 10:43:35 UTC

(In reply to Andreas Schwab from comment #1)
> The atfork handler handling has been refactored in version 2.28 (commit
> 27761a1042).  Does the crash still happen?

It happened once (across thousands of installations). The only thing I can look at is the core file from that event, which is what this bug report is based on.

Could what is described above happen with the code prior to the 2.28 refactoring? That is, have you seen a stack that looked similar?