Bug 14750 - Race condition in posix_spawn vfork usage vs signal handlers
Summary: Race condition in posix_spawn vfork usage vs signal handlers
Status: RESOLVED FIXED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: unspecified
Importance: P2 normal
Target Milestone: 2.24
Assignee: Not yet assigned to anyone
URL:
Keywords:
Depends on: 20178
Blocks:
Reported: 2012-10-21 20:41 UTC by Rich Felker
Modified: 2016-05-30 14:39 UTC (History)

See Also:
Host:
Target:
Build:
Last reconfirmed:
fweimer: security-


Description Rich Felker 2012-10-21 20:41:49 UTC
When posix_spawn uses vfork, it does not block signals. This allows the parent process's signal handlers to get invoked in the child process, corrupting the parent process's state. For example:

1. Memory state will be as if the signal handler ran, but other state such as signal dispositions, open files, etc. modified from the signal handler will not be reflected in the parent.
2. The same signal (assuming the signal was sent to an entire process-group, which is the main way a signal could arrive in the new child) may be processed twice in the context of the parent process's memory space.
3. Properties of the child process (e.g. its pid) may end up stored in the parent process's address space.

These are just a few examples; there should be plenty more ways things can go wrong.

To fix the problem, the vfork/exec process needs to follow the steps below:

1. Mask all signals (including NPTL-internal ones)
2. vfork
3. In child, reset all signal dispositions to SIG_DFL unless the existing disposition is SIG_IGN.
4. In child, restore the original signal mask.
5. In child, finish up and exec/_exit.
6. In parent, restore the original signal mask.

Note that step 3 would happen in kernelspace as part of exec anyway, but it must be done explicitly in userspace to make it safe to unmask signals.
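
The six steps above can be sketched in C roughly as follows. This is only an illustration written from application code, not the glibc implementation: spawn_with_masked_signals is a hypothetical helper name, and it assumes a single-threaded caller on Linux/glibc (a threaded caller would use pthread_sigmask). As far as I know, application code cannot block the NPTL-internal signals this way; only libc itself can do that part of step 1.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* for vfork() and NSIG on glibc */
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper, not a glibc function.  Returns the child's pid,
   or -1 with errno set if the mask could not be changed or vfork failed. */
pid_t spawn_with_masked_signals(const char *path, char *const argv[],
                                char *const envp[])
{
    sigset_t all, old;

    /* Step 1: block every signal the caller is allowed to block. */
    sigfillset(&all);
    if (sigprocmask(SIG_SETMASK, &all, &old) != 0)
        return -1;

    /* Step 2: the child borrows the parent's memory until exec/_exit. */
    pid_t pid = vfork();

    if (pid == 0) {
        /* Step 3: reset every caught signal to SIG_DFL; dispositions that
           are already SIG_IGN or SIG_DFL are left alone. */
        for (int sig = 1; sig < NSIG; sig++) {
            struct sigaction sa;
            if (sigaction(sig, NULL, &sa) != 0)
                continue;               /* not a usable signal number */
            if (sa.sa_handler == SIG_IGN || sa.sa_handler == SIG_DFL)
                continue;
            sa.sa_handler = SIG_DFL;
            sa.sa_flags = 0;
            sigemptyset(&sa.sa_mask);
            sigaction(sig, &sa, NULL);
        }

        /* Step 4: safe now, because no parent handler can run any more. */
        sigprocmask(SIG_SETMASK, &old, NULL);

        /* Step 5: exec, or leave without running atexit handlers. */
        execve(path, argv, envp);
        _exit(127);
    }

    /* Step 6: restore the parent's mask, keeping vfork's errno intact. */
    int saved_errno = errno;
    sigprocmask(SIG_SETMASK, &old, NULL);
    errno = saved_errno;
    return pid;
}
```

The caller is expected to reap the returned pid with waitpid. An exec failure is only visible here as exit status 127; the sketch makes no attempt to report it to the parent.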

As an alternative, restoring the signal mask, and all of the post-fork work of posix_spawn, could be outsourced to an external program, i.e. first exec $prefix/libexec/posix_spawn, which would restore signals, perform the file actions, etc.
Comment 1 Rich Felker 2014-02-11 21:34:51 UTC
Ping.
Comment 2 Carlos O'Donell 2014-09-20 04:11:26 UTC
I agree this should be fixed, but I don't see why step (3) or (4) is required. It seems like a QoI issue. That is to say, you want to allow signals targeted at the child to reach the child, but is it really required?
Comment 3 cvs-commit@gcc.gnu.org 2016-03-07 04:56:54 UTC
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "GNU C Library master sources".

The branch, master has been updated
       via  9ff72da471a509a8c19791efe469f47fa6977410 (commit)
       via  1eb8930608705702d5746e5491bab4e4429fcb83 (commit)
       via  f83bb9b8e97656ae0d3e2a31e859363e2d4d5832 (commit)
      from  fee9eb6200f0e44a4b684903bc47fde36d46f1a5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9ff72da471a509a8c19791efe469f47fa6977410

commit 9ff72da471a509a8c19791efe469f47fa6977410
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Tue Jan 19 17:33:32 2016 -0200

    posix: New Linux posix_spawn{p} implementation
    
    This patch implements a new posix_spawn{p} implementation for Linux.  The main
    difference is that it uses the clone syscall directly with the CLONE_VM and
    CLONE_VFORK flags and a directly allocated stack.  The new stack and start
    function solve most of the vfork limitations (possible parent clobbering due
    to stack spilling).  The remaining issues are related to signal handling:
    
      1. No signal handler may run in the child context, to avoid corrupting
         the parent's state.
      2. The child must synchronize with the parent to enforce stack
         deallocation and to possibly return execv errors.
    
    The first one is solved by blocking all signals in the child, even
    NPTL-internal ones (SIGCANCEL and SIGSETXID).  The second is handled by
    allocating the stack in the parent and synchronizing through a pipe or
    waitpid (in case of error).  The pipe has the advantage of allowing the child
    to signal an exec error (checked with the new tst-spawn2 test).
    
    There is an inherent race condition in pipe2 usage for architectures that do not
    support the syscall directly.  In such cases a pipe plus fcntl is used
    instead, and it may lead to a file descriptor leak in the parent (as described
    by the fcntl documentation).
    
    The child process stack is allocated with mmap using the MAP_STACK flag and
    the default architecture stack size.  Although this is slower than using a
    stack buffer from the parent, it allows some slack for the compatibility code
    that runs scripts with no shebang (which may use a buffer whose size depends
    on the argument list count).
    
    Performance should be similar to the vfork path of the default POSIX
    implementation and way faster than the fork path (vfork on most Linux ports
    is basically clone with CLONE_VM plus CLONE_VFORK).  The only difference is
    the syscalls required for the stack allocation/deallocation.
    
    It fixes BZ#10354, BZ#14750, and BZ#18433.
    
    Tested on i386, x86_64, powerpc64le, and aarch64.
    
    	[BZ #14750]
    	[BZ #10354]
    	[BZ #18433]
    	* include/sched.h (__clone): Add hidden prototype.
    	(__clone2): Likewise.
    	* include/unistd.h (__dup): Likewise.
    	* posix/Makefile (tests): Add tst-spawn2.
    	* posix/tst-spawn2.c: New file.
    	* sysdeps/posix/dup.c (__dup): Add hidden definition.
    	* sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/alpha/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/hppa/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/i386/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/ia64/clone2.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/m68k/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/microblaze/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/mips/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/nios2/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S (__clone):
    	Likewise.
    	* sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S (__clone):
    	Likewise.
    	* sysdeps/unix/sysv/linux/s390/s390-32/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/s390/s390-64/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/sh/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/sparc/sparc32/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/sparc/sparc64/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/tile/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/x86_64/clone.S (__clone): Likewise.
    	* sysdeps/unix/sysv/linux/nptl-signals.h
    	(____nptl_is_internal_signal): New function.
    	* sysdeps/unix/sysv/linux/spawni.c: New file.
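
The approach described in the commit message above (clone with CLONE_VM and CLONE_VFORK on a separately mapped stack, all signals blocked around the clone, and a close-on-exec pipe to report an exec failure) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in sysdeps/unix/sysv/linux/spawni.c: the names spawn_clone_sketch, sketch_child, sketch_args and SKETCH_STACK_SIZE are hypothetical, the stack size is arbitrary, the stack is assumed to grow downward, and the caller is assumed to be single-threaded.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE       /* for clone() and pipe2() on glibc */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sched.h>
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define SKETCH_STACK_SIZE (256 * 1024)  /* arbitrary size for this sketch */

struct sketch_args {
    const char *path;
    char *const *argv;
    char *const *envp;
    int err_fd;          /* write end of a close-on-exec pipe */
    sigset_t old_mask;   /* caller's signal mask, restored before exec */
};

/* Runs in the child, on its own stack, while the parent is suspended. */
static int sketch_child(void *p)
{
    struct sketch_args *a = p;

    /* Handlers inherited from the parent must never run in the child. */
    for (int sig = 1; sig < NSIG; sig++) {
        struct sigaction sa;
        if (sigaction(sig, NULL, &sa) == 0
            && sa.sa_handler != SIG_IGN && sa.sa_handler != SIG_DFL) {
            sa.sa_handler = SIG_DFL;
            sa.sa_flags = 0;
            sigemptyset(&sa.sa_mask);
            sigaction(sig, &sa, NULL);
        }
    }
    sigprocmask(SIG_SETMASK, &a->old_mask, NULL);

    execve(a->path, a->argv, a->envp);

    /* exec failed: report errno through the pipe, then exit. */
    int err = errno;
    while (write(a->err_fd, &err, sizeof err) < 0 && errno == EINTR)
        ;
    _exit(127);
}

/* Returns 0 and stores the pid on success, or an errno value on failure. */
int spawn_clone_sketch(const char *path, char *const argv[],
                       char *const envp[], pid_t *pid_out)
{
    int pipefd[2];
    if (pipe2(pipefd, O_CLOEXEC) != 0)
        return errno;

    void *stack = mmap(NULL, SKETCH_STACK_SIZE, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
    if (stack == MAP_FAILED) {
        int e = errno;
        close(pipefd[0]);
        close(pipefd[1]);
        return e;
    }

    struct sketch_args args = { .path = path, .argv = argv, .envp = envp,
                                .err_fd = pipefd[1] };
    sigset_t all;
    sigfillset(&all);
    sigprocmask(SIG_SETMASK, &all, &args.old_mask);

    /* Pass the top of the mapping: the stack is assumed to grow downward. */
    pid_t pid = clone(sketch_child, (char *)stack + SKETCH_STACK_SIZE,
                      CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
    int ec = pid < 0 ? errno : 0;

    /* CLONE_VFORK: the child has already exec'd or exited at this point,
       so its stack can be unmapped and the mask restored. */
    sigprocmask(SIG_SETMASK, &args.old_mask, NULL);
    munmap(stack, SKETCH_STACK_SIZE);
    close(pipefd[1]);

    if (pid > 0) {
        int child_err;
        ssize_t n;
        do
            n = read(pipefd[0], &child_err, sizeof child_err);
        while (n < 0 && errno == EINTR);
        if (n == (ssize_t)sizeof child_err) {   /* exec failed in child */
            ec = child_err;
            waitpid(pid, NULL, 0);
        }
    }
    close(pipefd[0]);
    if (ec == 0)
        *pid_out = pid;
    return ec;
}
```

An end-of-file on the pipe means the exec succeeded, because the child's copy of the write end is closed by O_CLOEXEC. A successful call leaves the child for the caller to reap with waitpid, as with posix_spawn.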

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1eb8930608705702d5746e5491bab4e4429fcb83

commit 1eb8930608705702d5746e5491bab4e4429fcb83
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Jan 22 09:58:49 2016 -0200

    posix: execvpe cleanup
    
    This patch removes all dynamic allocation from the execvpe code and
    uses direct stack allocation instead.  This is a QoI approach that makes
    execvpe usable in scenarios where memory is shared with the parent
    (vfork or clone with CLONE_VM).
    
    For default process spawn (script file without a shebang), stack
    allocation is bounded by NAME_MAX plus PATH_MAX plus 1.  Large
    file arguments return an error (ENAMETOOLONG).  This differs from
    current GLIBC practice in general, but it is used to limit stack
    allocation for large inputs.  Also, paths in the PATH environment variable
    larger than PATH_MAX are ignored.
    
    The direct shell execution exception, where execve returns ENOEXEC,
    might require a large stack allocation due to a large input argument list.
    
    Tested on i686, x86_64, powerpc64le, and aarch64.
    
    	* posix/execvpe.c (__execvpe): Remove dynamic allocation.
    	* posix/Makefile (tests): Add tst-execvpe{1,2,3,4,5,6}.
    	* posix/tst-execvp1.c (do_test): Use a macro to call execvp.
    	* posix/tst-execvp2.c (do_test): Likewise.
    	* posix/tst-execvp3.c (do_test): Likewise.
    	* posix/tst-execvp4.c (do_test): Likewise.
    	* posix/tst-execvpe1.c: New file.
    	* posix/tst-execvpe2.c: Likewise.
    	* posix/tst-execvpe3.c: Likewise.
    	* posix/tst-execvpe4.c: Likewise.
    	* posix/tst-execvpe5.c: Likewise.
    	* posix/tst-execvpe6.c: Likewise.

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f83bb9b8e97656ae0d3e2a31e859363e2d4d5832

commit f83bb9b8e97656ae0d3e2a31e859363e2d4d5832
Author: Adhemerval Zanella <adhemerval.zanella@linaro.org>
Date:   Fri Jan 29 11:43:40 2016 -0200

    posix: Remove dynamic memory allocation from execl{e,p}
    
    The GLIBC execl{e,p} implementation might use malloc if the total number of
    arguments exceeds the initially assumed size (1024).  This might lead to
    issues in two situations:
    
    1. execl/execle is stated to be async-signal-safe by POSIX [1].  However,
       if execl is used in a signal handler with a large argument set (so that
       it may call malloc internally) and the resulting call fails, it might
       leave malloc in the program in a bad state.
    
    2. If the functions are used in a vfork/clone(VFORK) situation, they
       might also leave malloc's internal state bad.
    
    This patch fixes it by using stack allocation instead.  It also fixes
    BZ#19534.
    
    Tested on x86_64.
    
    [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html
    
    	[BZ #19534]
    	* posix/execl.c (execl): Remove dynamic memory allocation.
    	* posix/execle.c (execle): Likewise.
    	* posix/execlp.c (execlp): Likewise.
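
The stack-based argument collection described in the execl{e,p} commit message above can be sketched as a two-pass walk over the variadic list: count the arguments first, then copy them into a variable-length array on the stack. This is an illustration, not the code in posix/execl.c: call_with_stack_argv, argv_consumer and count_args are hypothetical names, and the consumer callback exists only so the example can be exercised without replacing the process.

```c
#include <errno.h>
#include <limits.h>
#include <stdarg.h>
#include <stddef.h>

/* Same shape as execv(path, argv). */
typedef int (*argv_consumer)(const char *path, char *const argv[]);

/* Hypothetical helper.  Collects arg0 and the following NULL-terminated
   variadic arguments into a stack array and hands them to `fn`.
   arg0 is assumed to be non-NULL. */
int call_with_stack_argv(argv_consumer fn, const char *path,
                         const char *arg0, ...)
{
    va_list ap;
    size_t argc;

    /* First pass: count the arguments up to the terminating NULL. */
    va_start(ap, arg0);
    for (argc = 1; va_arg(ap, const char *) != NULL; argc++) {
        if (argc == (size_t)INT_MAX) {
            va_end(ap);
            errno = E2BIG;
            return -1;
        }
    }
    va_end(ap);

    /* Second pass: fill a variable-length array; no malloc involved.
       The final iteration stores the terminating NULL in argv[argc]. */
    char *argv[argc + 1];
    va_start(ap, arg0);
    argv[0] = (char *)arg0;
    for (size_t i = 1; i <= argc; i++)
        argv[i] = va_arg(ap, char *);
    va_end(ap);

    return fn(path, argv);
}

/* Sample consumer, used only to exercise the helper without exec'ing:
   returns the number of entries before the terminating NULL. */
int count_args(const char *path, char *const argv[])
{
    (void)path;
    int n = 0;
    while (argv[n] != NULL)
        n++;
    return n;
}
```

Passing execv from <unistd.h> as the consumer gives execl-like behaviour. The trade-off is that a very long argument list now consumes stack instead of heap, which is what makes the call usable from a signal handler or a vfork child.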

-----------------------------------------------------------------------

Summary of changes:
 ChangeLog                                         |   54 +++
 include/sched.h                                   |    2 +
 include/unistd.h                                  |    1 +
 posix/Makefile                                    |    5 +-
 posix/execl.c                                     |   68 ++---
 posix/execle.c                                    |   70 ++---
 posix/execlp.c                                    |   66 ++--
 posix/execvpe.c                                   |  255 ++++++--------
 posix/tst-execvp1.c                               |    6 +-
 posix/tst-execvp2.c                               |    5 +-
 posix/tst-execvp3.c                               |    5 +-
 posix/tst-execvp4.c                               |    6 +-
 posix/tst-execvpe1.c                              |   20 +
 posix/tst-execvpe2.c                              |   20 +
 posix/tst-execvpe3.c                              |   20 +
 posix/tst-execvpe4.c                              |   20 +
 posix/tst-execvpe5.c                              |  157 ++++++++
 posix/tst-execvpe6.c                              |  150 ++++++++
 posix/tst-spawn2.c                                |   72 ++++
 sysdeps/posix/dup.c                               |    2 +-
 sysdeps/unix/sysv/linux/aarch64/clone.S           |    1 +
 sysdeps/unix/sysv/linux/alpha/clone.S             |    1 +
 sysdeps/unix/sysv/linux/arm/clone.S               |    1 +
 sysdeps/unix/sysv/linux/hppa/clone.S              |    1 +
 sysdeps/unix/sysv/linux/i386/clone.S              |    1 +
 sysdeps/unix/sysv/linux/ia64/clone2.S             |    2 +
 sysdeps/unix/sysv/linux/m68k/clone.S              |    1 +
 sysdeps/unix/sysv/linux/microblaze/clone.S        |    1 +
 sysdeps/unix/sysv/linux/mips/clone.S              |    1 +
 sysdeps/unix/sysv/linux/nios2/clone.S             |    1 +
 sysdeps/unix/sysv/linux/nptl-signals.h            |   10 +
 sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S |    1 +
 sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S |    1 +
 sysdeps/unix/sysv/linux/s390/s390-32/clone.S      |    2 +
 sysdeps/unix/sysv/linux/s390/s390-64/clone.S      |    2 +
 sysdeps/unix/sysv/linux/sh/clone.S                |    1 +
 sysdeps/unix/sysv/linux/sparc/sparc32/clone.S     |    1 +
 sysdeps/unix/sysv/linux/sparc/sparc64/clone.S     |    1 +
 sysdeps/unix/sysv/linux/spawni.c                  |  404 +++++++++++++++++++++
 sysdeps/unix/sysv/linux/tile/clone.S              |    1 +
 sysdeps/unix/sysv/linux/x86_64/clone.S            |    1 +
 41 files changed, 1162 insertions(+), 278 deletions(-)
 create mode 100644 posix/tst-execvpe1.c
 create mode 100644 posix/tst-execvpe2.c
 create mode 100644 posix/tst-execvpe3.c
 create mode 100644 posix/tst-execvpe4.c
 create mode 100644 posix/tst-execvpe5.c
 create mode 100644 posix/tst-execvpe6.c
 create mode 100644 posix/tst-spawn2.c
 create mode 100644 sysdeps/unix/sysv/linux/spawni.c
Comment 4 Adhemerval Zanella 2016-03-07 05:37:11 UTC
Fixed by 9ff72da471a509a8c19791efe469f47fa6977410.