glibc posix_spawn uses vfork() in some cases, fork() in others. Currently it is rather conservative in this regard. For example, if there are any file actions, vfork() is avoided. This restriction can be lifted, I think, especially for the common case of closing file descriptors.
No. All file operations are cancellation points. This cannot be done after vfork.
The excuse for not fixing this does not make sense. The syscalls could be used directly (without the wrapper code for cancellation), or cancellation could simply be disabled by the parent prior to vfork and reenabled by the parent after vfork returns (which won't happen until the child calls exec).
(In reply to comment #2) > The excuse for not fixing this does not make sense. What doesn't make sense is you. The function is defined in terms of using close, open etc. The semantics is fixed.
It's defined in terms of open, close, etc. happening in the context of the child process, where cancellation cannot play any role since there's no way to obtain a pthread_t value referring to a thread in the child. Of course implementing posix_spawn using vfork causes the cancellation state to be shared with the parent, which would be problematic unless special care is made to protect the state. I suggested several ways this could be done to allow vfork to be used for posix_spawn in all cases.
This bug was marked "fixed" for no apparent reason. The issues reported still exist; vfork is not used if any non-default attributes or file actions are present. This makes it impossible to use posix_spawn from extremely large processes due to commit charge exhaustion.
*** Bug 260998 has been marked as a duplicate of this bug. *** Seen from the domain http://volichat.com Page where seen: http://volichat.com/adult-chat-rooms Marked for reference. Resolved as fixed @bugzilla.
It's important that posix_spawn() avoids fork() where possible as that's the whole reason for having this interface. I see that one can force using vfork() with the POSIX_SPAWN_USEVFORK flag as mentioned in bug #378 though then you're on your own wrt safety. So I'm wondering what specifically callers of posix_spawn() would have to be careful to do? * Not setup pthread_atfork() handlers * pthread_setcancelstate(PTHREAD_CANCEL_DISABLE, NULL); first (and enable after) * anything else? Also I was wondering whether some restrictions could be lifted in the implementation by using clone() rather than vfork(). I.E. vfork() where possible to support MMU less systems, but then fall back to clone(..., CLONE_VM | SIGCHLD, ...); to at least have an otherwise efficient implementation.
Using clone with CLONE_VM and a new stack/start-function solves most of the insurmountable problems with vfork that make it unsafe, but there are still a number of considerations that come from sharing memory. In particular, you must ensure that no signal handlers run in the child while it's sharing memory with the parent, and the parent must be responsible for freeing the child's stack, and thus must synchronize with the execve call in the child. I have a working implementation in musl which solves these problems (it uses clone with CLONE_VM for all posix_spawn operations) if anyone is interested in looking at it and doing something similar for glibc.
(In reply to Rich Felker from comment #8) > Using clone with CLONE_VM and a new stack/start-function solves most of the > insurmountable problems with vfork that make it unsafe, but there are still > a number of considerations that come from sharing memory. In particular, you > must ensure that no signal handlers run in the child while it's sharing > memory with the parent, and the parent must be responsible for freeing the > child's stack, and thus must synchronize with the execve call in the child. > > I have a working implementation in musl which solves these problems (it uses > clone with CLONE_VM for all posix_spawn operations) if anyone is interested > in looking at it and doing something similar for glibc. I agree this issue should be fixed. posix_spawn() could use vfork() in many more cases than it does today, and cancellation is no excuse not to fix this. I agree that cancellation in the context of the child doesn't make any sense. The child needs to just not touch the cancel state, and with signals blocked it will never act on a SIGCANCEL request anyway. We can perhaps revisit this after we fix cancellation to work correctly.
preliminary timing of @musllibc's posix_spawn vs fork+exec shows it ~25x faster for large parent processes. (~360µs vs 9ms for 40MB in small mappings). For a trivial staic binary the difference is 280µs vs 450µs
Comment 10 is quoting some of my tweets, starting with: https://twitter.com/RichFelker/status/602313644026761216
This is an automated email from the git hooks/post-receive script. It was generated because a ref change was pushed to the repository containing the project "GNU C Library master sources". The branch, master has been updated via 9ff72da471a509a8c19791efe469f47fa6977410 (commit) via 1eb8930608705702d5746e5491bab4e4429fcb83 (commit) via f83bb9b8e97656ae0d3e2a31e859363e2d4d5832 (commit) from fee9eb6200f0e44a4b684903bc47fde36d46f1a5 (commit) Those revisions listed above that are new to this repository have not appeared on any other notification email; so we list those revisions in full, below. - Log ----------------------------------------------------------------- https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=9ff72da471a509a8c19791efe469f47fa6977410 commit 9ff72da471a509a8c19791efe469f47fa6977410 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Tue Jan 19 17:33:32 2016 -0200 posix: New Linux posix_spawn{p} implementation This patch implements a new posix_spawn{p} implementation for Linux. The main difference is it uses the clone syscall directly with CLONE_VM and CLONE_VFORK flags and a direct allocated stack. The new stack and start function solves most the vfork limitation (possible parent clobber due stack spilling). The remaning issue are related to signal handling: 1. That no signal handlers must run in child context, to avoid corrupt parent's state. 2. Child must synchronize with parent to enforce stack deallocation and to possible return execv issues. The first one is solved by blocking all signals in child, even NPTL-internal ones (SIGCANCEL and SIGSETXID). The second issue is done by a stack allocation in parent and a synchronization with using a pipe or waitpid (in case or error). The pipe has the advantage of allowing the child signal an exec error (checked with new tst-spawn2 test). There is an inherent race condition in pipe2 usage for architectures that do not support the syscall directly. In such cases the a pipe plus fctnl is used instead and it may lead to file descriptor leak in parent (as decribed by fcntl documentation). The child process stack is allocate with a mmap with MAP_STACK flag using default architecture stack size. Although it is slower than use a stack buffer from parent, it allows some slack for the compatibility code to run scripts with no shebang (which may use a buffer with size depending of argument list count). Performance should be similar to the vfork default posix implementation and way faster than fork path (vfork on mostly linux ports are basically clone with CLONE_VM plus CLONE_VFORK). The only difference is the syscalls required for the stack allocation/deallocation. It fixes BZ#10354, BZ#14750, and BZ#18433. Tested on i386, x86_64, powerpc64le, and aarch64. [BZ #14750] [BZ #10354] [BZ #18433] * include/sched.h (__clone): Add hidden prototype. (__clone2): Likewise. * include/unistd.h (__dup): Likewise. * posix/Makefile (tests): Add tst-spawn2. * posix/tst-spawn2.c: New file. * sysdeps/posix/dup.c (__dup): Add hidden definition. * sysdeps/unix/sysv/linux/aarch64/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/alpha/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/arm/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/hppa/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/i386/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/ia64/clone2.S (__clone): Likewise. * sysdeps/unix/sysv/linux/m68k/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/microblaze/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/mips/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/nios2/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/s390/s390-32/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/s390/s390-64/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/sh/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/sparc/sparc32/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/sparc/sparc64/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/tile/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/x86_64/clone.S (__clone): Likewise. * sysdeps/unix/sysv/linux/nptl-signals.h (____nptl_is_internal_signal): New function. * sysdeps/unix/sysv/linux/spawni.c: New file. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=1eb8930608705702d5746e5491bab4e4429fcb83 commit 1eb8930608705702d5746e5491bab4e4429fcb83 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Fri Jan 22 09:58:49 2016 -0200 posix: execvpe cleanup This patch removes all the dynamic allocation on execvpe code and instead use direct stack allocation. This is QoI approach to make it possible use in scenarios where memory is shared with parent (vfork or clone with CLONE_VM). For default process spawn (script file without a shebang), stack allocation is bounded by NAME_MAX plus PATH_MAX plus 1. Large file arguments returns an error (ENAMETOOLONG). This differs than current GLIBC pratice in general, but it used to limit stack allocation for large inputs. Also, path in PATH environment variable larger than PATH_MAX are ignored. The shell direct execution exeception, where execve returns ENOEXEC, might requires a large stack allocation due large input argument list. Tested on i686, x86_64, powerpc64le, and aarch64. * posix/execvpe.c (__execvpe): Remove dynamic allocation. * posix/Makefile (tests): Add tst-execvpe{1,2,3,4,5,6}. * posix/tst-execvp1.c (do_test): Use a macro to call execvp. * posix/tst-execvp2.c (do_test): Likewise. * posix/tst-execvp3.c (do_test): Likewise. * posix/tst-execvp4.c (do_test): Likewise. * posix/tst-execvpe1.c: New file. * posix/tst-execvpe2.c: Likewise. * posix/tst-execvpe3.c: Likewise. * posix/tst-execvpe4.c: Likewise. * posix/tst-execvpe5.c: Likewise. * posix/tst-execvpe6.c: Likewise. https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=f83bb9b8e97656ae0d3e2a31e859363e2d4d5832 commit f83bb9b8e97656ae0d3e2a31e859363e2d4d5832 Author: Adhemerval Zanella <adhemerval.zanella@linaro.org> Date: Fri Jan 29 11:43:40 2016 -0200 posix: Remove dynamic memory allocation from execl{e,p} GLIBC execl{e,p} implementation might use malloc if the total number of arguments exceed initial assumption size (1024). This might lead to issues in two situations: 1. execl/execle is stated to be async-signal-safe by POSIX [1]. However if execl is used in a signal handler with a large argument set (that may call malloc internally) and if the resulting call fails it might lead malloc in the program in a bad state. 2. If the functions are used in a vfork/clone(VFORK) situation it also might issue malloc internal bad state. This patch fixes it by using stack allocation instead. It also fixes BZ#19534. Tested on x86_64. [1] http://pubs.opengroup.org/onlinepubs/9699919799/functions/V2_chap02.html [BZ #19534] * posix/execl.c (execl): Remove dynamic memory allocation. * posix/execle.c (execle): Likewise. * posix/execlp.c (execlp): Likewise. ----------------------------------------------------------------------- Summary of changes: ChangeLog | 54 +++ include/sched.h | 2 + include/unistd.h | 1 + posix/Makefile | 5 +- posix/execl.c | 68 ++--- posix/execle.c | 70 ++--- posix/execlp.c | 66 ++-- posix/execvpe.c | 255 ++++++-------- posix/tst-execvp1.c | 6 +- posix/tst-execvp2.c | 5 +- posix/tst-execvp3.c | 5 +- posix/tst-execvp4.c | 6 +- posix/tst-execvpe1.c | 20 + posix/tst-execvpe2.c | 20 + posix/tst-execvpe3.c | 20 + posix/tst-execvpe4.c | 20 + posix/tst-execvpe5.c | 157 ++++++++ posix/tst-execvpe6.c | 150 ++++++++ posix/tst-spawn2.c | 72 ++++ sysdeps/posix/dup.c | 2 +- sysdeps/unix/sysv/linux/aarch64/clone.S | 1 + sysdeps/unix/sysv/linux/alpha/clone.S | 1 + sysdeps/unix/sysv/linux/arm/clone.S | 1 + sysdeps/unix/sysv/linux/hppa/clone.S | 1 + sysdeps/unix/sysv/linux/i386/clone.S | 1 + sysdeps/unix/sysv/linux/ia64/clone2.S | 2 + sysdeps/unix/sysv/linux/m68k/clone.S | 1 + sysdeps/unix/sysv/linux/microblaze/clone.S | 1 + sysdeps/unix/sysv/linux/mips/clone.S | 1 + sysdeps/unix/sysv/linux/nios2/clone.S | 1 + sysdeps/unix/sysv/linux/nptl-signals.h | 10 + sysdeps/unix/sysv/linux/powerpc/powerpc32/clone.S | 1 + sysdeps/unix/sysv/linux/powerpc/powerpc64/clone.S | 1 + sysdeps/unix/sysv/linux/s390/s390-32/clone.S | 2 + sysdeps/unix/sysv/linux/s390/s390-64/clone.S | 2 + sysdeps/unix/sysv/linux/sh/clone.S | 1 + sysdeps/unix/sysv/linux/sparc/sparc32/clone.S | 1 + sysdeps/unix/sysv/linux/sparc/sparc64/clone.S | 1 + sysdeps/unix/sysv/linux/spawni.c | 404 +++++++++++++++++++++ sysdeps/unix/sysv/linux/tile/clone.S | 1 + sysdeps/unix/sysv/linux/x86_64/clone.S | 1 + 41 files changed, 1162 insertions(+), 278 deletions(-) create mode 100644 posix/tst-execvpe1.c create mode 100644 posix/tst-execvpe2.c create mode 100644 posix/tst-execvpe3.c create mode 100644 posix/tst-execvpe4.c create mode 100644 posix/tst-execvpe5.c create mode 100644 posix/tst-execvpe6.c create mode 100644 posix/tst-spawn2.c create mode 100644 sysdeps/unix/sysv/linux/spawni.c
Fixed by 9ff72da471a509a8c19791efe469f47fa6977410.