This is the mail archive of the
libc-alpha@sourceware.org
mailing list for the glibc project.
Re: [PATCH] posix: Fix improper assert in Linux posix_spawn (BZ#22273)
Adhemerval Zanella <adhemerval.zanella@linaro.org> writes:
> As noted by Florian Weimer, current Linux posix_spawn implementation
> can trigger an assert if the auxiliary process is terminated before
> actually setting the err member:
>
> 340 /* Child must set args.err to something non-negative - we rely on
> 341 the parent and child sharing VM. */
> 342 args.err = -1;
> [...]
> 362 new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
> 363 CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
> 364
> 365 if (new_pid > 0)
> 366 {
> 367 ec = args.err;
> 368 assert (ec >= 0);
>
> Another possible issue is killing the child between setting the err and
> actually calling execve. In this case the process will not ran, but
> posix_spawn also will not report any error:
>
> 269
> 270 args->err = 0;
> 271 args->exec (args->file, args->argv, args->envp);
>
> As suggested by Andreas Schwab, this patch removes the faulty assert
> and also handles any signal that happens before fork and execve as the
> spawn was successful (and thus relaying the handling to the caller to
> figure this out). Different than Florian, I can not see why using
> atomics to set err would help here, essentially the code runs
> sequentially (due CLONE_VFORK) and I think it would not be legal the
> compiler evaluate ec without checking for new_pid result (thus there
> is no need to compiler barrier).
>
> Summarizing the possible scenarios on posix_spawn execution, we
> have:
>
> 1. For default case with a success execution, args.err will be 0, pid
> will not be collected and it will be reported to caller.
>
> 2. For default failure case, args.err will be positive and the it will
> be collected by the waitpid. An error will be reported to the
> caller.
>
> 3. For the unlikely case where the process was terminated and not
> collected by a caller signal handler, it will be reported as succeful
> execution and not be collected by posix_spawn (since args.err will
> be 0). The caller will need to actually handle this case.
>
> 4. For the unlikely case where the process was terminated and collected
> by caller we have 3 other possible scenarios:
>
> 4.1. The auxiliary process was terminated with args.err equal to 0:
> it will handled as 1. (so it does not matter if we hit the pid
> reuse race since we won't possible collect an unexpected
> process).
>
> 4.2. The auxiliary process was terminated after execve (due a failure
> in calling it) and before setting args.err to -1: it will also
> be handle as 1. but with the issue of not be able to report the
> caller a possible execve failures.
>
> 4.3. The auxiliary process was terminated after args.err is set to -1:
> this is the case where it will be possible to hit the pid reuse
> case where we will need to collected the auxiliary pid but we
> can not be sure if it will be expected one. I think for this
> case we need to actually change waitpid to use WNOHANG to avoid
> hanging indefinitely on the call and report an error to caller
> since we can't differentiate between a default failure as 2.
> and a possible pid reuse race issue.
>
> Checked on x86_64-linux-gnu.
posix/tst-spawn and posix/tst-spawn2 started to fail after this patch.
Reproduced on aarch64 [1], ppc [2], ppc64 [3] and ppc64le.
posix/tst-spawn.out:
error: tst-spawn.c:257: not true: WEXITSTATUS (status) == 0
error: 1 test failures
posix/tst-spawn2.out:
error: tst-spawn2.c:56: waitpid: No such file or directory)
error: 1 test failures
[1] http://glibc-buildbot.reserved-bit.com/builders/glibc-aarch64-linux/builds/760
[2] http://glibc-buildbot.reserved-bit.com/builders/glibc-ppc-linux/builds/152
[3] http://glibc-buildbot.reserved-bit.com/builders/glibc-power8-linux/builds/728
--
Tulio Magno