Bug 22273

Summary: Improper assert in Linux posix_spawn implementation
Product: glibc Reporter: Florian Weimer <fweimer>
Component: libcAssignee: Adhemerval Zanella <adhemerval.zanella>
Status: RESOLVED FIXED    
Severity: normal CC: adconrad, adhemerval.zanella, drepper.fsp
Priority: P2 Flags: fweimer: security-
Version: 2.26   
Target Milestone: 2.27   
See Also: https://launchpad.net/bugs/1719004
https://github.com/apple/cups/issues/5127
Host: Target: Linux
Build: Last reconfirmed:

Description Florian Weimer 2017-10-09 14:32:45 UTC
__spawnix has this code:

    340   /* Child must set args.err to something non-negative - we rely on
    341      the parent and child sharing VM.  */
    342   args.err = -1;
…
    354   /* The clone flags used will create a new child that will run in the same
    355      memory space (CLONE_VM) and the execution of calling thread will be
    356      suspend until the child calls execve or _exit.
    357 
    358      Also since the calling thread execution will be suspend, there is not
    359      need for CLONE_SETTLS.  Although parent and child share the same TLS
    360      namespace, there will be no concurrent access for TLS variables (errno
    361      for instance).  */
    362   new_pid = CLONE (__spawni_child, STACK (stack, stack_size), stack_size,
    363                    CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
    364 
    365   if (new_pid > 0)
    366     {
    367       ec = args.err;
    368       assert (ec >= 0);

The assert can fire if the child process dies before the err member is written.  Atomics should probably used to guard against compiler reordering of the store, too.
Comment 1 Adhemerval Zanella 2017-10-11 20:31:14 UTC
I do not see the scenario you described as expected because of CLONE_VFORK and every fail path is handled on 'fail' label.  On the code:

271   args->err = 0;                                                                  
272   args->exec (args->file, args->argv, args->envp);                                
273                                                                                   
274   /* This is compatibility function required to enable posix_spawn run            
275      script without shebang definition for older posix_spawn versions             
276      (2.15).  */                                                                  
277   maybe_script_execute (args);                                                    
278                                                                                   
279 fail:                                                                             
280   /* errno should have an appropriate non-zero value; otherwise,                  
281      there's a bug in glibc or the kernel.  For lack of an error code             
282      (EINTERNALBUG) describing that, use ECHILD.  Another option would            
283      be to set args->err to some negative sentinel and have the parent            
284      abort(), but that seems needlessly harsh.  */                                
285   args->err = errno ? : ECHILD;                                                   
286   _exit (SPAWN_ERROR);                                                            
287 }

So the input err memory position won't be accessed concurrently on parent because CLONE_VFORK (since kernel interface guarantee that calling process is suspended) and all return path on the child functions will be either default execution through args->exec/maybe_script_execute or through the fail label.
Comment 2 Florian Weimer 2017-10-11 20:36:04 UTC
(In reply to Adhemerval Zanella from comment #1)
> So the input err memory position won't be accessed concurrently on parent
> because CLONE_VFORK (since kernel interface guarantee that calling process
> is suspended) and all return path on the child functions will be either
> default execution through args->exec/maybe_script_execute or through the
> fail label.

But if the process is killed, this will not result in a vfork failure, will it?
Comment 3 Adhemerval Zanella 2017-10-11 21:24:19 UTC
(In reply to Florian Weimer from comment #0)
> __spawnix has this code:
> 
>     340   /* Child must set args.err to something non-negative - we rely on
>     341      the parent and child sharing VM.  */
>     342   args.err = -1;
> …
>     354   /* The clone flags used will create a new child that will run in
> the same
>     355      memory space (CLONE_VM) and the execution of calling thread
> will be
>     356      suspend until the child calls execve or _exit.
>     357 
>     358      Also since the calling thread execution will be suspend, there
> is not
>     359      need for CLONE_SETTLS.  Although parent and child share the
> same TLS
>     360      namespace, there will be no concurrent access for TLS variables
> (errno
>     361      for instance).  */
>     362   new_pid = CLONE (__spawni_child, STACK (stack, stack_size),
> stack_size,
>     363                    CLONE_VM | CLONE_VFORK | SIGCHLD, &args);
>     364 
>     365   if (new_pid > 0)
>     366     {
>     367       ec = args.err;
>     368       assert (ec >= 0);
> 
> The assert can fire if the child process dies before the err member is
> written.  Atomics should probably used to guard against compiler reordering
> of the store, too.

(In reply to Florian Weimer from comment #2)
> (In reply to Adhemerval Zanella from comment #1)
> > So the input err memory position won't be accessed concurrently on parent
> > because CLONE_VFORK (since kernel interface guarantee that calling process
> > is suspended) and all return path on the child functions will be either
> > default execution through args->exec/maybe_script_execute or through the
> > fail label.
> 
> But if the process is killed, this will not result in a vfork failure, will
> it?

Indeed killing the child in the middle of the its execution might trigger the assert.  Another issue I can see is process might be killed after line 271 (setting err to 0) and before actually issuing exec.  In this case it will posix_spawn won't trigger the assert and returns as the external process has ran correctly.

However I do not think setting the err atomically would help here, ideally we will need kernel help on to signaling abrupt child execution. One option would be to set all signals to SIG_IGN (although it won't help with SIGKILL or SIGSTOP).

What about removing the assert and setting all signal handler as SIG_IGN?
Comment 4 Florian Weimer 2017-10-12 08:57:23 UTC
(In reply to Adhemerval Zanella from comment #3)

> However I do not think setting the err atomically would help here, ideally
> we will need kernel help on to signaling abrupt child execution. One option
> would be to set all signals to SIG_IGN (although it won't help with SIGKILL
> or SIGSTOP).

I think atomic accesses are needed for their compiler barrier properties.

> What about removing the assert and setting all signal handler as SIG_IGN?

We can't do that because the signal handler disposition is inherited across execve.  I think it is sufficient to check the waitpid status in combination with the error code.  I don't think we need to report whether the subprocess was terminated before or after the execve (which can be successful or not).
Comment 5 Adhemerval Zanella 2017-10-12 13:19:59 UTC
(In reply to Florian Weimer from comment #4)
> (In reply to Adhemerval Zanella from comment #3)
> 
> > However I do not think setting the err atomically would help here, ideally
> > we will need kernel help on to signaling abrupt child execution. One option
> > would be to set all signals to SIG_IGN (although it won't help with SIGKILL
> > or SIGSTOP).
> 
> I think atomic accesses are needed for their compiler barrier properties.

But exactly what you are trying to prevent in spawni code? Essentially the code runs sequentially and I think it would not be legal the compiler evaluate ec without checking for new_pid result.

> 
> > What about removing the assert and setting all signal handler as SIG_IGN?
> 
> We can't do that because the signal handler disposition is inherited across
> execve.  I think it is sufficient to check the waitpid status in combination
> with the error code.  I don't think we need to report whether the subprocess
> was terminated before or after the execve (which can be successful or not).

Right, we need to respect POSIX_SPAWN_SETSIGDEF.  What about:

diff --git a/sysdeps/unix/sysv/linux/spawni.c b/sysdeps/unix/sysv/linux/spawni.c
index dea1650..f02ac19 100644
--- a/sysdeps/unix/sysv/linux/spawni.c
+++ b/sysdeps/unix/sysv/linux/spawni.c
@@ -365,9 +365,15 @@ __spawnix (pid_t * pid, const char *file,
   if (new_pid > 0)
     {
       ec = args.err;
-      assert (ec >= 0);
       if (ec != 0)
-         __waitpid (new_pid, NULL, 0);
+       {
+         /* It handles the unlikely case where the auxiliary vfork process
+            is killed before calling _exit or execve.  */
+         int status;
+         __waitpid (new_pid, &status, 0);
+         if (WIFSIGNALED (status))
+           ec = -1;
+       }
     }
   else
     ec = -new_pid;

It should handle a installed signal handler not erased by POSIX_SPAWN_SETSIGDEF (assuming it won't terminate the process) and the case of of a termination signal before _exit or execve.
Comment 6 Andreas Schwab 2017-10-12 13:33:09 UTC
You need to use a positive error number, not -1.
Comment 7 Adhemerval Zanella 2017-10-12 13:40:29 UTC
(In reply to Andreas Schwab from comment #6)
> You need to use a positive error number, not -1.

Indeed and still it won't help the case where the process is killed after setting args.err to 0 and I do not see how to correctly report it to user.  Even if we check waitpid unconditionally (with WNOHANG) for ec != 0 it might a case where the required program runs and it is killed just before the actual waitpid test. Any idea?
Comment 8 Adhemerval Zanella 2017-10-12 13:48:17 UTC
(In reply to Adhemerval Zanella from comment #7)
> (In reply to Andreas Schwab from comment #6)
> > You need to use a positive error number, not -1.
> 
> Indeed and still it won't help the case where the process is killed after
> setting args.err to 0 and I do not see how to correctly report it to user. 
> Even if we check waitpid unconditionally (with WNOHANG) for ec != 0 it might
> a case where the required program runs and it is killed just before the
> actual waitpid test. Any idea?

One option I have in mind is create the child with CLONE_CHILD_SETTID and call set_tid_address with args.err as input (I am using set_tid_address to avoid have to adjust each clone signature to take ctid in correct order).
Comment 9 Andreas Schwab 2017-10-12 14:20:45 UTC
Killing the child between args.err = 0 and exec is not a problem, that can be considered a successful spawn, and the caller will eventually call wait and note the signal (if he is interested in it).  In fact, I think any signal that happens between fork and exec should be relayed to the caller to figure out, as if spawn was successful.
Comment 10 Adhemerval Zanella 2017-10-12 14:30:45 UTC
(In reply to Andreas Schwab from comment #9)
> Killing the child between args.err = 0 and exec is not a problem, that can
> be considered a successful spawn, and the caller will eventually call wait
> and note the signal (if he is interested in it).  In fact, I think any
> signal that happens between fork and exec should be relayed to the caller to
> figure out, as if spawn was successful.

This seems reasonable, in this case I think we should set ec equal 0 for WIFSIGNALED in my suggestion at comment #5.
Comment 11 Adhemerval Zanella 2017-10-20 18:29:07 UTC
Fixed by fe05e1cb6d64dba6172249c79526f1e9af8f2bfd.