posix_spawn: parent can get stuck in uninterruptible sleep if child receives SIGTSTP early enough

Rain glibc@sunshowers.io
Sun Aug 14 03:38:41 GMT 2022


On Sat, Aug 13, 2022, at 20:30, Rain wrote:
> Hi there --
>
> I've been working on a CLI tool (in Rust) that spawns lots of processes 
> with posix_spawn. Specifically, I've been observing its behavior when 
> Ctrl-Z is pressed in a terminal, and the process group receives a 
> SIGTSTP signal. I'm seeing an issue where if the signal is received 
> early enough during the posix_spawn process, the parent can be stuck in 
> the middle of the clone3() syscall, in an uninterruptible sleep state.
>
> Here are some backtraces, observed with glibc 2.35 and Linux kernel 
> 5.18.10-76051810-generic on Ubuntu 22.04 (x86_64). I checked glibc 
> master and I'm not seeing any code changes in this area, so I presume 
> this issue still exists.
>
> In this case, during setup, posix_spawnattr_setsigmask is called with 
> an empty signal set. However, based on reading the source code, I don't 
> think that's relevant.
>
> --- parent process ---
>
> (gdb) bt
> #0  clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:62
> #1  0x00007f12a0a37a51 in __GI___clone_internal 
> (cl_args=cl_args@entry=0x7f129a5ed9e0, func=func@entry=0x7f12a0a24300 
> <__spawni_child>, arg=arg@entry=0x7f129a5eda40)
>     at ../sysdeps/unix/sysv/linux/clone-internal.c:54
> #2  0x00007f12a0a241f3 in __spawnix (pid=0x7f129a5edd20, 
> file=0x7f123405d030 
> "/home/rain/dev/tokio/target/debug/deps/sync_mutex-22a40a7c6051156b", 
> file_actions=0x7f129a5edd60, 
>     attrp=<optimized out>, argv=<optimized out>, envp=0x7f123403f2e0, 
> xflags=1, exec=0x7f12a09fcdd0 <__execvpex>) at 
> ../sysdeps/unix/sysv/linux/spawni.c:388
> #3  0x00007f12a0a2490b in __spawni (pid=<optimized out>, 
> file=<optimized out>, acts=<optimized out>, attrp=<optimized out>, 
> argv=<optimized out>, envp=<optimized out>, xflags=1)
>     at ../sysdeps/unix/sysv/linux/spawni.c:436
> #4  0x00007f12a0a2403f in __posix_spawnp (pid=<optimized out>, 
> file=<optimized out>, file_actions=<optimized out>, attrp=<optimized 
> out>, argv=<optimized out>, envp=<optimized out>)
>     at ./posix/spawnp.c:30
> #5  0x000056199dee0811 in 
> std::sys::unix::process::process_common::Command::posix_spawn () at 
> library/std/src/sys/unix/process/process_unix.rs:544
> #6  std::sys::unix::process::process_common::Command::spawn () at 
> library/std/src/sys/unix/process/process_unix.rs:57
> #7  0x000056199ded68dc in std::process::Command::spawn () at 
> library/std/src/process.rs:881
>
> --- child process ---
>
> (gdb) bt
> #0  __GI___pthread_sigmask (how=how@entry=2, newmask=<optimized out>, 
> oldmask=oldmask@entry=0x0) at ./nptl/pthread_sigmask.c:43
> #1  0x00007faaf8edd71d in __GI___sigprocmask (how=how@entry=2, 
> set=<optimized out>, oset=oset@entry=0x0) at 
> ../sysdeps/unix/sysv/linux/sigprocmask.c:25
> #2  0x00007faaf8fae4d8 in __spawni_child (arguments=<optimized out>) at 
> ../sysdeps/unix/sysv/linux/spawni.c:287
> #3  0x00007faaf8fc1a00 in clone3 () at 
> ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
>
> ---
>
> Based on these backtraces and reading the source code, here's what I 
> believe is happening:
>
> 1. The parent calls __posix_spawnp, which in turn calls __spawni and 
> __spawnix.
> 2. The parent calls clone3 and enters uninterruptible sleep.
> 3. The child enters __spawni_child and blocks all incoming signals.
> ---> 4. At this point the child receives a SIGTSTP signal. <---
> 5. The child unblocks signals by calling sigprocmask/pthread_sigmask.
> 6. At this point the SIGTSTP is delivered to the child.
> 7. However, the clone3 call hasn't returned in the parent yet, so the 
> parent remains stuck in the syscall until the child receives a SIGCONT.
>
> I'm not sure what a reasonable way to handle this would be on the part 
> of my CLI tool. The tool currently just gets stuck in uninterruptible 
> sleep, resulting in a bad user experience.
>
> Here are solutions I've thought about that don't seem to work (please 
> correct me if I'm wrong!):
> 1. Setting the signal mask to include SIGTSTP. I do want to be able to 
> send the child SIGTSTP after the clone(), and in my case the child is a 
> third-party process so I can't depend on it to reset the signal mask.
> 2. Spawning a stub process that execves the real child. It seems like 
> the same issue exists when the main process calls the stub process, if 
> I'm understanding the code correctly, so this won't help.
>
> ... though now as I'm writing this email out, maybe one solution is:
>
> * my tool spawns a stub process with SIGTSTP masked.
> * the subprocess unmasks SIGTSTP (so it could receive the SIGTSTP here, 
> but at least it won't block the parent process), then execves the 
> third-party process.
>
> Is that the solution you would recommend?
>
> Thanks.
>
> --
>
> Rain
> (they/she)

I apologize for the lack of wrapped text -- I didn't realize that my
MUA (Fastmail) doesn't wrap plaintext emails. I believe the quoted text
in this response should be correctly wrapped.

-- 
Rain
(they/she)

