Bug 32565 - Ctrl-Z when process is doing posix_spawn makes the process hard to kill
Status: UNCONFIRMED
Alias: None
Product: glibc
Classification: Unclassified
Component: libc
Version: 2.40
Importance: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Reported: 2025-01-16 22:14 UTC by Askar Safin
Modified: 2025-01-17 11:52 UTC

Description Askar Safin 2025-01-16 22:14:41 UTC
Ctrl-Z when process is doing posix_spawn makes the process hard to kill.

If a process does posix_spawn+waitpid, then attempting to pause it using Ctrl-Z sometimes doesn't work and, worse, leaves the process unresponsive to the usual Ctrl-Z or Ctrl-C.

Steps to reproduce:

Compile one of the following two programs.

===
// big.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>
#include <pthread.h>

const int thread_count = 16;

void *
start_routine (void *arg)
{
    for (;;)
        {
            char *args[] = {"/bin/true", NULL};
            char *env[] = {"HOME=/", NULL};
            pid_t pid;
            if (posix_spawn (&pid, "/bin/true", NULL, NULL, args, env) != 0)
                {
                    fprintf (stderr, "posix_spawn failed\n");
                    exit (1);
                }
            if (waitpid (pid, NULL, 0) != pid)
                {
                    fprintf (stderr, "waitpid failed\n");
                    exit (1);
                }
        }
}

int
main (void)
{
    for (int p = 0; p != thread_count; ++p)
        {
            pthread_t pt;
            if (pthread_create (&pt, NULL, start_routine, NULL) != 0)
                {
                    fprintf (stderr, "pthread_create failed\n");
                    exit (1);
                }
        }
    for (;;)
        {
            pause ();
        }
}
===

===
// little.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <spawn.h>

int
main (void)
{
    for (;;)
        {
            char *args[] = {"/bin/true", NULL};
            char *env[] = {"HOME=/", NULL};
            pid_t pid;
            if (posix_spawn (&pid, "/bin/true", NULL, NULL, args, env) != 0)
                {
                    fprintf (stderr, "posix_spawn failed\n");
                    exit (1);
                }
            if (waitpid (pid, NULL, 0) != pid)
                {
                    fprintf (stderr, "waitpid failed\n");
                    exit (1);
                }
        }
}
===

Then run the program and try to pause it using "Ctrl-Z".

Sometimes this works, i.e. the program is paused as expected.

But sometimes this doesn't work and, moreover, puts the program into a very hard-to-kill state.

When I say "doesn't work", I mean that pressing Ctrl-Z doesn't pause the program, i.e. the program seems to continue running, and any further presses of Ctrl-Z or Ctrl-C don't stop the program either. So you have to open another terminal and do "kill -9" (fortunately, this works).

I provided two programs: big.c and little.c. Both reproduce this bug. little.c is easier to understand, but big.c is more likely to hit the bug. In other words, pthread_create is not necessary to reproduce the bug, but it increases the chance of hitting it.

With big.c you usually need 1-5 attempts to reproduce the bug (one attempt is usually enough).

I reproduced this bug on Debian sid with Linux 6.12 and glibc 2.40.

The bug is reproducible both in QEMU and on real hardware. (In QEMU the bug is slightly harder to reproduce.)

The bug was originally found by Rain while they were developing their nextest test system for Rust. They had to add a big workaround to nextest because of this bug. Previous discussions:

- https://lobste.rs/s/bnjeid/why_nextest_is_process_per_test#c_befw7u (this is a tree of comments)
- https://nexte.st/docs/design/architecture/signal-handling/#double-spawning-processes
- https://sourceware.org/pipermail/libc-help/2022-August/006263.html

Output of some commands on my test system:

# uname -a
Linux qemu-f0e28af36751 6.12.9-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.12.9-1 (2025-01-10) x86_64 GNU/Linux

# dpkg -l linux-image-amd64
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name              Version      Architecture Description
+++-=================-============-============-===================================
ii  linux-image-amd64 6.12.9-1     amd64        Linux for 64-bit PCs (meta-package)

# dpkg -l libc6
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=================================
ii  libc6:amd64    2.40-5       amd64        GNU C Library: Shared libraries

# /lib/x86_64-linux-gnu/libc.so.6
GNU C Library (Debian GLIBC 2.40-5) stable release version 2.40.
Copyright (C) 2024 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 14.2.0.
libc ABIs: UNIQUE IFUNC ABSOLUTE
Minimum supported kernel: 3.2.0
For bug reporting instructions, please see:
<http://www.debian.org/Bugs/>.
Comment 1 Askar Safin 2025-01-16 22:28:02 UTC
The bug is not reproducible if I do fork+execve+waitpid instead of posix_spawn+waitpid.
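
For reference, a minimal sketch of what a fork+execve+waitpid variant could look like. The exact code used for this test was not posted, so the helper name fork_exec_wait and its return convention are assumptions made for illustration only.

```c
// fork-variant.c (hypothetical helper, not part of the original report)
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PATH with fork+execve and wait for it to finish.
   Returns the child's exit status, or -1 on failure.  */
int
fork_exec_wait (const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = fork ();
    if (pid < 0)
        return -1;
    if (pid == 0)
        {
            execve (path, argv, envp);
            _exit (127); /* Only reached if execve failed.  */
        }
    int status;
    if (waitpid (pid, &status, 0) != pid)
        return -1;
    return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Replacing the posix_spawn and waitpid calls in the loop of little.c with one call to this helper (passing the same path, args and env) should give the fork-based variant.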
Comment 2 Askar Safin 2025-01-16 22:35:53 UTC
The bug is reproducible on musl, too. And, as with glibc, it doesn't reproduce on musl if I use fork instead of posix_spawn.
Comment 3 Andreas Schwab 2025-01-16 23:09:29 UTC
This looks more like a kernel bug; it also happens with vfork.
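
A minimal sketch of a vfork-based variant, for anyone who wants to check this. The code actually used was not posted, so the helper name vfork_exec_wait is hypothetical, and the sketch assumes the default GNU dialect of GCC, where unistd.h declares vfork. Unlike fork, vfork suspends the calling thread until the child calls execve or _exit.

```c
// vfork-variant.c (hypothetical helper, not part of the original report)
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run PATH with vfork+execve and wait for it to finish.
   Returns the child's exit status, or -1 on failure.  */
int
vfork_exec_wait (const char *path, char *const argv[], char *const envp[])
{
    pid_t pid = vfork ();
    if (pid < 0)
        return -1;
    if (pid == 0)
        {
            /* The child shares the parent's memory until execve,
               so it calls nothing except execve and _exit.  */
            execve (path, argv, envp);
            _exit (127); /* Only reached if execve failed.  */
        }
    int status;
    if (waitpid (pid, &status, 0) != pid)
        return -1;
    return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

Calling this helper in the loop of little.c, in place of posix_spawn and waitpid, should give a reproducer that does not go through posix_spawn at all.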
Comment 4 Florian Weimer 2025-01-17 11:52:56 UTC
Previous discussion:

posix_spawn: parent can get stuck in uninterruptible sleep if child receives SIGTSTP early enough
<https://inbox.sourceware.org/libc-help/2921668c-773e-465d-9480-0abb6f979bf9@www.fastmail.com/>