Bug 30163 - system() erroneously block SIGCHLD forever when called concurrently
Summary: system() erroneously block SIGCHLD forever when called concurrently
Alias: None
Product: glibc
Classification: Unclassified
Component: libc (show other bugs)
Version: 2.29
: P2 normal
Target Milestone: ---
Assignee: Not yet assigned to anyone
Depends on:
Reported: 2023-02-24 15:25 UTC by Adam Yi
Modified: 2023-03-09 02:51 UTC (History)
3 users (show)

See Also:
Last reconfirmed:

Reproduction code (303 bytes, text/x-csrc)
2023-02-24 15:25 UTC, Adam Yi
Proposed patch (1.73 KB, patch)
2023-02-24 16:29 UTC, Adam Yi
Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Adam Yi 2023-02-24 15:25:19 UTC
Created attachment 14718 [details]
Reproduction code


Although POSIX does not require, glibc system implementation aims to be
thread and cancellation safe.

However, I found that commit 5fb7fc96350575c9adb1316833e48ca11553be49 introduced a concurrency bug. Specifically, in the following scenario with two threads in the same process:

1. Thread A calls system but hasn't returned yet
2. Thread B calls another system but returns

I observed that SIGCHLD would be blocked forever in thread B after its system() returns, even after the system() in thread A returns (though I believe it should unblock it after system() in thread B returns).


We always block SIGCHLD and store original signal mask in omask before calling posix_spawn:

// ...
 __sigemptyset (&sa.sa_mask);    
 // ...
 __sigaddset (&sa.sa_mask, SIGCHLD);                                                                                                          
 /* sigprocmask can not fail with SIG_BLOCK used with valid input                                                                             
    arguments.  */                                                                                                                            
 __sigprocmask (SIG_BLOCK, &sa.sa_mask, &omask); 
 // ...
 __posix_spawnattr_setsigmask (&spawn_attr, &omask);

However, when it comes to restoring to omask in parent process, the code previously looks like this:


  DO_LOCK ();
  if ((SUB_REF () == 0
       && (__sigaction (SIGINT, &intr, (struct sigaction *) NULL)
       | __sigaction (SIGQUIT, &quit, (struct sigaction *) NULL)) != 0)
      || __sigprocmask (SIG_SETMASK, &omask, (sigset_t *) NULL) != 0)
#ifndef _LIBC
      /* glibc cannot be used on systems without waitpid.  */
      if (errno == ENOSYS)
    __set_errno (save);
    status = -1;

The aforementioned buggy commit changed it to this:


if (SUB_REF () == 0)
    /* sigaction can not fail with SIGINT/SIGQUIT used with old
   disposition.  Same applies for sigprocmask.  */
    __sigaction (SIGINT, &intr, NULL);
    __sigaction (SIGQUIT, &quit, NULL);
    __sigprocmask (SIG_SETMASK, &omask, NULL);

Previously we would unconditionally revert mask back to omask but only call sigaction if it's the last one (since that's shared for the whole process); now we also only revert if it's the last ongoing system, despite that signal mask is per-thread.


I believe we should move the last `__sigprocmask` out of this if statement. I already have a patch ready and will send it over later today.

I've attached a simple C reproduction program to this bug. It doesn't print anything on releases/2.28/master (before introducing the bug) but prints "SIGCHLD is erroneously blocked" on releases/2.29/master and on latest master branch.
Comment 1 Adam Yi 2023-02-24 16:29:59 UTC
Created attachment 14720 [details]
Proposed patch
Comment 2 Adhemerval Zanella 2023-02-24 17:02:28 UTC
Can you send the patch to libc-alpha? All development is done there, along with patch reviews and discussions.
Comment 3 Adam Yi 2023-02-24 17:14:07 UTC
(In reply to Adhemerval Zanella from comment #2)
> Can you send the patch to libc-alpha? All development is done there, along
> with patch reviews and discussions.

Will do, thanks!
Comment 4 Adam Yi 2023-03-09 02:51:04 UTC
Fixed by 436a604b7dc741fc76b5a6704c6cd8bb178518e7