Pthread Cancellation Race Condition Problems

The current approach to implementing pthread cancellation points is to enable asynchronous cancellation prior to making the syscall, and restore the previous cancellation type once the syscall returns. [NOTE: check if this fulfills POSIX requirements or not] And as described in bz#12683 bug report, current implementation suffers from some issues:

/* bz12683-test1.c: demonstration of file description leak.
   Build with: gcc -Wall bz12683-test1.c -o bz12683-test1 -lpthread  */

#include <pthread.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <stdlib.h>

void *
writeopener (void *arg)
  int fd;
  for (;;)
      fd = open (arg, O_WRONLY);
      close (fd);

void *
leaker (void *arg)
  int fd = open (arg, O_RDONLY);
  pthread_setcancelstate (PTHREAD_CANCEL_DISABLE, 0);
  close (fd);
  return 0;

#define ITER_COUNT 10000

main ()
  pthread_t td, bg;
  struct stat st;
  char tmp[] = "/tmp/cancel_race_XXXXXX";
  int i, leaks = 0;

  (void)mktemp (tmp);
  mkfifo (tmp, 0600);
  srand (1);
  pthread_create (&bg, 0, writeopener, tmp);
  for (i = 0; i < ITER_COUNT; i++)
      pthread_create (&td, 0, leaker, tmp);
      nanosleep ((&(struct timespec)
                  .tv_nsec = rand () % 100000}), 0);
      pthread_cancel (td);
      pthread_join (td, 0);
  unlink (tmp);
  for (i = 3; i < 1024; i++)
      if (!fstat (i, &st))
        leaks++, printf ("leaked fd %d\n", i);
  return !!leaks;

Running on current GLIBC it shows a result like below. The total number of leaked file descriptors can vary since it is a timing issue with randomness:

$ ./bz12683-test1 leaked fd 3 leaked fd 4 leaked fd 5 leaked fd 6 leaked fd 7 leaked fd 8 leaked fd 9 leaked fd 10 leaked fd 11 ...

/* bz12683-test2.c : demonstration of signal handler being executed in
                     asynchronous cancellation enabled.

   build with: gcc -Wall bz12683-test2.c -o bz12683-test2 -pthread  */

#define _XOPEN_SOURCE 700
#include <unistd.h>
#include <stdio.h>
#include <pthread.h>
#include <errno.h>
#include <signal.h>

volatile int x;

static void
handler (int sig)
  for (;;)

static void
bye (void *arg)
  printf ("x=%d\n", x);

static void *
start (void *foo)
  sigset_t set;
  sigemptyset (&set);
  sigaddset (&set, SIGUSR1);
  pthread_cleanup_push (bye, 0);
  pthread_sigmask (SIG_UNBLOCK, &set, 0);
  for (;;)
    sleep (1);
  pthread_cleanup_pop (0);
  return 0;

main ()
  pthread_t t;
  struct sigaction sa = { { 0 } } ;
  sigemptyset (&sa.sa_mask);
  sa.sa_handler = handler;
  sa.sa_flags = SA_RESTART;
  sigaction (SIGUSR1, &sa, 0);
  sigaddset (&sa.sa_mask, SIGUSR1);
  pthread_sigmask (SIG_BLOCK, &sa.sa_mask, 0);

  pthread_create (&t, 0, start, 0);
  nanosleep (&(struct timespec){.tv_nsec = 100000000}, 0);
  kill (getpid (), SIGUSR1);
  nanosleep (&(struct timespec){.tv_nsec = 100000000}, 0);
  pthread_cancel (t);
  pthread_join (t, 0);
  return 0;

Running on current GLIBC it yields, when it should hang:

$ ./bz12683-test2 x=38589543 $ ./bz12683-test2 x=39579798

These issues are not covered in any testcase currently.

Current implementation

Current pthread implementation for cancellation resolves around both arch-specific code on wrapper syscalls and also on arch independent NPTL code. The arch-specific code is provided in the sysdep-cancel.h headers through a set of macros to override from default architecture sysdep.h ones. In syscall auto-generation, if the function is a cancellable one (SYSCALL_CANCELLABLE defined to 1) sysdeps/unix/syscall-template.S will include sysdeps-cancel.h instead of sysdep.h

The basic strategy each arch should be provide in 'sysdep-cancel.h' headers is as following:

/* Code em pseudo-algorithm to simplify explanation.  */
PSEUDO(name, syscall, args)
  ENTRY (name)
      goto pseudo_cancel;
  DO_CALL (syscall_name)

  CALL (syscall_name)

Both CANCELLATION_ENABLE and CANCELLATION_DISABLE will be library internal symbols (pthread_enable_asynccancel, libc_enable_asynccancel, for instance) for each module that provides or uses cancelable syscalls. The SINGLE_THREAD_P checks if program is in single-thread mode (loader and default case) and avoids all the cancellation save/restore altogether.

The cancellation enable/disable symbols are implemented at:

Proposed solution

As described in comment #16 there are 5 points at which the cancellation signal could arrive:

  1. Before the final testcancel before the syscall is made.

  2. Between the testcancel and the syscall.

  3. While the syscall is blocked and no side effects have yet taken place.
  4. While the syscall is blocked but with some side effects already having taken place (e.g. a partial read or write).
  5. After the syscall has returned.

And basically GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case 4 or 5. And proposed solution, following Rich Felker suggestion is:

  1. Handling case 1 is trivial: do a conditional branch based on whether the thread has received a cancellation request;
  2. Case 2 can be caught by the signal handler determining that the saved program counter (from the ucontext_t) is in some address range beginning just before the "testcancel" and ending with the syscall instruction.
  3. In this case, except for certain syscalls that ALWAYS fail with EINTR even for non-interrupting signals, the kernel will reset the program counter to point at the syscall instruction during signal handling, so that the syscall is restarted when the signal handler returns. So, from the signal handler's standpoint, this looks the same as case 2, and thus it's taken care of.
  4. In this case, the kernel cannot restart the syscall; when it's interrupted by a signal, the kernel must cause the syscall to return with whatever partial result it obtained (e.g. partial read or write). In this case, the saved program counter points just after the syscall instruction, so the signal handler won't act on cancellation.
  5. This one is equal to 4. since the program counter is past the syscall instruction already.

Another case that needs handling is syscalls that fail with EINTR even when the signal handler is non-interrupting. In this case, the syscall wrapper code can just check the cancellation flag when the errno result is EINTR, and act on cancellation if it's set.

GLIBC adjustment to proposed solution

The proposed solution required an overhaul of both current GLIBC cancelable syscall code and pthread cancellation mechanism:

  1. Remove the enable_asynccancel/disable_asynccancel function usage in syscall definition and instead make them call a common symbol that will check if cancellation is enabled, call the arch-specific cancellable entry-point and cancel the thread when required.
  2. Provide a arch-specific symbol that contains global markers. These markers will be used in SIGCANCEL handler to check if the interruption has been called in a valid syscall and if the syscalls has been completed or not.
  3. Rewrite SIGCANCEL asynchronous handler to check for both cancelling type and if current IP from signal handler falls between the global markes and act accordingly.
  4. Adjust nptl/pthread_cancel.c to send an signal instead of acting directly. This avoid synchronization issues about updating the cancellation status and also focus the logic on signal handler and cancellation syscall code.
  5. Adjust pthread code to replace CANCEL_ASYNC/CANCEL_RESET calls to appropriated cancelable futex syscalls.
  6. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET to appropriated cancelable syscalls.
  7. Rewrite sysdeps/unix/sysv/linux/not-cancel.h to calls inline syscalls for non-cancelable versions.
  8. Adjust 'lowlevellock.h' arch-specific implementations to provide cancelable futex calls (used in libpthread code).

None: Release/2.21/bz12683 (last edited 2014-09-10 21:31:35 by c9522916)