
Re: [PATCH v2 03/21] nptl: Fix Race conditions in pthread cancellation (BZ#12683)


On 26 Feb 2018, Adhemerval Zanella wrote:
> This patch fixes some race conditions in the NPTL cancellation code by
> changing how cancellable syscalls are defined and handled.  The current
> approach is to enable asynchronous cancellation prior to making the syscall
> and to restore the previous cancellation type once the syscall returns.
>
> As described in BZ#12683, this approach has two important problems:
>
>   1. Cancellation can act after the syscall has returned from the kernel,
>      but before userspace saves the return value.  This might result in a
>      resource leak if the syscall allocated a resource or had a side effect
>      (a partial read/write), and there is no way for the program to handle
>      it with cancellation handlers.
>
>   2. If a signal is handled while the thread is blocked at a cancellable
>      syscall, the entire signal handler runs with asynchronous cancellation
>      enabled.  This can lead to issues if the signal handler calls functions
>      which are async-signal-safe but not async-cancel-safe.
>
> For cancellation to work correctly, there are 5 points at which the
> cancellation signal could arrive:
>
>   1. Before the final "testcancel" and before the syscall is made.
>   2. Between the "testcancel" and the syscall.
>   3. While the syscall is blocked and no side effects have yet taken place.
>   4. While the syscall is blocked but with some side effects already having
>      taken place (e.g. a partial read or write).
>   5. After the syscall has returned.
>
> And GLIBC wants to act on cancellation in cases 1, 2, and 3 but not in case
> 4 or 5.  The proposed solution follows:
>
>   * Handling case 1 is trivial: do a conditional branch based on whether the
>     thread has received a cancellation request;
>   * Case 2 can be caught by the signal handler determining that the saved
>     program counter (from the ucontext_t) is in some address range beginning
>     just before the "testcancel" and ending with the syscall instruction.
>   * In case 3, except for certain syscalls that ALWAYS fail with EINTR
>     even for non-interrupting signals, the kernel will reset the program
>     counter to point at the syscall instruction during signal handling, so
>     that the syscall is restarted when the signal handler returns.  So, from
>     the signal handler's standpoint, this looks the same as case 2, and thus
>     it's taken care of.
>   * In case 4, the kernel cannot restart the syscall; when it's
>     interrupted by a signal, the kernel must cause the syscall to return
>     with whatever partial result it obtained (e.g. partial read or write).
>   * In case 5, the saved program counter points just after the syscall
>     instruction, so the signal handler won't act on cancellation.
>     This is effectively the same as case 4, since the program counter is
>     already past the syscall instruction.

To be 100% clear here, you are fixing important problem #1 by having
deferred cancellation _not_ trigger in cases 4 and 5, because the
signal handler observes that the PC is past the syscall instruction?
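
(For concreteness, the check I am picturing in sigcancel_handler is
roughly the sketch below.  The marker symbols are the ones I understand
your syscall_cancel.c to define; ucontext_get_pc is only a stand-in for
however each port extracts the interrupted PC from the context.)

    /* Sketch only: skip cancellation for cases 4 and 5, where the PC
       already points past the syscall instruction.  */
    uintptr_t pc = ucontext_get_pc (ctx);
    if (pc < (uintptr_t) __syscall_cancel_arch_start
        || pc >= (uintptr_t) __syscall_cancel_arch_end)
      return;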

> Another case that needs handling is syscalls that fail with EINTR even
> when the signal handler is non-interrupting. In this case, the syscall
> wrapper code can just check the cancellation flag when the errno result
> is EINTR, and act on cancellation if it's set.

Which system calls are affected by this rule, and is it actually
correct to have cancellation trigger when they fail with EINTR?  Can
we be certain that the EINTR was due to the cancellation signal rather
than some other signal delivered first?  (At least for sigsuspend it
is possible to have two signals be delivered simultaneously, if they
were both blocked and pending, and both become unblocked due to the
sigsuspend.  The kernel pushes two signal stack frames.  It would
reduce the odds of this being a problem if the SIGCANCEL handler
masked all other signals while running, but I'm not sure that will
work with it jumping directly to __pthread_unwind under some
conditions, and also I don't think it _completely_ eliminates the
possibility; a second signal could be delivered right after sigreturn
unmasks it, without returning to the caller of the syscall stub first.)
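
(Concretely, by "masked all other signals" I mean installing the handler
with a full sa_mask, something like the sketch below; whether that
interacts sanely with the direct jump to __pthread_unwind is exactly the
part I am unsure about.)

    /* Sketch: block every other signal while sigcancel_handler runs.  */
    struct sigaction sa;
    sa.sa_sigaction = sigcancel_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    __sigfillset (&sa.sa_mask);
    __libc_sigaction (SIGCANCEL, &sa, NULL);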

>   1. Remove the enable_asynccancel/disable_asynccancel function usage in
>      syscall definition and instead make them call a common symbol that will
>      check if cancellation is enabled (__syscall_cancel at
>      nptl/libc-cancellation.c), call the arch-specific cancellable
>      entry-point (__syscall_cancel_arch) and cancel the thread when required.
>
>   2. Provide an arch-specific symbol that contains global markers.  These
>      markers will be used by the SIGCANCEL handler to check whether the
>      interruption happened inside a valid cancellable syscall and whether
>      the syscall has completed or not.
>      A default version is provided (sysdeps/unix/sysv/linux/syscall_cancel.c),
>      however the markers may not end up at the expected places depending
>      on how INTERNAL_SYSCALL_NCS is implemented by the underlying
>      architecture.  In that case an arch-specific implementation should be
>      provided.
>
>   3. Rewrite the SIGCANCEL asynchronous handler to check both the
>      cancellation type and whether the current IP from the signal context
>      falls between the global markers, and act accordingly
>      (sigcancel_handler at nptl/nptl-init.c).
>
>   4. Adjust nptl/pthread_cancel.c to send a signal instead of acting
>      directly.  This avoids synchronization issues when updating the
>      cancellation status and also concentrates the logic in the signal
>      handler and cancellation syscall code.
>
>   5. Adjust pthread code to replace CANCEL_ASYNC/CANCEL_RESET calls with
>      the appropriate cancellable futex syscalls.
>
>   6. Adjust libc code to replace LIBC_CANCEL_ASYNC/LIBC_CANCEL_RESET with
>      the appropriate cancellable syscalls.
>
>   7. Adjust 'lowlevellock-futex.h' arch-specific implementations to provide
>      cancelable futex calls (used in libpthread code).

The overall design seems sound to me.
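
(For the record, my mental model of item 1 is roughly the sketch below.
The flag tests and the __syscall_cancel_arch / __syscall_do_cancel calls
come from the patch; the exact signature and the direct-syscall macro on
the fast path are my guesses, not the real code.)

    long int
    __syscall_cancel (long int nr, long int a1, long int a2, long int a3,
                      long int a4, long int a5, long int a6)
    {
      struct pthread *pd = THREAD_SELF;

      /* Cancellation disabled: no markers needed, plain syscall.  */
      if (pd->cancelhandling & CANCELSTATE_BITMASK)
        return INTERNAL_SYSCALL_NCS_CALL (nr, a1, a2, a3, a4, a5, a6);

      /* Otherwise go through the arch entry point that carries the
         global markers, so the SIGCANCEL handler can tell whether the
         syscall has completed.  */
      long int result = __syscall_cancel_arch (&pd->cancelhandling, nr,
                                               a1, a2, a3, a4, a5, a6);

      /* Syscalls that fail with EINTR even for non-interrupting
         signals: act on a pending cancellation request here.  */
      if (result == -EINTR
          && (pd->cancelhandling & CANCELED_BITMASK)
          && !(pd->cancelhandling & CANCELSTATE_BITMASK))
        __syscall_do_cancel ();

      return result;
    }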

> This patch adds the proposed changes to NPTL.  The code leaves all the ports
> broken without further patches in the list.

Can we find an ordering of the patches in which the tree is never
broken at an intermediate stage?  Perhaps you could introduce the generic
__syscall_cancel(_arch) first, but not change any of the existing
cancellation logic until after all of the port-specific code was in
place?

Below, all the changes which I erased and didn't comment on seem OK to
me.  I am generally pleased to see many system call wrappers get
simpler in the process of fixing this bug.

> --- a/io/creat.c
> +++ b/io/creat.c
...
> -
> -/* __open handles cancellation.  */
> -LIBC_CANCEL_HANDLED ();

It might be clearer to split patch 2 into two patches, one to remove
tst-cancel-wrappers.sh and one to make all the other testing changes,
and then shift all of the removals of LIBC_CANCEL_HANDLED into the
same patch that removes tst-cancel-wrappers.sh.

> --- a/nptl/descr.h
> +++ b/nptl/descr.h
...
> +  /* Bit set if threads is canceled.  */

Grammar: "Bit set if thread is canceled."

(For 100% proper English, it would be "This bit is set if this thread
has been cancelled", but all of the other comments are clipped as
well, so the above is fine.)

> --- a/nptl/libc-cancellation.c
> +++ b/nptl/libc-cancellation.c
...
> +#include <setjmp.h>
> +#include <stdlib.h>

Why are these additional #includes necessary?

...
> +  volatile struct pthread *pd = (volatile struct pthread *) self;

Why is it necessary for this pointer to be volatile?  It appears that
we only ever access pd->cancelhandling; which loads need to be
forced to go to memory, and should they also be atomic?
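
(If the answer is "just to get a fresh read of the flags", an explicit
relaxed atomic load would say that more precisely than the volatile
cast.  Sketch:)

    /* Take one explicit (relaxed) atomic snapshot of the flags and
       test that, instead of repeated volatile loads.  */
    int ch = atomic_load_relaxed (&self->cancelhandling);
    bool cancel_enabled = (ch & CANCELSTATE_BITMASK) == 0;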

> +  /* If cancellation is not enabled, call the syscall directly.  */
> +  if (pd->cancelhandling & CANCELSTATE_BITMASK)

It is unfortunate that the one-bit flag that means "cancellation is
disabled" is named CANCELSTATE_BITMASK instead of, I dunno,
CANCEL_DISABLED_BITMASK maybe.  You didn't introduce that, but please
consider a follow-up patch that renames CANCELSTATE_BIT(MASK) and also
CANCELTYPE_BIT(MASK) [the latter to "CANCEL_ASYNC_BIT(MASK)" perhaps].

> +  if ((result == -EINTR)
> +      && (pd->cancelhandling & CANCELED_BITMASK)
> +      && !(pd->cancelhandling & CANCELSTATE_BITMASK))
> +    __syscall_do_cancel ();

I thought you said this treatment was only applied to system calls
that _always_ fail with EINTR, even when interrupted by SA_RESTART
signals.  This is doing it for system calls interrupted by
non-SA_RESTART signals as well.  You also change SIGCANCEL to be a
SA_RESTART signal in this patch, but it still could happen that we
have SIGCANCEL and some other non-SA_RESTART signal arrive
simultaneously enough that they interrupt the same system call, and
then it fails with EINTR and everything is confused.

I think you need to check over the entire design with "what if
SIGCANCEL and some other signal are delivered as nested interruptions
to the same system call" in mind, in fact.  The dangerous case that I
see is when the _other_ signal causes the system call to be
interrupted after producing some side effects.  Can the SIGCANCEL
handler know that this has happened or is about to happen?

> --- a/nptl/nptl-init.c
> +++ b/nptl/nptl-init.c
>  sigcancel_handler (int sig, siginfo_t *si, void *ctx)
>  {
> +  INTERNAL_SYSCALL_DECL (err);
> +  pid_t pid = INTERNAL_SYSCALL_CALL (getpid, err);
> +
>    /* Safety check.  It would be possible to call this function for
>       other signals and send a signal from another process.  This is not
>       correct and might even be a security problem.  Try to catch as
>       many incorrect invocations as possible.  */
>    if (sig != SIGCANCEL
> -      || si->si_pid != __getpid()
> +      || si->si_pid != pid
>        || si->si_code != SI_TKILL)

Why is this change necessary?

> +  volatile struct pthread *pd = (volatile struct pthread *) self;

Same question as above: why does this need to be volatile, does it
actually do the job, should we instead be using atomics?

> +  /* Add SIGCANCEL on ignored sigmask to avoid the handler to be called
> +     again.  */
> +  sigset_t *set = UCONTEXT_SIGMASK (ctx);
> +  __sigaddset (set, SIGCANCEL);

Screwing with the signal mask that will be restored by sigreturn seems
unsafe.  Could we instead return early if CANCELED_BITMASK and/or
EXITING_BITMASK has already been set?
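
(I.e. something like the sketch below near the top of the handler,
instead of editing the saved mask:)

    /* If this thread is already exiting, a repeated SIGCANCEL has
       nothing left to do.  (Possibly CANCELED_BITMASK as well, if the
       signal can only be sent once per cancellation request.)  */
    int ch = THREAD_GETMEM (THREAD_SELF, cancelhandling);
    if (ch & EXITING_BITMASK)
      return;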

> +  /* Check if asynchronous cancellation mode is set and if interrupted
                                                       ^^^
This 'and' should be 'or'.

> +      THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
> +      THREAD_SETMEM (self, result, PTHREAD_CANCELED);

I think these ought to be done inside __do_cancel, since several other
places may call __do_cancel.
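
(Based on the existing __do_cancel in pthreadP.h, that would look
roughly like:)

    static inline void
    __attribute ((noreturn, always_inline))
    __do_cancel (void)
    {
      struct pthread *self = THREAD_SELF;

      /* Record the cancellation result and mark the thread as exiting
         here, so every caller of __do_cancel gets both for free.  */
      THREAD_SETMEM (self, result, PTHREAD_CANCELED);
      THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);

      __pthread_unwind ((__pthread_unwind_buf_t *)
                        THREAD_GETMEM (self, cleanup_jmp_buf));
    }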

> +      INTERNAL_SYSCALL_CALL (rt_sigprocmask, err, SIG_SETMASK, set, NULL,
> +                          _NSIG / 8);

See above in re screwing with the signal mask.

> +      __do_cancel ();
> +      return;
>      }
>  }

This 'return' should be unnecessary on two counts: we're about to fall
off the end of the function anyway, and __do_cancel ends by calling
__pthread_unwind which does not return.

> --- a/nptl/pthreadP.h
> +++ b/nptl/pthreadP.h
...
> -  /* Make sure we get no more cancellations.  */
> -  THREAD_ATOMIC_BIT_SET (self, cancelhandling, EXITING_BIT);
> +  /* Make sure we get no more cancellations by clearing the cancel
> +     state.  */
> +  int oldval = THREAD_GETMEM (self, cancelhandling);
> +  while (1)
> +    {
> +      int newval = (oldval | CANCELSTATE_BITMASK);
> +      newval &= ~(CANCELTYPE_BITMASK);

Disabled cancellation state is specified to take precedence over
asynchronous cancellation type, so it should be enough to atomically
set CANCELSTATE_BIT, at which point the CAS loop would be unnecessary.
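
(I.e. just:)

    /* Disabled state takes precedence over the async type, so setting
       the state bit alone is sufficient.  */
    THREAD_ATOMIC_BIT_SET (self, cancelhandling, CANCELSTATE_BIT);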

> --- a/nptl/pthread_cancel.c
> +++ b/nptl/pthread_cancel.c
...
> +  /* Avoid signaling when thread attempts cancel itself (pthread_kill
> +     is expensive).  */
> +  if (pd == THREAD_SELF && !(pd->cancelhandling & CANCELTYPE_BITMASK))
> +    return 0;

It is not obvious why this is correct.  The answer is that
pthread_cancel is not itself a cancellation point, so, when
cancellation is deferred, a self-cancel is _not_ supposed to take
effect immediately; the signal handler would have no effect, and the
next cancellation point will check the flag anyway.  I suggest instead

    if (pd == THREAD_SELF)
      {
        if (pd->cancelhandling & CANCELTYPE_BITMASK)
          __do_cancel ();
        /* pthread_cancel is not itself a cancellation point;
           the next cancellation point will check the flag anyway,
           so there is no need to send the cancellation signal.  */
        return 0;
      }
    return __pthread_kill (th, SIGCANCEL);

> --- a/nptl/pthread_create.c
> +++ b/nptl/pthread_create.c
...
>    /* If the parent was running cancellation handlers while creating
>       the thread the new thread inherited the signal mask.  Reset the
>       cancellation signal mask.  */

If the cancellation handler exited early when EXITING_BIT was set, and
it didn't mess with the signal mask, we wouldn't need to do this, either.

> -       int oldtype = CANCEL_ASYNC ();
> +       int ct;
> +       __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);

I don't understand the logic in this part of the code at all so I am
going to trust you that this is the correct change.  It seems like
it's using SIGCANCEL for something other than cancellation, which
seems unwise and maybe should be changed in a follow-up.

> --- a/nptl/pthread_join_common.c
> +++ b/nptl/pthread_join_common.c
...
> -      int oldtype = CANCEL_ASYNC ();
> +      int ct;
> +      __pthread_setcanceltype (PTHREAD_CANCEL_ASYNCHRONOUS, &ct);
>
>        if (abstime != NULL)
>       result = lll_timedwait_tid (pd->tid, abstime);
>        else
>       lll_wait_tid (pd->tid);
>
> -      CANCEL_RESET (oldtype);
> +      __pthread_setcanceltype (ct, NULL);

Maybe we should have lll_(timed)wait_tid_cancel rather than this?

> --- a/sysdeps/unix/sysdep.h
> +++ b/sysdeps/unix/sysdep.h
...
> +/* The loader does not need to handle thread cancellation, use direct
> +   syscall instead.  */

A stronger assertion would be better:

/* The loader should never make cancellable system calls.  Remap them
   to direct system calls.  */

Also, is this hack still necessary after I went through and made the
loader explicitly use _nocancel operations?
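
(If it is still needed, I assume the remap amounts to something along
these lines; I have not checked the exact macro names in the hunk.)

    #if IS_IN (rtld)
    /* The loader should never make cancellable system calls.  Remap
       them to direct system calls.  */
    # define SYSCALL_CANCEL(...) INLINE_SYSCALL_CALL (__VA_ARGS__)
    #endif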

> --- a/sysdeps/unix/sysv/linux/pthread_kill.c
> +++ b/sysdeps/unix/sysv/linux/pthread_kill.c
...
> -  /* Disallow sending the signal we use for cancellation, timers,
> -     for the setxid implementation.  */
> -  if (signo == SIGCANCEL || signo == SIGTIMER || signo == SIGSETXID)
> +  /* Disallow sending the signal we use for setxid implementation.  */
> +  if (signo == SIGSETXID)
>      return EINVAL;

Now applications can call pthread_kill(thr, SIGCANCEL) themselves,
can't they?  That seems unsafe.  Also this has since been changed to
use __is_internal_signal, I think.  We need an __internal_pthread_kill
or something.
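
(I.e. the public entry point probably wants to reject via the
internal-signal predicate, something like:)

    /* Reject all signals reserved for internal use (SIGCANCEL,
       SIGTIMER, SIGSETXID), not just SIGSETXID.  */
    if (__is_internal_signal (signo))
      return EINVAL;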

> --- a/sysdeps/unix/sysv/linux/socketcall.h
> +++ b/sysdeps/unix/sysv/linux/socketcall.h
...
> +#define __SOCKETCALL_CANCEL1(__name, __a1) \
> +  SYSCALL_CANCEL_NCS (socketcall, __name, \
> +     ((long int [1]) { (long int) __a1 }))

Blech, which architectures still require us to use socketcall?

> --- /dev/null
> +++ b/sysdeps/unix/sysv/linux/syscall_cancel.c
...
> +#define ADD_LABEL(__label)		\
> +  asm volatile (			\
> +    ".global " __label "\t\n"		\
> +    ".type " __label ",\%function\t\n"       \
> +    __label ":\n");

This makes me extremely nervous.  Specifically, I do not trust the
compiler to not move these around, even though they're volatile.  I
would actually prefer an .S file for every architecture.

Do we have a concrete reason to believe they really will always be
where they're supposed to be?  "It compiled fine with the compiler I
have right now" is not enough, I'm afraid.

> --- a/sysdeps/unix/sysv/linux/sysdep.h
> +++ b/sysdeps/unix/sysv/linux/sysdep.h
...
> +/* Check error from cancellable syscall and set errno accordingly.
> +   Linux uses a negative return value to indicate syscall errors
> +   and since version 2.1 the return value of a system call might be
> +   negative even if the call succeeded (e.g., the `lseek' system call
> +   might return a large offset).
> +   The current contract is that the kernel makes sure no syscall returns
> +   a value in -1 .. -4095 as a valid result, so we can safely test
> +   with -4095.  */
> +#define SYSCALL_CANCEL_RET(__ret)		\
> +  ({						\
> +    if (__ret > -4096UL)			\
> +      {						\
> +	__set_errno (-__ret);			\
> +	__ret = -1;				\
> +      }						\
> +    __ret;					\
> +   })

Don't we already have this somewhere under another name?

zw

