[PATCH v4 02/21] nptl: Fix Race conditions in pthread cancellation [BZ#12683]
Adhemerval Zanella
adhemerval.zanella@linaro.org
Wed Apr 8 14:13:13 GMT 2020
On 07/04/2020 15:24, Zack Weinberg wrote:
> On Fri, Apr 3, 2020 at 4:32 PM Adhemerval Zanella via Libc-alpha
> <libc-alpha@sourceware.org> wrote:
>>
>> This patch is the initial fix for race conditions in NPTL cancellation
>> code by redefining how cancellable syscalls are defined and handled.
>> The current buggy approach is to enable asynchronous cancellation
>> before making the syscall and restore the previous cancellation
>> type once the syscall returns.
>
> I want to see this bug fixed. Unfortunately I don't know the guts of
> NPTL well enough to review your patches completely, but here are a few
> things I noticed:
Thanks.
>
>> As a side note regarding SIGCANCEL and SIGTIMER being the the same,
>> it should not impact timer_create functionality. It arranges for
>> SIGCANCEL/SIGTIMER to be sent to the internal helper thread, which
>> in turn check if the si.si_code is SI_TIMER and call pthread_exit
>> otherwise (sysdeps/unix/sysv/linux/timer_routines.c:129).
>
> Can we be absolutely certain that SIGCANCEL/SIGTIMER will always be
> sent to a specific thread and not to a process?
With this patch pthread_cancel calls __pthread_kill_internal and
in turn it will use tgkill with process's pid and struct pthread tid.
And Linux timer_create does use the helper thread tid as the target one
(line 148).
>
>> + /* Add SIGCANCEL on ignored sigmask to avoid the handler to be called
>> + again. */
>> + ucontext_block_sigcancel (ctx);
>> +
>> + /* Check if asynchronous cancellation mode is set or if interrupted
>> + instruction pointer falls within the cancellable syscall bridge. For
>> + interruptable syscalls that might generate external side-effects (partial
>> + reads or writes, for instance), the kernel will set the IP to after
>> + '__syscall_cancel_arch_end', thus disabling the cancellation and allowing
>> + the process to handle such conditions. */
>> + if (self->canceltype == PTHREAD_CANCEL_ASYNCHRONOUS
>> + || cancellation_pc_check (ctx))
>> + __do_cancel (PTHREAD_CANCELED);
>
> Shouldn't this check happen _before_ we block further SIGCANCELs?
> If cancellation_pc_check fails, because the signal was delivered on
> exit from a system call that has had side effects, don't we need to
> be able to receive future SIGCANCELs in order for the next cancellation
> point to trigger?
For this case the cancellation will happen on next cancellation entrypoint
since pthread_cancel has already set the CANCELED_BIT on the target
struct pthread cancelhandling member and it will be checked on every
cancellation entrypoint prior issuing the syscall.
>
>> /* Install the cancellation signal handler. If for some reason we
>> cannot install the handler we do not abort. Maybe we should, but
>> it is only asynchronous cancellation which is affected. */
>
> 1) I think the third sentence of this comment has always been wrong.
> 2) Perhaps, if we cannot install a handler for SIGCANCEL, we should set
> a global flag which causes all calls to pthread_cancel to fail?
>
The cases I see it might fail are if kernel or some filter (seccomp, bpf,
etc.) explicit disables it. And I am not sure if is up to glibc to actually
handle it, since it would require to:
1. Set a global flag stating that *asynchronous* thread cancellation is
not supported at runtime, as you noted.
2. The pthread_setcanceltype should the check this flag and fail for
PTHREAD_CANCEL_ASYNCHRONOUS.
3. We will need to adjust the code on thread start (START_THREAD_DEFN)
which set PTHREAD_CANCEL_ASYNCHRONOUS (and as Florian noted, this
code might be unnecessary).
4. We will also need to adjust on how to handle
pthread_cleanup_push_defer_np since __pthread_register_cancel_defer
also set and reset cancellation type to PTHREAD_CANCEL_ASYNCHRONOUS.
So I am not sure which would be better:
1. Do nothing (currently).
2. Abort in case of an handler installation error.
3. Disable asynchronous cancellation at runtime.
In any case I think we should discuss in in another thread.
>> + /* Install the handle to change the threads' uid/gid. */
>
> Typo: handle -> handler (it was wrong before, but you may as well fix it)
Ack.
>
>> + struct sigaction sa;
>> + __sigemptyset (&sa.sa_mask);
>> + sa.sa_sigaction = sighandler_setxid;
>> + sa.sa_flags = SA_SIGINFO | SA_RESTART;
>> + __libc_sigaction (SIGSETXID, &sa, NULL);
>
> Unrelated preexisting bug, but I think we probably _should_ crash the
> whole process if the SIGSETXID handler cannot be installed,
> particularly when __libc_enable_secure is true.
I tend to agree, but is there a real scenario where sigaction syscall
might fail here?
>
>> + /* Avoid signaling when thread attempts cancel itself (pthread_kill
>> + is expensive). */
>> + if (pd == THREAD_SELF)
>> + {
>> + if (pd->cancelstate == PTHREAD_CANCEL_ENABLE
>> + && pd->canceltype == PTHREAD_CANCEL_ASYNCHRONOUS)
>> + __pthread_exit (PTHREAD_CANCELED);
>> + return 0;
>
> This works because __pthread_exit is actually a tiny wrapper around
> __do_cancel, but I think the logic would be easier to understand,
> here, if it called __do_cancel.
The idea is eventually just remove __do_cancel and use __pthread_exit
instead internally. I didn't do it on this patch because it would
require first to move __pthread_exit to libc.so.
More information about the Libc-alpha
mailing list