This is the mail archive of the
mailing list for the glibc project.
Re: [PATCH 2/4] nptl: Handle EPIPE on tst-cancel2
- From: Rich Felker <dalias at libc dot org>
- To: Florian Weimer <fweimer at redhat dot com>
- Cc: Adhemerval Zanella <adhemerval dot zanella at linaro dot org>, libc-alpha at sourceware dot org
- Date: Mon, 2 Sep 2019 19:26:23 -0400
- Subject: Re: [PATCH 2/4] nptl: Handle EPIPE on tst-cancel2
- References: <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <firstname.lastname@example.org> <email@example.com> <20190830135103.GJ9017@brightrain.aerifal.cx> <firstname.lastname@example.org>
On Mon, Sep 02, 2019 at 02:37:17PM +0200, Florian Weimer wrote:
> * Rich Felker:
> > On Fri, Aug 23, 2019 at 02:47:39PM +0200, Florian Weimer wrote:
> >> I looked at this some more. It seems that in theory, we could also see
> >> EPIPE in the old implementation or at least observe execution of a
> >> signal handler, and the asynchronous cancel hides that.
> >> In the new implementation, if there is a signal handler and a signal for
> >> it is delivered before SIGCANCEL (even if the signal was sent *after*
> >> pthread_cancel, from the same thread), I do not think there is any
> >> chance whatsoever that we can hide the behavior difference. After all,
> >> SIGCANCEL may not trigger an asynchronous cancellation from the signal
> >> handler.
> >> System calls that are cancellation points appear to fall into these
> >> categories:
> >> (A) Those that do not perform any resource allocation (write, read).
> >> (B) Those that perform resource allocation, but the allocation can be
> >> easily reverted (openat, accept4).
> >> (C) Those that perform resource allocation, but the allocation is
> >> difficult to undo (recvmsg with descriptor passing).
> >> (D) close.
> >> For (A), I think POSIX allows sufficient wiggle room so that we can
> >> exhibit a partial side effect and still act on the cancellation (without
> >> reporting a partial write first to the application).
> > No. POSIX is very clear that cancellation is only permitted to be
> > acted on when the side effects are the same as EINTR, which can only
> > happen for read/write when no data has been transferred yet. If data
> > has already been transferred, the only option not ruled out by one or
> > more constraints is to return successfully with a short read or write;
> > pending cancellation will then be handled on a subsequent call to a
> > cancellation point. (If you're doing a retry loop until the total
> > length n is transferred, then in the next iteration of the loop.)
> This seems to be a backwards interpretation to me. I think POSIX wants
> to describe what can happen during cancellation, and not say that any
> cancellation point can be skipped if some side effect happens that
> cannot happen before EINTR.
The exact text is:
"The side-effects of acting upon a cancellation request while
suspended during a call of a function are the same as the
side-effects that may be seen in a single-threaded program when a
call to a function is interrupted by a signal and the given
function returns [EINTR]."
Admittedly it took me a while to understand this back when I first
started investigating the problem in 2010, but based on both careful
reading of this text, and on reasoning about what conditions are
*absolutely necessary* to make the cancellation functionality usable
in any meaningful way (without race conditions), yes it means that if
you act on cancellation, the [only] side effects the cancelled
function can have are side effects that would be allowed if the
function had failed with EINTR.
In hindsight, the main obstacle to my understanding this was bias
based on how badly wrong all existing implementations I'd seen were,
and resistance to believing they were all so badly wrong. But they
Moreover, the specification gives the implementation liberty *not* to
act on cancellation in certain conditions (very end of XSH 2.9.5
Thread Cancellation, subheading Cancellation points):
"It is unspecified whether the cancellation request is acted upon
or whether the cancellation request remains pending and the thread
resumes normal execution if:
- The thread is suspended at a cancellation point and the event for
which it is waiting occurs
- A specified timeout expired
before the cancellation request is acted upon."
The way I read this is that, if you start processing a cancellation
request, but see that the timeout has already expired or that whatever
the function was blocked waiting for has already happened, you can
abort cancellation and leave it pending and return the result.
In an implementation where libc/libpthread cancellation and the kernel
are tightly coupled, this is really a choice -- for example the kernel
could check at the last point before success during SYS_read to see
that cancellation is pending, and *not actually transfer the data*,
but instead initiate cancellation with no side effects on the open
file. But on an implementation like Linux where libc/libpthread and
the cancellation implementation is completely decoupled from the
kernel and implemented via signals (which is a hack!), you don't
actually have this freedom. By the time you see the signal, it's
possible that the syscall *already succeeded* with (potentially
serious) side effects, and in that case you have to use the option
POSIX gives you here to defer cancellation until the next cancellation
point. (If POSIX did not give you this option, then it may not even be
possible to do a conforming implementation based on the signal hack).
> > Of course this case of non-conformance would be less severe than UAF
> > or resource leak errors in most cases, but it's still not good.
> Agreed. And I think not acting on the cancellation and returning
> success is also not conforming.
It is conforming, by the above-quoted passage.
> >> For (C), the complexity may not be worth it.
> >> For (A), (B), (C), we can act on the cancellation in the error case,
> >> after we observe the cancellation flag in the signal handler trampoline.
> >> (I think dropping the EINTR restriction from there achieves that.) If
> >> we do that, we do not need to change the test case.
> >> (D) is very special. Ideally, we would specify what happens with the
> >> descriptor if the close call is canceled. POSIX does not even specify
> >> what the state of file descriptor is after an EINTR error, so it doesn't
> >> say what happens with cancellation, either. Maybe we have to leave that
> >> undefined.
> > Failure to specify EINTR was an error in POSIX that's been fixed for
> > the next issue. Leaving it undefined is atrociously bad and results in
> > UAF type errors. Even if glibc doesn't want to apply the fix for
> > EINTR, cancellation of close absolutely needs to behave *as if* by the
> > newly-specified behavior for close with EINTR.
> Can we even make sure we act on the cancellation only if the descriptor
> has been closed?
If the descriptor has been closed, you *can't* act on cancellation. On
Linux, SYS_close *never* returns to userspace without the fd having
been closed. It can return EINTR, indicating that the fd has been
closed but the underlying open file description (if this was the last
reference to it) has not, but this EINTR is not POSIX conforming with
the new changes to POSIX, and it was always inconsistent with the
semantics implied by EINTR ("no nontrivial side effects have happened
yet"). See https://sourceware.org/bugzilla/show_bug.cgi?id=14627
In any case, this is easy to deal with for cancellation, even if
fixing the EINTR issue is controversial. If PC is before or at the
syscall instruction, nothing has happened yet (fd is still open) and
you can act on cancellation. If PC is after the syscall instruction,
you can't act on it.
This is actually the same as all other syscalls, in some sense, except
that for (at least a few) other syscalls you want to also act on
cancellation if EINTR is returned, because a few of them EINTR on
receipt of any signal, including SA_RESTART ones like SIGCANCEL.
Historically there were lots like this, but most of them were
conformance bugs and fixed. I think select and/or poll is still like
that intentionally, but I may be misremembering.